How to use mutate() in R
Introduction
The mutate() function from dplyr is used to create new columns or modify existing columns in a data frame. It’s essential for data transformation tasks like calculating new variables, converting data types, or applying functions to existing columns.
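As a quick orientation before the penguin examples, here is a minimal sketch on a made-up three-row table. The `toy` data frame and its column names are hypothetical and exist only for illustration; the point is that mutate() adds a column while leaving the rows untouched.

```r
library(dplyr)

# Hypothetical toy data, used only for illustration
toy <- tibble(id = 1:3, mass_g = c(3750, 3800, 3250))

# Add a derived column; the number of rows stays the same
toy_kg <- toy |>
  mutate(mass_kg = mass_g / 1000)

toy_kg
```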
Getting Started
library(tidyverse)
library(palmerpenguins)
Example 1: Basic Usage
The Problem
We want to add a new column to the penguins dataset that converts body mass from grams to kilograms. We also need to create a column that combines species and island information.
Step 1: Create a Simple New Column
Let’s start by adding a body mass column in kilograms.
penguins_kg <- penguins |>
  mutate(body_mass_kg = body_mass_g / 1000)
head(penguins_kg)
This creates a new column body_mass_kg by dividing the existing body_mass_g column by 1000.
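By default mutate() appends new columns at the right edge of the data frame, which can push them out of view in printed output. One possible variation, sketched here, uses mutate()'s `.after` argument to place the new column next to its source. The name `penguins_kg_after` is an illustrative choice, not part of the original example.

```r
library(dplyr)
library(palmerpenguins)

# Place the new column directly after the column it was derived from
penguins_kg_after <- penguins |>
  mutate(body_mass_kg = body_mass_g / 1000, .after = body_mass_g)

names(penguins_kg_after)
```

A matching `.before` argument exists for placing a column ahead of another one.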
Step 2: Create Multiple Columns at Once
Now we’ll add several new columns in a single mutate() call.
penguins_enhanced <- penguins |>
  mutate(
    body_mass_kg = body_mass_g / 1000,
    bill_ratio = bill_length_mm / bill_depth_mm,
    species_island = paste(species, island, sep = "_")
  )
This adds three new columns simultaneously: weight in kg, bill length-to-depth ratio, and a combined species-island identifier.
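Expressions inside a single mutate() call are evaluated in order, so a later expression can use a column defined earlier in the same call. The sketch below illustrates this with a pounds column; `body_mass_lb` and the rounded conversion factor 2.20462 are additions for illustration and do not appear in the original example.

```r
library(dplyr)
library(palmerpenguins)

penguins_units <- penguins |>
  mutate(
    body_mass_kg = body_mass_g / 1000,
    # Uses body_mass_kg, which was defined one line above in the same call
    body_mass_lb = body_mass_kg * 2.20462
  )

penguins_units |>
  select(body_mass_g, body_mass_kg, body_mass_lb) |>
  head()
```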
Step 3: Modify Existing Columns
We can also use mutate() to transform existing columns in place.
penguins_modified <- penguins |>
  mutate(
    species = toupper(species),
    year = as.character(year),
    bill_length_mm = round(bill_length_mm, 1)
  )
str(penguins_modified)
This converts species names to uppercase, changes year to character type, and rounds bill length to one decimal place. Because species is stored as a factor, toupper() returns a character vector, so the column is no longer a factor afterwards.
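If species should remain a factor after relabeling, one option is to change its levels rather than its values. The sketch below assumes the forcats package (loaded as part of the tidyverse) and uses its fct_relabel() function; `penguins_upper` is an illustrative name.

```r
library(dplyr)
library(forcats)
library(palmerpenguins)

# Apply toupper() to the factor levels so the column keeps its factor type
penguins_upper <- penguins |>
  mutate(species = fct_relabel(species, toupper))

levels(penguins_upper$species)
```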
Example 2: Practical Application
The Problem
We’re analyzing penguin data for a research project and need to create standardized measurements, categorize penguins by size, and calculate condition indices. This requires multiple data transformations that build upon each other.
Step 1: Create Standardized Measurements
First, we’ll standardize the body measurements using z-scores.
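penguins_std <- penguins |>
  drop_na() |>
  mutate(
    bill_length_z = scale(bill_length_mm)[, 1],
    bill_depth_z = scale(bill_depth_mm)[, 1],
    flipper_length_z = scale(flipper_length_mm)[, 1]
  )
This creates z-score standardized versions of three measurement variables. Calling drop_na() with no arguments first removes every row that has a missing value in any column. Because scale() returns a one-column matrix, the [, 1] index extracts it as a plain numeric vector.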
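When the same transformation is applied to several columns, dplyr's across() can avoid repeating the expression. This is a sketch of an alternative, not the approach used above: note that the `.names` pattern chosen here yields columns such as bill_length_mm_z, which differ from the hand-written names in the original code.

```r
library(dplyr)
library(tidyr)
library(palmerpenguins)

penguins_std_across <- penguins |>
  drop_na() |>
  mutate(
    across(
      c(bill_length_mm, bill_depth_mm, flipper_length_mm),
      ~ as.numeric(scale(.x)),   # as.numeric() drops the matrix structure
      .names = "{.col}_z"        # e.g. bill_length_mm -> bill_length_mm_z
    )
  )

# Each standardized column should have mean close to 0 and sd close to 1
penguins_std_across |>
  summarise(across(ends_with("_z"), list(mean = mean, sd = sd)))
```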
Step 2: Categorize Penguins by Size
Next, we’ll create size categories based on body mass quartiles.
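penguins_categorized <- penguins_std |>
  mutate(
    size_category = case_when(
      body_mass_g < quantile(body_mass_g, 0.25, na.rm = TRUE) ~ "Small",
      body_mass_g < quantile(body_mass_g, 0.75, na.rm = TRUE) ~ "Medium",
      TRUE ~ "Large"
    )
  )
This uses case_when() within mutate() to create three size categories based on the 25th and 75th percentiles of body mass. Conditions are checked in order, so "Medium" covers masses from the 25th percentile up to, but not including, the 75th, and masses at or above the 75th percentile are labeled "Large".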
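case_when() returns a character column here, which sorts alphabetically (Large, Medium, Small) in tables and plots. A possible follow-up, sketched below, converts the result to a factor with an explicit level order and then counts the rows per category as a sanity check. The name `size_counts` is illustrative, and the pipeline is rebuilt from penguins so the block runs on its own.

```r
library(dplyr)
library(tidyr)
library(palmerpenguins)

size_counts <- penguins |>
  drop_na() |>
  mutate(
    size_category = case_when(
      body_mass_g < quantile(body_mass_g, 0.25) ~ "Small",
      body_mass_g < quantile(body_mass_g, 0.75) ~ "Medium",
      TRUE ~ "Large"
    ),
    # Store as a factor so categories sort Small, Medium, Large
    size_category = factor(size_category, levels = c("Small", "Medium", "Large"))
  ) |>
  count(size_category)

size_counts
```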
Step 3: Calculate Condition Indices
Finally, we’ll create a body condition index and efficiency ratio.
penguins_final <- penguins_categorized |>
  mutate(
    body_condition = body_mass_g / flipper_length_mm,
    bill_efficiency = (bill_length_mm * bill_depth_mm) / body_mass_g,
    is_large_penguin = body_mass_g > mean(body_mass_g, na.rm = TRUE)
  )
summary(penguins_final)
This creates a condition index (mass per unit of flipper length), a bill efficiency metric, and a logical column that flags penguins heavier than the overall mean body mass.
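After several rounds of mutate() the data frame gets wide, which makes a derived column harder to inspect. One way to check a single calculation, sketched here, is mutate()'s `.keep = "used"` option, which retains only the columns referenced in the expressions plus the new ones. The name `condition_only` is illustrative.

```r
library(dplyr)
library(tidyr)
library(palmerpenguins)

# Keep only the inputs to the calculation and its result
condition_only <- penguins |>
  drop_na() |>
  mutate(
    body_condition = body_mass_g / flipper_length_mm,
    .keep = "used"
  )

head(condition_only)
```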
Step 4: Combine with Grouping
We can use mutate() with group_by() to create species-specific calculations.
penguins_grouped <- penguins_final |>
  group_by(species) |>
  mutate(
    mass_rank_in_species = rank(body_mass_g),
    mass_deviation = body_mass_g - mean(body_mass_g, na.rm = TRUE)
  ) |>
  ungroup()
This ranks each penguin’s mass within its species (rank 1 is the lightest, and tied values share an averaged rank) and calculates how far each penguin’s mass deviates from its species mean. The final ungroup() removes the grouping so that later operations apply to the whole data frame.
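For a one-off grouped calculation, the group_by() and ungroup() pair can be replaced by mutate()'s `.by` argument. This sketch assumes dplyr 1.1.0 or later, where `.by` was introduced; on older versions it will not run. The name `penguins_by` is illustrative, and the pipeline starts from penguins so the block is self-contained.

```r
library(dplyr)
library(tidyr)
library(palmerpenguins)

# Assumes dplyr >= 1.1.0: .by groups for this call only and returns ungrouped data
penguins_by <- penguins |>
  drop_na() |>
  mutate(
    mass_deviation = body_mass_g - mean(body_mass_g),
    .by = species
  )

# Within each species the deviations should average out to roughly zero
penguins_by |>
  summarise(mean_deviation = mean(mass_deviation), .by = species)
```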
Summary
- Use mutate() to create new columns or modify existing ones without changing the number of rows
- Multiple columns can be created simultaneously by separating them with commas
- New columns can reference previously created columns within the same mutate() call
- Combine mutate() with case_when() for conditional column creation
- When used with group_by(), calculations are performed within each group