dplyr’s mutate(): How to create new columns
Introduction
The mutate() function from dplyr is used to create new columns or modify existing ones in a data frame. It’s essential for data transformation tasks like calculating ratios, creating categorical variables, or applying functions to existing columns.
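As a quick illustration of the two behaviours, the sketch below uses a small made-up data frame (scores and its columns are hypothetical, not part of the datasets used later). Assigning to a new name appends a column, while assigning to an existing name overwrites that column in place.

```r
library(dplyr)

# Hypothetical toy data for illustration only
scores <- data.frame(name = c("a", "b", "c"), score = c(10, 20, 30))

scores_out <- scores |>
  mutate(
    score_half = score / 2,  # new name: a column is appended
    score = score * 2        # existing name: the column is overwritten
  )

print(scores_out)
```

Expressions are evaluated in the order they are written, so score_half is computed from the original score values before score is doubled.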
Getting Started
library(tidyverse)
library(palmerpenguins)
Example 1: Basic Usage
The Problem
We need to create new columns in the penguins dataset to better understand penguin characteristics. Let’s add a simple calculation and a conditional variable.
Step 1: Create a calculated column
We’ll calculate the bill ratio by dividing bill length by bill depth.
penguins_new <- penguins |>
  mutate(bill_ratio = bill_length_mm / bill_depth_mm)
head(penguins_new)
This creates a new column called bill_ratio, which holds bill length divided by bill depth for each penguin.
Step 2: Add multiple columns at once
Now we’ll create two numeric columns and one categorical column in a single mutate() call.
penguins_enhanced <- penguins |>
  mutate(
    bill_ratio = bill_length_mm / bill_depth_mm,
    body_mass_kg = body_mass_g / 1000,
    size_category = if_else(body_mass_g > 4000, "Large", "Small")
  )
We’ve added three new columns: bill ratio, body mass in kilograms, and a size category based on weight.
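One detail worth knowing when the input has missing values: if_else() returns NA wherever its condition is NA, unless the missing argument is supplied. A minimal sketch on a hypothetical toy data frame (toy and the "Unknown" label are illustrative choices):

```r
library(dplyr)

# Hypothetical toy data: the third animal has no recorded body mass
toy <- data.frame(body_mass_g = c(3500, 4500, NA))

toy_sized <- toy |>
  mutate(
    # Default behaviour: a missing condition gives a missing result
    size_category = if_else(body_mass_g > 4000, "Large", "Small"),
    # The missing argument supplies an explicit label instead
    size_labelled = if_else(body_mass_g > 4000, "Large", "Small",
                            missing = "Unknown")
  )

print(toy_sized)
```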
Step 3: Reference newly created columns
We can reference newly created columns within the same mutate() statement.
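penguins_complex <- penguins |>
  mutate(
    bill_ratio = bill_length_mm / bill_depth_mm,
    ratio_category = case_when(
      is.na(bill_ratio) ~ NA_character_,  # keep missing measurements missing
      bill_ratio > 2.5 ~ "High ratio",
      bill_ratio < 2.0 ~ "Low ratio",
      TRUE ~ "Medium ratio"
    )
  )
The ratio_category column uses the bill_ratio column created earlier in the same mutate() call. The is.na() branch comes first because the catch-all TRUE branch would otherwise label penguins with missing bill measurements as "Medium ratio".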
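A catch-all TRUE branch in case_when() also matches rows where every earlier condition evaluated to NA, so rows with missing values need an explicit is.na() branch if they should stay missing. The sketch below compares the two behaviours on a hypothetical toy data frame (toy, no_guard, and with_guard are illustrative names):

```r
library(dplyr)

# Hypothetical toy data: the last row has no ratio
toy <- data.frame(bill_ratio = c(3.0, 1.8, 2.2, NA))

toy_checked <- toy |>
  mutate(
    # Without a guard, the NA row falls through to the TRUE branch
    no_guard = case_when(
      bill_ratio > 2.5 ~ "High ratio",
      bill_ratio < 2.0 ~ "Low ratio",
      TRUE ~ "Medium ratio"
    ),
    # With a guard, the NA row stays NA
    with_guard = case_when(
      is.na(bill_ratio) ~ NA_character_,
      bill_ratio > 2.5 ~ "High ratio",
      bill_ratio < 2.0 ~ "Low ratio",
      TRUE ~ "Medium ratio"
    )
  )

print(toy_checked)
```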
Example 2: Practical Application
The Problem
We’re analyzing car performance data and need to create fuel efficiency metrics and performance categories. We want to convert miles per gallon to kilometers per liter and classify cars by their efficiency and power.
Step 1: Convert units and calculate efficiency
We’ll convert mpg to km/L and create a power-to-weight ratio.
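cars_metrics <- mtcars |>
  mutate(
    km_per_liter = mpg * 0.425144,       # 1 mile per US gallon = 0.425144 km/L
    power_to_weight = hp / wt,           # wt is in 1000 lbs, so this is hp per 1000 lbs
    displacement_liters = disp / 61.024  # disp is in cubic inches; 1 L = 61.024 cu in
  )
head(cars_metrics, 3)
These calculations express fuel economy and engine displacement in metric units for international comparisons and add a power-to-weight measure for performance analysis.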
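The conversion constants can be checked against the unit definitions. The sketch below derives them in R; the variable names are illustrative, and it assumes US gallons, which is how the mtcars help page describes mpg.

```r
# Exact unit definitions
km_per_mile <- 1.609344
liters_per_us_gallon <- 3.785411784
cm_per_inch <- 2.54

# miles/gallon -> km/L: multiply by km per mile, divide by liters per gallon
mpg_to_km_per_liter <- km_per_mile / liters_per_us_gallon

# 1 liter = 1000 cubic centimeters, expressed in cubic inches
cubic_inches_per_liter <- 1000 / cm_per_inch^3

print(mpg_to_km_per_liter)
print(cubic_inches_per_liter)
```

Both values agree with the rounded constants 0.425144 and 61.024 used in the mutate() call.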
Step 2: Create categorical variables
Now we’ll classify cars into efficiency and performance categories.
cars_classified <- mtcars |>
  mutate(
    efficiency_class = case_when(
      mpg >= 25 ~ "High efficiency",
      mpg >= 20 ~ "Medium efficiency",
      TRUE ~ "Low efficiency"
    ),
    performance_class = if_else(hp > 150, "High performance", "Standard")
  )
This gives each car an efficiency class based on mpg and a performance class based on horsepower.
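The order of the conditions matters here: case_when() checks them from top to bottom and uses the first one that is true, which is why the stricter threshold (mpg >= 25) is listed before the looser one (mpg >= 20). A small sketch on hypothetical toy values shows what happens when the order is swapped:

```r
library(dplyr)

# Hypothetical toy data with one car per efficiency band
toy_cars <- data.frame(mpg = c(30, 22, 15))

toy_classes <- toy_cars |>
  mutate(
    # Stricter threshold first: each band gets its own label
    strict_first = case_when(
      mpg >= 25 ~ "High efficiency",
      mpg >= 20 ~ "Medium efficiency",
      TRUE ~ "Low efficiency"
    ),
    # Looser threshold first: mpg >= 20 also catches the 30 mpg car,
    # so "High efficiency" is never assigned
    loose_first = case_when(
      mpg >= 20 ~ "Medium efficiency",
      mpg >= 25 ~ "High efficiency",
      TRUE ~ "Low efficiency"
    )
  )

print(toy_classes)
```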
Step 3: Combine conditions for complex classifications
Let’s create a comprehensive car type classification using multiple variables.
cars_final <- mtcars |>
  mutate(
    power_to_weight = hp / wt,
    car_type = case_when(
      mpg > 25 & hp < 100 ~ "Economy",
      hp > 200 & mpg < 15 ~ "Sports",
      wt > 4 & cyl >= 8 ~ "Luxury/Truck",
      TRUE ~ "Standard"
    )
  )
table(cars_final$car_type)
This classification combines fuel economy, horsepower, weight, and cylinder count. Because case_when() uses the first matching condition, a car that satisfies several rules takes the label of the earliest one.
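table() returns a base R table object. If a data frame is more convenient for further piping, dplyr’s count() is one alternative. The sketch below rebuilds the classification so that it runs on its own; type_counts is an illustrative name.

```r
library(dplyr)

# Same classification rules as in the step above
cars_final <- mtcars |>
  mutate(
    car_type = case_when(
      mpg > 25 & hp < 100 ~ "Economy",
      hp > 200 & mpg < 15 ~ "Sports",
      wt > 4 & cyl >= 8 ~ "Luxury/Truck",
      TRUE ~ "Standard"
    )
  )

# One row per car type with its frequency in column n, largest group first
type_counts <- cars_final |>
  count(car_type, sort = TRUE)

print(type_counts)
```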
Summary
- mutate() creates new columns while preserving all existing data
- You can create multiple columns in a single mutate() call for efficiency
- New columns can reference other new columns created in the same statement
- Use if_else() for simple binary conditions and case_when() for multiple conditions
- Combine mutate() with mathematical operations, string functions, and conditional logic for powerful data transformations
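The examples above cover arithmetic and conditional logic but not string functions. As a minimal sketch of that last point, the block below applies two base R string functions inside mutate(); the toy data frame and the new column names are hypothetical.

```r
library(dplyr)

# Hypothetical toy data for illustration only
toy <- data.frame(
  species = c("Adelie", "Gentoo"),
  island = c("Torgersen", "Biscoe")
)

toy_labelled <- toy |>
  mutate(
    species_upper = toupper(species),             # upper-case copy of species
    label = paste(species, island, sep = " / ")   # combine two text columns
  )

print(toy_labelled)
```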