dplyr’s mutate(): How to create new columns

Complete guide to dplyr mutate() in R. Learn with practical examples and step-by-step explanations.
Published July 21, 2022

Introduction

The mutate() function from dplyr is used to create new columns or modify existing ones in a data frame. It’s essential for data transformation tasks like calculating ratios, creating categorical variables, or applying functions to existing columns.
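As a minimal sketch of the two behaviors described above, the following uses a small made-up table (the `toy` data and its column names are illustrative assumptions, not part of the penguins dataset). A name that does not exist yet adds a column; a name that already exists overwrites that column in place.

```r
library(dplyr)

# Hypothetical toy data, used only for illustration
toy <- tibble(id = 1:3, mass_g = c(3750, 4200, 5000))

result <- toy |>
  mutate(
    mass_kg = mass_g / 1000,  # new name: a column is added at the end
    id = id * 10L             # existing name: the column is overwritten
  )

names(result)
```

The original `toy` object is left unchanged, because mutate() returns a modified copy rather than editing the data in place.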

Getting Started

library(tidyverse)       # loads dplyr, which provides mutate()
library(palmerpenguins)  # provides the penguins dataset

Example 1: Basic Usage

The Problem

We need to create new columns in the penguins dataset to better understand penguin characteristics. Let’s add a simple calculation and a conditional variable.

Step 1: Create a calculated column

We’ll calculate the bill ratio by dividing bill length by bill depth.

penguins_new <- penguins |>
  mutate(bill_ratio = bill_length_mm / bill_depth_mm)

head(penguins_new)

This creates a new column called bill_ratio that holds bill length divided by bill depth for each penguin. The new column is appended after the existing ones.
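The penguins data contains a few rows with missing measurements, so it helps to know how a calculated column treats them. The sketch below uses a small hypothetical table (`bills`) rather than the real dataset: arithmetic on NA yields NA, so the ratio is simply missing for those rows.

```r
library(dplyr)

# Hypothetical measurements; the second row mimics a penguin with no bill data
bills <- tibble(
  bill_length_mm = c(39.1, NA, 46.5),
  bill_depth_mm  = c(18.7, NA, 15.0)
)

bills_ratio <- bills |>
  mutate(bill_ratio = bill_length_mm / bill_depth_mm)

# Count the rows where the ratio could not be computed
sum(is.na(bills_ratio$bill_ratio))
```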

Step 2: Add multiple columns at once

Now we’ll create two numeric columns and one categorical column in a single mutate() call.

penguins_enhanced <- penguins |>
  mutate(
    bill_ratio = bill_length_mm / bill_depth_mm,
    body_mass_kg = body_mass_g / 1000,
    size_category = if_else(body_mass_g > 4000, "Large", "Small")
  )

We’ve added three new columns: the bill ratio, body mass in kilograms, and a size category that labels penguins heavier than 4000 g as "Large" and those at or below 4000 g as "Small".
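One detail of if_else() worth illustrating: when the condition is NA, the result is NA unless the `missing` argument supplies a replacement. A minimal sketch with hypothetical body masses (the `masses` table and the "Unknown" label are assumptions made for this example):

```r
library(dplyr)

# Hypothetical body masses; the last value is missing
masses <- tibble(body_mass_g = c(3750, 4675, NA))

masses_sized <- masses |>
  mutate(
    # Default behavior: a missing mass gives a missing category
    size_default = if_else(body_mass_g > 4000, "Large", "Small"),
    # The `missing` argument assigns an explicit label instead
    size_labeled = if_else(body_mass_g > 4000, "Large", "Small",
                           missing = "Unknown")
  )

masses_sized
```

Whether an explicit label or NA is the better choice depends on how later steps should treat incomplete rows.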

Step 3: Reference newly created columns

mutate() evaluates its arguments in order, so a column can be used as soon as it has been defined earlier in the same call.

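penguins_complex <- penguins |>
  mutate(
    bill_ratio = bill_length_mm / bill_depth_mm,
    ratio_category = case_when(
      # Keep missing ratios missing instead of letting the fallback label them
      is.na(bill_ratio) ~ NA_character_,
      bill_ratio > 2.5 ~ "High ratio",
      bill_ratio < 2.0 ~ "Low ratio",
      TRUE ~ "Medium ratio"
    )
  )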

The ratio_category column uses the bill_ratio column we created in the same mutate() call.
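A point to check in any case_when() that ends with a TRUE fallback is how missing values are handled. Comparisons with NA return NA, so such rows skip the earlier branches and land in the fallback. The sketch below, on hypothetical ratios (the `ratios` table and the column names `naive` and `guarded` are illustrative), compares a fallback-only version with one that tests is.na() first.

```r
library(dplyr)

# Hypothetical ratios; the last value is missing
ratios <- tibble(bill_ratio = c(1.8, 2.2, 2.9, NA))

ratios_labeled <- ratios |>
  mutate(
    # The fallback catches NA too, because NA > 2.5 and NA < 2.0 are both NA
    naive = case_when(
      bill_ratio > 2.5 ~ "High ratio",
      bill_ratio < 2.0 ~ "Low ratio",
      TRUE ~ "Medium ratio"
    ),
    # An explicit is.na() branch placed first keeps missing values missing
    guarded = case_when(
      is.na(bill_ratio) ~ NA_character_,
      bill_ratio > 2.5 ~ "High ratio",
      bill_ratio < 2.0 ~ "Low ratio",
      TRUE ~ "Medium ratio"
    )
  )

ratios_labeled
```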

Example 2: Practical Application

The Problem

We’re analyzing car performance data and need to create fuel efficiency metrics and performance categories. We want to convert miles per gallon to kilometers per liter and classify cars by their efficiency and power.

Step 1: Convert units and calculate efficiency

We’ll convert mpg to km/L, convert engine displacement from cubic inches to liters, and create a power-to-weight ratio.

cars_metrics <- mtcars |>
  mutate(
    km_per_liter = mpg * 0.425144,          # US miles per gallon to km/L
    power_to_weight = hp / wt,              # hp per 1000 lb (wt is in 1000 lb)
    displacement_liters = disp / 61.024     # cubic inches to liters
  )

head(cars_metrics, 3)

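These calculations express fuel economy and engine displacement in metric units and add a rough power-to-weight figure. Because wt is recorded in thousands of pounds, power_to_weight is horsepower per 1000 lb.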
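The two conversion constants used above can be rebuilt from exact unit definitions, which is a useful check before hard-coding them. This sketch assumes US gallons (the unit documented for mpg in mtcars); the variable names are illustrative.

```r
# Exact definitions: 1 mile = 1.609344 km, 1 US gallon = 3.785411784 L,
# 1 inch = 2.54 cm, so 1 cubic inch = 2.54^3 cubic centimeters
km_per_mile        <- 1.609344
liters_per_gallon  <- 3.785411784
cubic_in_per_liter <- 1000 / 2.54^3

# Factor that turns miles per gallon into kilometers per liter
mpg_to_kml <- km_per_mile / liters_per_gallon

round(mpg_to_kml, 6)          # 0.425144
round(cubic_in_per_liter, 3)  # 61.024
```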

Step 2: Create categorical variables

Now we’ll classify cars into efficiency and performance categories.

cars_classified <- mtcars |>
  mutate(
    efficiency_class = case_when(
      mpg >= 25 ~ "High efficiency",
      mpg >= 20 ~ "Medium efficiency",
      TRUE ~ "Low efficiency"
    ),
    performance_class = if_else(hp > 150, "High performance", "Standard")
  )

Each car now carries two labels: an efficiency class based on mpg thresholds of 20 and 25, and a performance class that flags engines above 150 hp.
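The order of the thresholds matters, because case_when() stops at the first condition that is TRUE for a row. The following sketch, using three hypothetical mpg values (the `mileage` table and the column names `ordered` and `misordered` are assumptions for illustration), shows what happens when the looser threshold is listed first.

```r
library(dplyr)

# Hypothetical fuel economy values
mileage <- tibble(mpg = c(18, 22, 30))

mileage_classes <- mileage |>
  mutate(
    # Strictest threshold first: each row stops at the first TRUE condition
    ordered = case_when(
      mpg >= 25 ~ "High efficiency",
      mpg >= 20 ~ "Medium efficiency",
      TRUE ~ "Low efficiency"
    ),
    # Looser threshold first: 30 mpg already satisfies mpg >= 20,
    # so the "High efficiency" branch is never reached
    misordered = case_when(
      mpg >= 20 ~ "Medium efficiency",
      mpg >= 25 ~ "High efficiency",
      TRUE ~ "Low efficiency"
    )
  )

mileage_classes
```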

Step 3: Combine conditions for complex classifications

Let’s create a comprehensive car type classification using multiple variables.

cars_final <- mtcars |>
  mutate(
    power_to_weight = hp / wt,
    car_type = case_when(
      mpg > 25 & hp < 100 ~ "Economy",
      hp > 200 & mpg < 15 ~ "Sports",
      wt > 4 & cyl >= 8 ~ "Luxury/Truck",
      TRUE ~ "Standard"
    )
  )

table(cars_final$car_type)

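This classification combines fuel economy, horsepower, weight, and cylinder count. Because case_when() assigns the first matching label, a car that satisfies several rules gets the one listed first. The power_to_weight column is kept for later analysis but is not used by the rules themselves.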
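When rules can overlap, one way to inspect the effect of their order is to store each rule as its own logical column and then look at rows that satisfy more than one. This is a sketch, not part of the original analysis; the helper columns `is_sports` and `is_heavy` are hypothetical names introduced here.

```r
library(dplyr)

rule_check <- mtcars |>
  mutate(
    # Each rule stored as its own logical column
    is_sports = hp > 200 & mpg < 15,
    is_heavy  = wt > 4 & cyl >= 8,
    car_type = case_when(
      mpg > 25 & hp < 100 ~ "Economy",
      is_sports ~ "Sports",
      is_heavy ~ "Luxury/Truck",
      TRUE ~ "Standard"
    )
  )

# Rows matching both rules receive the label of the rule listed first
overlap <- rule_check |>
  filter(is_sports, is_heavy) |>
  select(mpg, hp, wt, cyl, car_type)

overlap
```

If the overlapping rows should be labeled differently, reordering the branches or tightening a condition is usually enough.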

Summary

  • mutate() creates new columns while preserving all existing data
  • You can create multiple columns in a single mutate() call, which keeps related transformations together
  • New columns can reference other new columns created in the same statement
  • Use if_else() for simple binary conditions and case_when() for multiple conditions
  • Combine mutate() with mathematical operations, string functions, and conditional logic for powerful data transformations
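
The examples above focus on arithmetic and conditional logic. As a brief sketch of how a string function fits into the same pattern, the following uses base R's toupper() and paste0() on a small hypothetical table (the `birds` data and all column names are assumptions for illustration).

```r
library(dplyr)

# Hypothetical toy data
birds <- tibble(
  species = c("adelie", "gentoo"),
  body_mass_g = c(3700, 5000)
)

birds_labeled <- birds |>
  mutate(
    species_upper = toupper(species),                  # string function
    body_mass_kg  = body_mass_g / 1000,                # arithmetic
    # Uses both columns created just above
    label = paste0(species_upper, " (", body_mass_kg, " kg)"),
    size  = if_else(body_mass_kg > 4, "Large", "Small")  # conditional
  )

birds_labeled$label
```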