How to use mutate() in R

Learn how to use mutate() in R with practical examples. Step-by-step guide with code you can copy and run immediately.
Published: February 21, 2026

Introduction

The mutate() function from dplyr is used to create new columns or modify existing columns in a data frame. It’s essential for data transformation tasks like calculating new variables, converting data types, or applying functions to existing columns.

Getting Started

library(tidyverse)
library(palmerpenguins)

Example 1: Basic Usage

The Problem

We want to add a new column to the penguins dataset that converts body mass from grams to kilograms. We also need to create a column that combines species and island information.

Step 1: Create a Simple New Column

Let’s start by adding a body mass column in kilograms.

penguins_kg <- penguins |>
  mutate(body_mass_kg = body_mass_g / 1000)

head(penguins_kg)

This creates a new column body_mass_kg by dividing the existing body_mass_g column by 1000.
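As a quick sanity check, the sketch below runs the same conversion on a small, hypothetical tibble (`toy` is made up for illustration and is not part of palmerpenguins). It suggests that mutate() appends the new column at the end and leaves the row count alone.

```r
library(dplyr)

# Hypothetical toy data standing in for the penguins table
toy <- tibble(id = 1:3, body_mass_g = c(3750, 3800, 3250))

toy_kg <- toy |>
  mutate(body_mass_kg = body_mass_g / 1000)

nrow(toy_kg) == nrow(toy)  # the row count is unchanged
names(toy_kg)              # the new column is appended last
toy_kg$body_mass_kg        # masses expressed in kilograms
```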

Step 2: Create Multiple Columns at Once

Now we’ll add several new columns in a single mutate() call.

penguins_enhanced <- penguins |>
  mutate(
    body_mass_kg = body_mass_g / 1000,
    bill_ratio = bill_length_mm / bill_depth_mm,
    species_island = paste(species, island, sep = "_")
  )

This adds three new columns in one call: weight in kg, bill length-to-depth ratio, and a combined species-island identifier.

Step 3: Modify Existing Columns

We can also use mutate() to transform existing columns in place.

penguins_modified <- penguins |>
  mutate(
    species = toupper(species),
    year = as.character(year),
    bill_length_mm = round(bill_length_mm, 1)
  )

str(penguins_modified)

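This converts species names to uppercase, changes year to character type, and rounds bill length to one decimal place. Note that species is a factor in penguins; toupper() coerces it to character first, so the modified column is a character vector rather than a factor.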
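If the column should stay a factor after the change, one option is to relabel its levels instead. The sketch below uses a hypothetical toy tibble and assumes the forcats package (installed with the tidyverse) is available; the column names `species_chr` and `species_fct` are made up for the comparison.

```r
library(dplyr)
library(forcats)

# Hypothetical toy data with a factor column, like species in penguins
toy <- tibble(species = factor(c("Adelie", "Gentoo", "Adelie")))

toy_upper <- toy |>
  mutate(
    species_chr = toupper(species),              # coerced to character
    species_fct = fct_relabel(species, toupper)  # levels renamed, still a factor
  )

class(toy_upper$species_chr)
class(toy_upper$species_fct)
levels(toy_upper$species_fct)
```

Keeping the factor matters mainly when later steps rely on level order, for example in plots or models.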

Example 2: Practical Application

The Problem

We’re analyzing penguin data for a research project and need to create standardized measurements, categorize penguins by size, and calculate condition indices. This requires multiple data transformations that build upon each other.

Step 1: Create Standardized Measurements

First, we’ll standardize the body measurements using z-scores.

penguins_std <- penguins |>
  drop_na() |>
  mutate(
    bill_length_z = scale(bill_length_mm)[,1],
    bill_depth_z = scale(bill_depth_mm)[,1],
    flipper_length_z = scale(flipper_length_mm)[,1]
  )

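This creates z-score standardized versions of three measurement variables. Called with no arguments, drop_na() first removes every row that has a missing value in any column, not only in the three measurements. scale() returns a one-column matrix, so [,1] extracts the result as a plain numeric vector.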
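When several columns get the same transformation, across() can avoid the repetition. This is a sketch on hypothetical toy data, not a replacement for the code above: the `.names` pattern used here produces names such as `bill_length_mm_z`, which differ from the hand-written names in the step.

```r
library(dplyr)

# Hypothetical toy measurements
toy <- tibble(
  bill_length_mm = c(39.1, 46.5, 50.0),
  bill_depth_mm  = c(18.7, 17.9, 15.3)
)

toy_std <- toy |>
  mutate(
    across(
      c(bill_length_mm, bill_depth_mm),
      ~ as.numeric(scale(.x)),  # as.numeric() drops the matrix shape
      .names = "{.col}_z"       # original columns are kept alongside
    )
  )

names(toy_std)
mean(toy_std$bill_length_mm_z)  # approximately 0
sd(toy_std$bill_length_mm_z)    # approximately 1
```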

Step 2: Categorize Penguins by Size

Next, we’ll create size categories based on body mass quartiles.

penguins_categorized <- penguins_std |>
  mutate(
    size_category = case_when(
      body_mass_g < quantile(body_mass_g, 0.25, na.rm = TRUE) ~ "Small",
      body_mass_g < quantile(body_mass_g, 0.75, na.rm = TRUE) ~ "Medium",
      TRUE ~ "Large"
    )
  )

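This uses case_when() within mutate() to create three size categories based on the 25th and 75th percentiles of body mass. The conditions are checked in order, so a penguin below the 25th percentile is "Small", one below the 75th percentile is "Medium", and the TRUE fallback labels everything else "Large". Because penguins_std contains no missing values, the fallback only catches penguins at or above the 75th percentile.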
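On data that still contains missing values, a TRUE fallback would label those rows "Large" as well. The sketch below shows one way to guard against that; it uses a hypothetical toy tibble and fixed cutoffs (3500 and 4500) chosen only for illustration instead of quantiles.

```r
library(dplyr)

# Hypothetical toy data, including one missing mass
toy <- tibble(body_mass_g = c(2900, 3500, 4200, 6000, NA))

toy_sized <- toy |>
  mutate(
    size_category = case_when(
      is.na(body_mass_g) ~ NA_character_,  # handle missing values first
      body_mass_g < 3500 ~ "Small",
      body_mass_g < 4500 ~ "Medium",       # reached only if not "Small"
      TRUE ~ "Large"
    )
  )

toy_sized
```

A value exactly on a cutoff (3500 here) falls into the higher category because the comparisons are strict.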

Step 3: Calculate Condition Indices

Finally, we’ll create a body condition index and efficiency ratio.

penguins_final <- penguins_categorized |>
  mutate(
    body_condition = body_mass_g / flipper_length_mm,
    bill_efficiency = (bill_length_mm * bill_depth_mm) / body_mass_g,
    is_large_penguin = body_mass_g > mean(body_mass_g, na.rm = TRUE)
  )

summary(penguins_final)

This creates a condition index (mass per unit flipper length), a bill efficiency metric (bill length times depth, divided by body mass), and a logical column that is TRUE for penguins heavier than the mean body mass.

Step 4: Combine with Grouping

We can use mutate() with group_by() to create species-specific calculations.

penguins_grouped <- penguins_final |>
  group_by(species) |>
  mutate(
    mass_rank_in_species = rank(body_mass_g),
    mass_deviation = body_mass_g - mean(body_mass_g, na.rm = TRUE)
  ) |>
  ungroup()

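This ranks each penguin’s mass within its species (tied masses share an averaged rank, the default for rank()) and calculates how far each penguin’s mass is from its species average. The final ungroup() removes the grouping so later steps operate on the whole table again.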
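To see what group_by() changes, the sketch below computes the same deviation twice on a hypothetical toy tibble: once against the overall mean and once against each group's mean. The column names `dev_overall` and `dev_in_species` are made up for the comparison.

```r
library(dplyr)

# Hypothetical toy data: two species, two birds each
toy <- tibble(
  species     = c("A", "A", "B", "B"),
  body_mass_g = c(3000, 4000, 5000, 6000)
)

toy_dev <- toy |>
  mutate(dev_overall = body_mass_g - mean(body_mass_g)) |>     # mean of all rows
  group_by(species) |>
  mutate(dev_in_species = body_mass_g - mean(body_mass_g)) |>  # mean per species
  ungroup()

toy_dev
```

The overall mean is 4500, while the species means are 3500 and 5500, so the two columns differ even though the expression inside mutate() is identical.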

Summary

  • Use mutate() to create new columns or modify existing ones without changing the number of rows
  • Multiple columns can be created in a single mutate() call by separating them with commas
  • New columns can reference previously created columns within the same mutate() call
  • Combine mutate() with case_when() for conditional column creation
  • When used with group_by(), calculations are performed within each group
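
The point about referencing previously created columns can be sketched as follows, using a hypothetical toy tibble and an arbitrary 4 kg threshold chosen only for illustration. Arguments to mutate() are evaluated in order, so a later expression can use a column defined earlier in the same call.

```r
library(dplyr)

# Hypothetical toy data
toy <- tibble(body_mass_g = c(3000, 4500, 6000))

toy_flagged <- toy |>
  mutate(
    body_mass_kg = body_mass_g / 1000,
    is_heavy     = body_mass_kg > 4  # uses the column defined one line above
  )

toy_flagged
```

Reversing the two arguments would fail, because body_mass_kg would not exist yet when is_heavy is computed.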