How to use mutate() in R

Learn how to use mutate() in R with practical examples. Step-by-step guide with code you can copy and run immediately.

Published February 21, 2026

Introduction

The mutate() function from dplyr is used to create new columns or modify existing columns in a data frame. It’s essential for data transformation tasks like calculating new variables, converting data types, or applying functions to existing columns.

Getting Started

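library(tidyverse)       # loads dplyr (mutate(), case_when(), group_by()) and tidyr (drop_na())
library(palmerpenguins)  # provides the penguins dataset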

Example 1: Basic Usage

The Problem

We want to add a new column to the penguins dataset that converts body mass from grams to kilograms. We also need to create a column that combines species and island information.

Step 1: Create a Simple New Column

Let’s start by adding a body mass column in kilograms.

penguins_kg <- penguins |>
  mutate(body_mass_kg = body_mass_g / 1000)

head(penguins_kg)

This creates a new column body_mass_kg by dividing the existing body_mass_g column by 1000.
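
Because mutate() appends new columns at the end by default, a narrow console print of head() may not show body_mass_kg at all. Here is a minimal sketch on a small made-up tibble (hypothetical values, not the penguins data). It uses mutate()’s .after argument, available from dplyr 1.0.0, to place the new column next to its source:

```r
library(dplyr)

# Hypothetical toy stand-in for penguins; values are invented for illustration
toy <- tibble(
  species = c("Adelie", "Gentoo"),
  body_mass_g = c(3750, 5000),
  year = c(2007L, 2008L)
)

# .after puts the new column directly after body_mass_g instead of at the end
toy_kg <- toy |>
  mutate(body_mass_kg = body_mass_g / 1000, .after = body_mass_g)

names(toy_kg)     # "species" "body_mass_g" "body_mass_kg" "year"
toy_kg$body_mass_kg
```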

Step 2: Create Multiple Columns at Once

Now we’ll add several new columns in a single mutate() call.

penguins_enhanced <- penguins |>
  mutate(
    body_mass_kg = body_mass_g / 1000,
    bill_ratio = bill_length_mm / bill_depth_mm,
    species_island = paste(species, island, sep = "_")
  )

This adds three new columns in one call: body mass in kilograms, the ratio of bill length to bill depth, and a combined species-island identifier such as Adelie_Torgersen.
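
In penguins, species and island are factors. paste() converts them to their text labels before joining them. The following self-contained sketch uses a hand-made tibble (the toy object and its values are hypothetical) and checks the resulting columns:

```r
library(dplyr)

# Hypothetical toy data; species and island are factors, as in palmerpenguins
toy <- tibble(
  species = factor(c("Adelie", "Gentoo")),
  island = factor(c("Torgersen", "Biscoe")),
  body_mass_g = c(3750, 5000),
  bill_length_mm = c(39.0, 47.5),
  bill_depth_mm = c(19.5, 15.0)
)

toy_enhanced <- toy |>
  mutate(
    body_mass_kg = body_mass_g / 1000,
    bill_ratio = bill_length_mm / bill_depth_mm,
    # paste() uses the factor labels, not the underlying integer codes
    species_island = paste(species, island, sep = "_")
  )

toy_enhanced$species_island   # "Adelie_Torgersen" "Gentoo_Biscoe"
```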

Step 3: Modify Existing Columns

We can also use mutate() to transform existing columns in place.

penguins_modified <- penguins |>
  mutate(
    species = toupper(species),
    year = as.character(year),
    bill_length_mm = round(bill_length_mm, 1)
  )

str(penguins_modified)

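This converts species names to uppercase, changes year from integer to character, and rounds bill length to one decimal place. Because toupper() always returns a character vector, species is also no longer a factor after this step.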
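
To keep the uppercase species as a factor, one option is to wrap toupper() in factor(). This sketch uses a toy tibble (hypothetical values) and compares the two results. It assumes that alphabetical level order is acceptable, because factor() sorts the levels of a character vector:

```r
library(dplyr)

# Hypothetical toy data; species is a factor, as in palmerpenguins
toy <- tibble(
  species = factor(c("Adelie", "Gentoo", "Adelie")),
  year = c(2007L, 2008L, 2009L)
)

# toupper() alone: species becomes a character column
upper_chr <- toy |>
  mutate(species = toupper(species))

# wrapping in factor(): species stays a factor with uppercase levels
upper_fct <- toy |>
  mutate(species = factor(toupper(species)))

class(upper_chr$species)    # "character"
class(upper_fct$species)    # "factor"
levels(upper_fct$species)   # "ADELIE" "GENTOO"
```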

Example 2: Practical Application

The Problem

We’re analyzing penguin data for a research project and need to create standardized measurements, categorize penguins by size, and calculate condition indices. This requires multiple data transformations that build upon each other.

Step 1: Create Standardized Measurements

First, we’ll standardize the body measurements using z-scores.

penguins_std <- penguins |>
  drop_na() |>
  mutate(
    bill_length_z = scale(bill_length_mm)[,1],
    bill_depth_z = scale(bill_depth_mm)[,1],
    flipper_length_z = scale(flipper_length_mm)[,1]
  )

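This creates z-score versions of three measurement variables: each column minus its mean, divided by its standard deviation. drop_na() runs first and removes every row that has a missing value in any column, including sex. All later steps therefore work on complete cases only.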
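
scale() returns a one-column matrix, which is why the code extracts column 1 with [,1]. As a quick check under toy assumptions (four invented bill lengths), the result matches the z-score formula (x - mean(x)) / sd(x) written out by hand:

```r
library(dplyr)

# Hypothetical measurements, not real penguin data
toy <- tibble(bill_length_mm = c(36, 40, 44, 48))

toy_z <- toy |>
  mutate(
    # as in the tutorial: take the single column of the matrix scale() returns
    z_scale = scale(bill_length_mm)[, 1],
    # the same z-score spelled out: subtract the mean, divide by the standard deviation
    z_manual = (bill_length_mm - mean(bill_length_mm)) / sd(bill_length_mm)
  )

all.equal(toy_z$z_scale, toy_z$z_manual)   # TRUE
```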

Step 2: Categorize Penguins by Size

Next, we’ll create size categories based on body mass quartiles.

penguins_categorized <- penguins_std |>
  mutate(
    size_category = case_when(
      body_mass_g < quantile(body_mass_g, 0.25, na.rm = TRUE) ~ "Small",
      body_mass_g < quantile(body_mass_g, 0.75, na.rm = TRUE) ~ "Medium",
      TRUE ~ "Large"
    )
  )

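This uses case_when() within mutate() to create three size categories from the 25th and 75th percentiles of body mass. Penguins below the 25th percentile are "Small", those from the 25th percentile up to (but not including) the 75th percentile are "Medium", and the rest are "Large".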
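
A small sketch with five hypothetical masses shows how the boundaries behave. With R’s default quantile() method, the cut points here are 3500 g and 4500 g. A penguin exactly at the 25th percentile therefore lands in "Medium", and one exactly at the 75th lands in "Large". Note that the TRUE ~ catch-all would also label NA masses as "Large". That cannot happen in the tutorial because drop_na() ran earlier. In dplyr 1.1.0 and later, case_when() also accepts a .default argument for the fallback value.

```r
library(dplyr)

# Hypothetical body masses (g) chosen so the quartile cut points are easy to read
toy <- tibble(body_mass_g = c(3000, 3500, 4000, 4500, 5000))

toy_cat <- toy |>
  mutate(
    size_category = case_when(
      body_mass_g < quantile(body_mass_g, 0.25) ~ "Small",   # below 3500
      body_mass_g < quantile(body_mass_g, 0.75) ~ "Medium",  # 3500 up to, not including, 4500
      TRUE ~ "Large"                                         # 4500 and above
    )
  )

toy_cat$size_category   # "Small" "Medium" "Medium" "Large" "Large"
```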

Step 3: Calculate Condition Indices

Next, we’ll create a body condition index, a bill efficiency ratio, and a flag for above-average body mass.

penguins_final <- penguins_categorized |>
  mutate(
    body_condition = body_mass_g / flipper_length_mm,
    bill_efficiency = (bill_length_mm * bill_depth_mm) / body_mass_g,
    is_large_penguin = body_mass_g > mean(body_mass_g, na.rm = TRUE)
  )

summary(penguins_final)

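This creates a simple condition index (body mass per millimetre of flipper length) and a bill efficiency ratio (bill length times bill depth, divided by body mass). It also adds a logical column that is TRUE for penguins heavier than the mean body mass of the whole dataset.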
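
Summary functions such as mean() inside an ungrouped mutate() are computed over all rows, and the single result is then recycled. So is_large_penguin compares every bird with the overall mean across all species, not with its own species. Here is a toy sketch with two made-up species (hypothetical names A and B, invented values):

```r
library(dplyr)

# Hypothetical data: two species with very different typical masses
toy <- tibble(
  species = c("A", "A", "B", "B"),
  body_mass_g = c(3000, 3400, 5000, 5400),
  flipper_length_mm = c(180, 190, 215, 225)
)

toy_flags <- toy |>
  mutate(
    body_condition = body_mass_g / flipper_length_mm,
    # no group_by(): mean() sees the whole column (4200 g here)
    is_large_penguin = body_mass_g > mean(body_mass_g)
  )

toy_flags$is_large_penguin   # FALSE FALSE TRUE TRUE
```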

Step 4: Combine with Grouping

We can use mutate() with group_by() to create species-specific calculations.

penguins_grouped <- penguins_final |>
  group_by(species) |>
  mutate(
    mass_rank_in_species = rank(body_mass_g),
    mass_deviation = body_mass_g - mean(body_mass_g, na.rm = TRUE)
  ) |>
  ungroup()

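This ranks each penguin’s body mass within its species and calculates how far each penguin’s mass deviates from its species average. rank() gives 1 to the lightest penguin and averages the ranks of ties. ungroup() then removes the grouping so that later operations apply to the whole data frame.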
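
A toy version of the grouped step (hypothetical species A and B, invented masses) shows both behaviors. Ranks restart within each species, with ties averaged, and the mean used for the deviation is the species mean. If you would rather give rank 1 to the heaviest penguin and have ties share the lowest rank, dplyr’s min_rank(desc(body_mass_g)) is an alternative.

```r
library(dplyr)

# Hypothetical data; species A contains a tie at 3400 g
toy <- tibble(
  species = c("A", "A", "A", "B", "B"),
  body_mass_g = c(3000, 3400, 3400, 5000, 5600)
)

toy_grouped <- toy |>
  group_by(species) |>
  mutate(
    # rank(): 1 = lightest within the species; tied values share the average rank
    mass_rank_in_species = rank(body_mass_g),
    # mean() is computed per species here, not over all rows
    mass_deviation = body_mass_g - mean(body_mass_g)
  ) |>
  ungroup()

toy_grouped$mass_rank_in_species   # 1 2.5 2.5 1 2
```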

Summary

Use mutate() to create new columns or modify existing ones without changing the number of rows
Multiple columns can be created simultaneously by separating them with commas
New columns can reference previously created columns within the same mutate() call
Combine mutate() with case_when() for conditional column creation
When used with group_by(), calculations are performed within each group
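
The point about reusing columns within a single call is not shown in the examples above. Here is a minimal sketch on a toy tibble (hypothetical values) in which a column created earlier in a mutate() call is used immediately by the next expression in the same call:

```r
library(dplyr)

# Hypothetical masses, not taken from the penguins data
toy <- tibble(body_mass_g = c(3700, 5049))

toy_chain <- toy |>
  mutate(
    body_mass_kg = body_mass_g / 1000,
    # body_mass_kg already exists at this point, so it can be used right away
    is_heavy = body_mass_kg > 4
  )

toy_chain$is_heavy   # FALSE TRUE
```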
