dplyr transmute(): add new columns and delete existing columns
Introduction
The transmute() function in dplyr is a powerful data manipulation tool that creates new columns while simultaneously dropping all existing columns that aren’t explicitly mentioned. Unlike mutate() which adds new columns while keeping all existing ones, transmute() gives you a clean slate with only the columns you specify.
This function is particularly useful when you want to create derived variables and focus solely on those new calculations, or when you need to transform your dataset into a completely different structure. It’s ideal for data preprocessing steps where you want to compute summary statistics, create categorical variables from continuous ones, or perform complex transformations while keeping your output dataset clean and focused.
Getting Started
First, let’s load the required packages for our examples:
library(tidyverse)
library(palmerpenguins)Example 1: Basic Usage
Let’s start with a simple example using the penguins dataset. We’ll create new columns while dropping all the original ones:
penguins |>
transmute(
species = species,
bill_ratio = bill_length_mm / bill_depth_mm,
flipper_body_ratio = flipper_length_mm / body_mass_g,
size_category = case_when(
body_mass_g < 3500 ~ "Small",
body_mass_g < 4500 ~ "Medium",
TRUE ~ "Large"
)
) |>
head(10)In this example, we kept the species column and created three new columns: bill_ratio, flipper_body_ratio, and size_category. Notice how all other original columns (island, bill_length_mm, bill_depth_mm, etc.) are automatically dropped from the result.
Example 2: Practical Application
Let’s create a more complex example that demonstrates how transmute() can be used for data analysis. We’ll create a summarized view of penguin characteristics:
penguin_metrics <- penguins |>
filter(!is.na(bill_length_mm), !is.na(body_mass_g)) |>
transmute(
penguin_id = paste(species, row_number(), sep = "_"),
species_island = paste(species, island, sep = " - "),
bill_size_index = (bill_length_mm * bill_depth_mm) / 100,
body_condition = case_when(
body_mass_g > 5000 ~ "Heavy",
body_mass_g > 4000 ~ "Average",
TRUE ~ "Light"
),
flipper_efficiency = round(flipper_length_mm / body_mass_g * 1000, 2),
overall_size_score = scale(bill_length_mm)[,1] +
scale(flipper_length_mm)[,1] +
scale(body_mass_g)[,1]
)
penguin_metrics |> head(8)Here’s another practical example showing how to create categorical variables and perform groupwise calculations:
penguins |>
group_by(species) |>
transmute(
species = species,
bill_length_percentile = percent_rank(bill_length_mm) * 100,
above_avg_mass = body_mass_g > mean(body_mass_g, na.rm = TRUE),
standardized_flipper = scale(flipper_length_mm)[,1],
penguin_rank = dense_rank(desc(body_mass_g))
) |>
arrange(species, desc(bill_length_percentile)) |>
slice_head(n = 5)This example demonstrates how transmute() works with grouped data, creating percentile rankings and standardized scores within each species group while keeping only the newly created variables.
Summary
The transmute() function is an excellent choice when you need to create new variables while maintaining a clean, focused dataset. Key takeaways include:
- Selective retention: Only specified columns appear in the output, making it perfect for creating focused analytical datasets
- Flexible transformations: Supports all the same operations as
mutate(), including mathematical calculations, conditional logic, and string manipulations - Group-aware operations: Works seamlessly with
group_by()for creating within-group calculations and rankings - Clean workflow: Eliminates the need for separate
select()operations when you want to keep only transformed variables