dplyr transmute(): add new columns and delete existing columns

dplyr
dplyr transmute()
Master dplyr transmute() to add new columns and delete existing columns. Complete R tutorial with examples using real datasets.
Published

July 29, 2022

Introduction

The transmute() function in dplyr is a powerful data manipulation tool that creates new columns while simultaneously dropping all existing columns that aren’t explicitly mentioned. Unlike mutate() which adds new columns while keeping all existing ones, transmute() gives you a clean slate with only the columns you specify.

This function is particularly useful when you want to create derived variables and focus solely on those new calculations, or when you need to transform your dataset into a completely different structure. It’s ideal for data preprocessing steps where you want to compute summary statistics, create categorical variables from continuous ones, or perform complex transformations while keeping your output dataset clean and focused.

Getting Started

First, let’s load the required packages for our examples:

library(tidyverse)
library(palmerpenguins)

Example 1: Basic Usage

Let’s start with a simple example using the penguins dataset. We’ll create new columns while dropping all the original ones:

penguins |>
  transmute(
    species = species,
    bill_ratio = bill_length_mm / bill_depth_mm,
    flipper_body_ratio = flipper_length_mm / body_mass_g,
    size_category = case_when(
      body_mass_g < 3500 ~ "Small",
      body_mass_g < 4500 ~ "Medium",
      TRUE ~ "Large"
    )
  ) |>
  head(10)

In this example, we kept the species column and created three new columns: bill_ratio, flipper_body_ratio, and size_category. Notice how all other original columns (island, bill_length_mm, bill_depth_mm, etc.) are automatically dropped from the result.

Example 2: Practical Application

Let’s create a more complex example that demonstrates how transmute() can be used for data analysis. We’ll create a summarized view of penguin characteristics:

penguin_metrics <- penguins |>
  filter(!is.na(bill_length_mm), !is.na(body_mass_g)) |>
  transmute(
    penguin_id = paste(species, row_number(), sep = "_"),
    species_island = paste(species, island, sep = " - "),
    bill_size_index = (bill_length_mm * bill_depth_mm) / 100,
    body_condition = case_when(
      body_mass_g > 5000 ~ "Heavy",
      body_mass_g > 4000 ~ "Average",
      TRUE ~ "Light"
    ),
    flipper_efficiency = round(flipper_length_mm / body_mass_g * 1000, 2),
    overall_size_score = scale(bill_length_mm)[,1] + 
                        scale(flipper_length_mm)[,1] + 
                        scale(body_mass_g)[,1]
  )

penguin_metrics |> head(8)

Here’s another practical example showing how to create categorical variables and perform groupwise calculations:

penguins |>
  group_by(species) |>
  transmute(
    species = species,
    bill_length_percentile = percent_rank(bill_length_mm) * 100,
    above_avg_mass = body_mass_g > mean(body_mass_g, na.rm = TRUE),
    standardized_flipper = scale(flipper_length_mm)[,1],
    penguin_rank = dense_rank(desc(body_mass_g))
  ) |>
  arrange(species, desc(bill_length_percentile)) |>
  slice_head(n = 5)

This example demonstrates how transmute() works with grouped data, creating percentile rankings and standardized scores within each species group while keeping only the newly created variables.

Summary

The transmute() function is an excellent choice when you need to create new variables while maintaining a clean, focused dataset. Key takeaways include:

  • Selective retention: Only specified columns appear in the output, making it perfect for creating focused analytical datasets
  • Flexible transformations: Supports all the same operations as mutate(), including mathematical calculations, conditional logic, and string manipulations
  • Group-aware operations: Works seamlessly with group_by() for creating within-group calculations and rankings
  • Clean workflow: Eliminates the need for separate select() operations when you want to keep only transformed variables

Use transmute() when you want to transform your data into a new structure focused solely on derived variables, rather than augmenting existing data with additional columns.