dplyr transmute(): add new columns and delete existing columns

dplyr

dplyr transmute()

Master dplyr transmute() to add new columns and delete existing columns. Complete R tutorial with examples using real datasets.

Published

July 29, 2022

Introduction

The transmute() function in dplyr is a powerful data manipulation tool that creates new columns while simultaneously dropping all existing columns that aren’t explicitly mentioned. Unlike mutate() which adds new columns while keeping all existing ones, transmute() gives you a clean slate with only the columns you specify.

This function is particularly useful when you want to create derived variables and focus solely on those new calculations, or when you need to transform your dataset into a completely different structure. It’s ideal for data preprocessing steps where you want to compute summary statistics, create categorical variables from continuous ones, or perform complex transformations while keeping your output dataset clean and focused.

Getting Started

First, let’s load the required packages for our examples:

library(tidyverse)
library(palmerpenguins)

Example 1: Basic Usage

Let’s start with a simple example using the penguins dataset. We’ll create new columns while dropping all the original ones:

penguins |>
  transmute(
    species = species,
    bill_ratio = bill_length_mm / bill_depth_mm,
    flipper_body_ratio = flipper_length_mm / body_mass_g,
    size_category = case_when(
      body_mass_g < 3500 ~ "Small",
      body_mass_g < 4500 ~ "Medium",
      TRUE ~ "Large"
    )
  ) |>
  head(10)

In this example, we kept the species column and created three new columns: bill_ratio, flipper_body_ratio, and size_category. Notice how all other original columns (island, bill_length_mm, bill_depth_mm, etc.) are automatically dropped from the result.

Example 2: Practical Application

Let’s create a more complex example that demonstrates how transmute() can be used for data analysis. We’ll create a summarized view of penguin characteristics:

penguin_metrics <- penguins |>
  filter(!is.na(bill_length_mm), !is.na(body_mass_g)) |>
  transmute(
    penguin_id = paste(species, row_number(), sep = "_"),
    species_island = paste(species, island, sep = " - "),
    bill_size_index = (bill_length_mm * bill_depth_mm) / 100,
    body_condition = case_when(
      body_mass_g > 5000 ~ "Heavy",
      body_mass_g > 4000 ~ "Average",
      TRUE ~ "Light"
    ),
    flipper_efficiency = round(flipper_length_mm / body_mass_g * 1000, 2),
    overall_size_score = scale(bill_length_mm)[,1] + 
                        scale(flipper_length_mm)[,1] + 
                        scale(body_mass_g)[,1]
  )

penguin_metrics |> head(8)

Here’s another practical example showing how to create categorical variables and perform groupwise calculations:

penguins |>
  group_by(species) |>
  transmute(
    species = species,
    bill_length_percentile = percent_rank(bill_length_mm) * 100,
    above_avg_mass = body_mass_g > mean(body_mass_g, na.rm = TRUE),
    standardized_flipper = scale(flipper_length_mm)[,1],
    penguin_rank = dense_rank(desc(body_mass_g))
  ) |>
  arrange(species, desc(bill_length_percentile)) |>
  slice_head(n = 5)

This example demonstrates how transmute() works with grouped data, creating percentile rankings and standardized scores within each species group while keeping only the newly created variables.

Summary

The transmute() function is an excellent choice when you need to create new variables while maintaining a clean, focused dataset. Key takeaways include:

Selective retention: Only specified columns appear in the output, making it perfect for creating focused analytical datasets
Flexible transformations: Supports all the same operations as mutate(), including mathematical calculations, conditional logic, and string manipulations
Group-aware operations: Works seamlessly with group_by() for creating within-group calculations and rankings
Clean workflow: Eliminates the need for separate select() operations when you want to keep only transformed variables

Use `transmute()` when you want to transform your data into a new structure focused solely on derived variables, rather than augmenting existing data with additional columns.

--- title: "dplyr transmute(): add new columns and delete existing columns" description: "Master dplyr transmute() to add new columns and delete existing columns. Complete R tutorial with examples using real datasets." date: 2022-07-29 categories: ['dplyr', 'dplyr transmute()'] format: html: code-fold: false code-tools: true --- ## Introduction The `transmute()` function in dplyr is a powerful data manipulation tool that creates new columns while simultaneously dropping all existing columns that aren't explicitly mentioned. Unlike [`mutate()`](/dplyr/how-to-use-mutate-in-r.html) which adds new columns while keeping all existing ones, `transmute()` gives you a clean slate with only the columns you specify. This function is particularly useful when you want to create derived variables and focus solely on those new calculations, or when you need to transform your dataset into a completely different structure. It's ideal for data preprocessing steps where you want to compute summary statistics, create categorical variables from continuous ones, or perform complex transformations while keeping your output dataset clean and focused. ## Getting Started First, let's load the required packages for our examples: ```r library(tidyverse) library(palmerpenguins) ``` ## Example 1: Basic Usage Let's start with a simple example using the penguins dataset. We'll create new columns while dropping all the original ones: ```r penguins |> transmute( species = species, bill_ratio = bill_length_mm / bill_depth_mm, flipper_body_ratio = flipper_length_mm / body_mass_g, size_category = case_when( body_mass_g < 3500 ~ "Small", body_mass_g < 4500 ~ "Medium", TRUE ~ "Large" ) ) |> head(10) ``` In this example, we kept the `species` column and created three new columns: `bill_ratio`, `flipper_body_ratio`, and `size_category`. Notice how all other original columns (island, bill_length_mm, bill_depth_mm, etc.) are automatically dropped from the result. ## Example 2: Practical Application Let's create a more complex example that demonstrates how `transmute()` can be used for data analysis. We'll create a summarized view of penguin characteristics: ```r penguin_metrics <- penguins |> filter(!is.na(bill_length_mm), !is.na(body_mass_g)) |> transmute( penguin_id = paste(species, row_number(), sep = "_"), species_island = paste(species, island, sep = " - "), bill_size_index = (bill_length_mm * bill_depth_mm) / 100, body_condition = case_when( body_mass_g > 5000 ~ "Heavy", body_mass_g > 4000 ~ "Average", TRUE ~ "Light" ), flipper_efficiency = round(flipper_length_mm / body_mass_g * 1000, 2), overall_size_score = scale(bill_length_mm)[,1] + scale(flipper_length_mm)[,1] + scale(body_mass_g)[,1] ) penguin_metrics |> head(8) ``` Here's another practical example showing how to create categorical variables and perform groupwise calculations: ```r penguins |> group_by(species) |> transmute( species = species, bill_length_percentile = percent_rank(bill_length_mm) * 100, above_avg_mass = body_mass_g > mean(body_mass_g, na.rm = TRUE), standardized_flipper = scale(flipper_length_mm)[,1], penguin_rank = dense_rank(desc(body_mass_g)) ) |> arrange(species, desc(bill_length_percentile)) |> slice_head(n = 5) ``` This example demonstrates how `transmute()` works with grouped data, creating percentile rankings and standardized scores within each species group while keeping only the newly created variables. ## Summary The `transmute()` function is an excellent choice when you need to create new variables while maintaining a clean, focused dataset. Key takeaways include: - **Selective retention**: Only specified columns appear in the output, making it perfect for creating focused analytical datasets - **Flexible transformations**: Supports all the same operations as `mutate()`, including mathematical calculations, conditional logic, and string manipulations - **Group-aware operations**: Works seamlessly with [`group_by()`](/dplyr/how-to-use-groupby-in-r.html) for creating within-group calculations and rankings - **Clean workflow**: Eliminates the need for separate [`select()`](/dplyr/how-to-use-select-in-r.html) operations when you want to keep only transformed variables Use `transmute()` when you want to transform your data into a new structure focused solely on derived variables, rather than augmenting existing data with additional columns. --- ## Related Posts - [dplyr's mutate(): How to create new columns](/dplyr/dplyr-mutate-create-new-columns.html) - [dplyr contains(): select columns that contains a string](/dplyr/dplyr-contains-select-columns-that-contains-a-string.html) - [dplyr row_number(): Add unique row number to a dataframe](/dplyr/dplyr-row_number-add-unique-row-number-to-a-dataframe.html) - [tidyr unite(): combine multiple columns into one](/tidyr/tidyr-unite-combine-multiple-columns-into-one.html) - [tidyr's separate_delim_wider(): Split a string into columns](/tidyr/tidyrs-separate_delim_wider-split-a-string-into-columns.html)