How to use unite() in R

tidyr
unite()
Learn how to use unite() in R with practical examples. Step-by-step guide with code you can copy and run immediately.
Published: February 21, 2026

Introduction

The tidyr::unite() function combines multiple columns into a single column by pasting their values together with a separator of your choice. It is useful for building composite keys, collapsing several categorical variables into one grouping column, and formatting values from several columns into a single label for reports or plots. The result is a character column, and it only works as a unique identifier if the combination of source values is itself unique.

Getting Started

library(tidyverse)
library(palmerpenguins)

Example 1: Basic Usage

Let’s start with a simple example using the Palmer penguins dataset to create a combined species-island identifier:

penguins |>
  unite(col = "species_island", 
        species, island, 
        sep = "_") |>
  select(species_island, bill_length_mm, bill_depth_mm)

In this example, unite() takes the species and island columns and combines them into a new column called species_island using an underscore as the separator. The original columns are removed by default, leaving us with the new combined column along with the other variables we selected.
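
To see exactly what comes out, here is a minimal sketch on a small hand-made tibble. The `toy` data frame, its column `mass_g`, and its values are invented for illustration and are not part of palmerpenguins.

```r
library(tibble)
library(tidyr)

# Hypothetical three-row data frame, made up for this illustration
toy <- tibble(
  species = c("Adelie", "Gentoo", "Chinstrap"),
  island  = c("Torgersen", "Biscoe", "Dream"),
  mass_g  = c(3750, 5000, 3700)
)

# Paste species and island together; the source columns are dropped by default
united <- toy |>
  unite(col = "species_island", species, island, sep = "_")

united$species_island
# "Adelie_Torgersen" "Gentoo_Biscoe" "Chinstrap_Dream"

names(united)
# "species_island" "mass_g"
```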

You can also keep the original columns by setting remove = FALSE:

penguins |>
  unite(col = "species_island", 
        species, island, 
        sep = "_",
        remove = FALSE) |>
  select(species, island, species_island, body_mass_g)
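
With remove = FALSE, the new column is placed where the first of the united columns sits, and the originals follow it. The sketch below checks this on a hypothetical `toy` tibble (the data and the `id` column are assumptions for illustration only).

```r
library(tibble)
library(tidyr)

# Hypothetical data with an unrelated column in front
toy <- tibble(
  id      = 1:2,
  species = c("Adelie", "Gentoo"),
  island  = c("Torgersen", "Biscoe")
)

# Keep species and island alongside the combined column
kept <- toy |>
  unite(col = "species_island", species, island, sep = "_", remove = FALSE)

names(kept)
# "id" "species_island" "species" "island"
```

This is why the earlier example uses select() afterwards: it is only there to choose and reorder the columns shown, not to bring the originals back.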

Example 2: Practical Application

Here’s a more practical example where we build a composite key from several characteristics (named penguin_id in the code), then use it for grouping and analysis:

penguins |>
  drop_na() |>
  unite(col = "penguin_id", 
        species, island, sex, year,
        sep = "-") |>
  group_by(penguin_id) |>
  summarise(
    count = n(),
    avg_bill_length = mean(bill_length_mm),
    avg_body_mass = mean(body_mass_g),
    .groups = "drop"
  ) |>
  filter(count >= 5) |>
  arrange(desc(avg_body_mass))

This example shows how unite() fits into a pipeline with other tidyverse functions. We first drop rows that contain any missing value, then build a key from species, island, sex, and year. Each distinct key identifies a group of penguins rather than a single bird, so grouping by it gives a count and mean measurements per group; filter() then keeps only groups with at least five birds, and arrange() sorts them from heaviest to lightest.
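
Grouping by a united key gives the same groups as grouping by the source columns directly; the key mainly buys you a single printable label. The following sketch compares the two approaches on a hypothetical `toy` tibble (the data and the names `group_key`, `by_key`, and `by_cols` are invented for illustration).

```r
library(tibble)
library(dplyr)
library(tidyr)

# Hypothetical measurements, made up for this comparison
toy <- tibble(
  species = c("Adelie", "Adelie", "Gentoo", "Gentoo"),
  sex     = c("female", "female", "male", "female"),
  mass    = c(3500, 3700, 5500, 4700)
)

# Approach 1: unite the columns into one key, then group by the key
by_key <- toy |>
  unite(col = "group_key", species, sex, sep = "-") |>
  group_by(group_key) |>
  summarise(avg_mass = mean(mass), .groups = "drop")

# Approach 2: group by the original columns directly
by_cols <- toy |>
  group_by(species, sex) |>
  summarise(avg_mass = mean(mass), .groups = "drop")

# Both produce three groups with the same averages
nrow(by_key) == nrow(by_cols)
```

If you only need the summary, group_by() on the original columns keeps their types intact; the united key is more convenient when the group has to appear as one label in a table or plot.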

Another practical application is formatting data for labels or reports:

penguins |>
  drop_na(bill_length_mm, bill_depth_mm) |>
  unite(col = "bill_dimensions", 
        bill_length_mm, bill_depth_mm,
        sep = " x ",
        remove = FALSE) |>
  unite(col = "penguin_label",
        species, bill_dimensions,
        sep = ": ") |>
  select(penguin_label, body_mass_g, flipper_length_mm) |>
  slice_head(n = 10)

Here we create descriptive labels by first combining bill dimensions with " x " as a separator, then combining the species name with these dimensions using a colon separator. This creates human-readable labels perfect for plots or reports.
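
Numeric columns are converted to text when they are united, so the label is an ordinary character string. A minimal sketch of the two-step labelling, using a hypothetical two-row `toy` tibble with made-up measurements:

```r
library(tibble)
library(tidyr)

# Hypothetical bill measurements, invented for illustration
toy <- tibble(
  species        = c("Adelie", "Gentoo"),
  bill_length_mm = c(39.1, 46.1),
  bill_depth_mm  = c(18.7, 13.2)
)

labels <- toy |>
  # Step 1: join the two numeric columns into "length x depth"
  unite(col = "bill_dimensions", bill_length_mm, bill_depth_mm, sep = " x ") |>
  # Step 2: prefix the species name
  unite(col = "penguin_label", species, bill_dimensions, sep = ": ")

labels$penguin_label
# "Adelie: 39.1 x 18.7" "Gentoo: 46.1 x 13.2"
```

Because the numbers are pasted as they are, round or format the numeric columns before calling unite() if the labels need a fixed number of decimals.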

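You can also control how missing values are handled. By default, unite() pastes a missing value into the result as the literal text "NA"; setting na.rm = TRUE drops missing values before joining. Note that island and year contain no missing values in penguins, so na.rm = TRUE does not change the output of this particular example: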

penguins |>
  unite(col = "location_year",
        island, year,
        sep = "_",
        na.rm = TRUE) |>
  count(location_year, species) |>
  arrange(desc(n))
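
To make the effect of na.rm visible, the sketch below unites a column that does contain a missing value. The `toy` tibble is hypothetical and stands in for a column such as sex, which has missing entries in penguins.

```r
library(tibble)
library(tidyr)

# Hypothetical data with one missing value in sex
toy <- tibble(
  island = c("Dream", "Biscoe"),
  sex    = c("male", NA)
)

# Default: the missing value is pasted in as the text "NA"
default <- toy |>
  unite(col = "island_sex", island, sex, sep = "_")

# na.rm = TRUE: the missing value and its separator are left out
dropped <- toy |>
  unite(col = "island_sex", island, sex, sep = "_", na.rm = TRUE)

default$island_sex
# "Dream_male" "Biscoe_NA"

dropped$island_sex
# "Dream_male" "Biscoe"
```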

Summary

  • unite() is well suited to creating composite keys, formatted labels, or combined categorical variables for analysis and visualization; the result is a character column
  • The function offers flexibility through parameters like sep for custom separators, remove to control whether original columns are kept, and na.rm to drop missing values instead of pasting them in as "NA"
  • It integrates with other tidyverse functions, making it a good fit for data preparation workflows where you need to combine columns before analysis or create meaningful grouping variables