How to use separate() in R

Learn how to use separate() in R with practical examples. Step-by-step guide with code you can copy and run immediately.
Published February 21, 2026

Introduction

The separate() function from the tidyr package splits a single column containing multiple values into several columns. This is especially useful when working with data where multiple pieces of information are stored in one column, separated by delimiters like commas, underscores, or spaces.

Getting Started

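library(tidyverse)       # attaches tidyr, dplyr, and tibble, used throughout
library(palmerpenguins)  # optional: the examples below build their own small tibbles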

Example 1: Basic Usage

The Problem

Imagine you have a dataset where species and island information are combined in a single column. You need to split this information into separate columns for better analysis and data manipulation.

Step 1: Create sample data with combined information

Let’s create a simple dataset that mimics this common data problem.

# Create sample data with combined values
sample_data <- tibble(
  id = 1:4,
  species_island = c("Adelie_Torgersen", "Gentoo_Biscoe", 
                     "Chinstrap_Dream", "Adelie_Biscoe")
)
sample_data

This creates a dataset where species and island names are combined with underscores.

Step 2: Apply separate() to split the column

Now we’ll use separate() to split the combined column into two distinct columns.

# Separate the combined column
separated_data <- sample_data |>
  separate(species_island, 
           into = c("species", "island"), 
           sep = "_")
separated_data

The function successfully splits the species_island column into species and island columns using the underscore as a separator.
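One caveat about the sep argument: when it is a character string, separate() treats it as a regular expression, so delimiters that are regex metacharacters need escaping. The sketch below assumes a hypothetical variant of the data (dot_data) in which the values are joined with a dot instead of an underscore.

```r
library(tibble)
library(tidyr)

# Hypothetical data: same kind of values, but joined with a dot
dot_data <- tibble(
  id = 1:2,
  species_island = c("Adelie.Torgersen", "Gentoo.Biscoe")
)

# sep is a regular expression, so a literal dot is written as "\\."
dot_separated <- dot_data |>
  separate(species_island,
           into = c("species", "island"),
           sep = "\\.")
dot_separated
```

An unescaped "." would match any character rather than the literal dot, so the split would not land where you expect.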

Step 3: Verify the transformation

Let’s examine the structure of our transformed data.

# Check the column names and data types
glimpse(separated_data)

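We now have three columns: id, species, and island, with each piece of information properly separated. The original species_island column is gone, because separate() removes the source column by default.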
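If you would rather keep the combined column next to the new ones, separate() has a remove argument for that. A minimal sketch, using a small illustrative tibble (the name kept_original is just a placeholder):

```r
library(tibble)
library(tidyr)

# remove = FALSE keeps the source column alongside the new columns
kept_original <- tibble(
  id = 1:2,
  species_island = c("Adelie_Torgersen", "Gentoo_Biscoe")
) |>
  separate(species_island,
           into = c("species", "island"),
           sep = "_",
           remove = FALSE)
names(kept_original)
```

This can be handy while debugging a split, since you can compare the original value with its pieces row by row.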

Example 2: Practical Application

The Problem

You’re working with penguin measurement data where the researcher recorded species, sex, and year information in a single field. You need to separate this information to perform grouped analyses and create meaningful visualizations.

Step 1: Create realistic penguin data

Let’s simulate a dataset that represents this common real-world scenario.

# Create complex penguin data
penguin_data <- tibble(
  measurement_id = 1:6,
  species_sex_year = c("Adelie-male-2007", "Gentoo-female-2008",
                       "Chinstrap-male-2009", "Adelie-female-2007",
                       "Gentoo-male-2008", "Chinstrap-female-2009"),
  bill_length = c(39.1, 46.1, 48.7, 36.7, 47.2, 46.5)
)
penguin_data

This creates a dataset with measurement information stored in a single hyphen-separated column.

Step 2: Separate the complex column

We’ll split the combined information into three separate columns for easier analysis.

# Separate into three columns
clean_penguin_data <- penguin_data |>
  separate(species_sex_year, 
           into = c("species", "sex", "year"), 
           sep = "-")
clean_penguin_data

The data is now properly structured with individual columns for species, sex, and year information.
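When you only need some of the pieces, you can put NA in the into vector to drop a position. The sketch below assumes you want species and year but not sex; the data and the name species_year are illustrative only.

```r
library(tibble)
library(tidyr)

# NA in `into` tells separate() to discard that piece (here: sex)
species_year <- tibble(
  species_sex_year = c("Adelie-male-2007", "Gentoo-female-2008")
) |>
  separate(species_sex_year,
           into = c("species", NA, "year"),
           sep = "-")
species_year
```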

Step 3: Convert data types and analyze

By default, separate() returns character columns, so year is still text at this point. We convert it to a number, then calculate a grouped summary.

# Convert year to numeric and calculate summary
final_data <- clean_penguin_data |>
  mutate(year = as.numeric(year)) |>
  group_by(species, sex) |>
  summarise(avg_bill_length = mean(bill_length), .groups = "drop")
final_data

With properly separated columns, we can easily calculate the average bill length for each combination of species and sex.

Step 4: Convert types automatically with the convert parameter

The separate() function can automatically convert column types when specified.

# Use convert = TRUE to automatically handle data types
auto_converted <- penguin_data |>
  separate(species_sex_year, 
           into = c("species", "sex", "year"), 
           sep = "-", 
           convert = TRUE)
glimpse(auto_converted)

With convert = TRUE, the year column is converted from character to integer, while species and sex remain character columns.
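Real data is rarely this uniform: some rows have too few pieces and some have too many. separate() warns in both cases by default, and its extra and fill arguments let you choose the behavior explicitly. The sketch below uses a made-up tibble (messy) with one short row and one long row.

```r
library(tibble)
library(tidyr)

# Hypothetical messy input: row 2 lacks a year, row 3 has an extra piece
messy <- tibble(
  label = c("Adelie-male-2007",
            "Gentoo-female",
            "Chinstrap-male-2009-recheck")
)

tidy <- messy |>
  separate(label,
           into = c("species", "sex", "year"),
           sep = "-",
           extra = "drop",   # discard pieces beyond the third
           fill = "right")   # pad missing pieces on the right with NA
tidy
```

Here the short row gets NA for year and the trailing "recheck" is discarded. Using extra = "merge" instead would keep the surplus text inside the last column.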

Summary

  • Use separate() to split single columns containing multiple values into separate columns
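  • Specify the new column names with the into parameter and the delimiter with the sep parameter
  • The convert = TRUE parameter converts the new columns to a more specific type where possible (for example, year becomes integer)
  • separate() works with any delimiter: underscores, hyphens, commas, or custom patterns. A character sep is treated as a regular expression, so special characters such as "." must be escaped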
  • This function is essential for cleaning messy datasets and preparing data for analysis
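A note for newer code: as of tidyr 1.3.0, separate() is marked as superseded in favor of separate_wider_delim(), separate_wider_position(), and separate_wider_regex(). It still works, but the sketch below shows how Example 1 might look with separate_wider_delim(), assuming tidyr 1.3.0 or later is installed. The name wider is a placeholder.

```r
library(tibble)
library(tidyr)

# Same split as Example 1, written with the newer function
# (requires tidyr >= 1.3.0)
wider <- tibble(
  id = 1:2,
  species_island = c("Adelie_Torgersen", "Gentoo_Biscoe")
) |>
  separate_wider_delim(species_island,
                       delim = "_",
                       names = c("species", "island"))
wider
```

One difference worth knowing: delim is matched as a fixed string by default rather than as a regular expression, and rows with too few or too many pieces raise an error unless you set too_few or too_many.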