How to use separate() in R

tidyr

separate()

Learn how to use separate() in R with practical examples. Step-by-step guide with code you can copy and run immediately.

Published

February 21, 2026

Introduction

The separate() function from the tidyr package splits a single column containing multiple values into several columns. This is especially useful when working with data where multiple pieces of information are stored in one column, separated by delimiters like commas, underscores, or spaces.

Getting Started

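library(tidyverse)       # attaches tidyr, dplyr, and tibble, which the examples use
library(palmerpenguins)  # optional: the examples below build their own penguin-style tibbles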

Example 1: Basic Usage

The Problem

Imagine you have a dataset where species and island information are combined in a single column. You need to split this information into separate columns for better analysis and data manipulation.

Step 1: Create sample data with combined information

Let’s create a simple dataset that mimics this common data problem.

# Create sample data with combined values
sample_data <- tibble(
  id = 1:4,
  species_island = c("Adelie_Torgersen", "Gentoo_Biscoe", 
                     "Chinstrap_Dream", "Adelie_Biscoe")
)
sample_data

This creates a dataset where species and island names are combined with underscores.

Step 2: Apply separate() to split the column

Now we’ll use separate() to split the combined column into two distinct columns.

# Separate the combined column
separated_data <- sample_data |>
  separate(species_island, 
           into = c("species", "island"), 
           sep = "_")
separated_data

The function successfully splits the species_island column into species and island columns using the underscore as a separator.
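Real data rarely has exactly the same number of pieces in every row. The sketch below uses hypothetical rows (one with an extra piece, one with a missing piece) to show how the `extra` and `fill` arguments of separate() control what happens in those cases. The `uneven` data is made up for illustration and is not part of the original example.

```r
library(tidyr)
library(tibble)

# Hypothetical rows: the second has an extra piece, the third is missing one
uneven <- tibble(
  id = 1:3,
  species_island = c("Adelie_Torgersen", "Gentoo_Biscoe_North", "Chinstrap")
)

uneven_split <- uneven |>
  separate(species_island,
           into = c("species", "island"),
           sep = "_",
           extra = "merge",  # keep surplus pieces together in the last column
           fill = "right")   # pad missing pieces with NA on the right
uneven_split
```

Without these arguments, separate() still returns a result but emits warnings: surplus pieces are dropped and missing pieces become NA.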

Step 3: Verify the transformation

Let’s examine the structure of our transformed data.

# Check the column names and data types
glimpse(separated_data)

We now have three columns: id, species, and island, with each piece of information properly separated.

Example 2: Practical Application

The Problem

You’re working with penguin measurement data where the researcher recorded species, sex, and year information in a single field. You need to separate this information to perform grouped analyses and create meaningful visualizations.

Step 1: Create realistic penguin data

Let’s simulate a dataset that represents this common real-world scenario.

# Create complex penguin data
penguin_data <- tibble(
  measurement_id = 1:6,
  species_sex_year = c("Adelie-male-2007", "Gentoo-female-2008",
                       "Chinstrap-male-2009", "Adelie-female-2007",
                       "Gentoo-male-2008", "Chinstrap-female-2009"),
  bill_length = c(39.1, 46.1, 48.7, 36.7, 47.2, 46.5)
)
penguin_data

This creates a dataset with measurement information stored in a single hyphen-separated column.

Step 2: Separate the complex column

We’ll split the combined information into three separate columns for easier analysis.

# Separate into three columns
clean_penguin_data <- penguin_data |>
  separate(species_sex_year, 
           into = c("species", "sex", "year"), 
           sep = "-")
clean_penguin_data

The data is now properly structured with individual columns for species, sex, and year information.
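By default separate() drops the original combined column. If you would rather keep it, for example to double-check the split, `remove = FALSE` retains it. This is a minimal sketch on a shortened, hypothetical copy of the data (`records` is an illustrative name, not from the example above).

```r
library(tidyr)
library(tibble)

# Shortened, hypothetical version of the penguin data
records <- tibble(
  measurement_id = 1:2,
  species_sex_year = c("Adelie-male-2007", "Gentoo-female-2008")
)

kept <- records |>
  separate(species_sex_year,
           into = c("species", "sex", "year"),
           sep = "-",
           remove = FALSE)  # keep species_sex_year alongside the new columns
names(kept)
```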

Step 3: Convert data types and analyze

Now we can convert the year to numeric and perform grouped analysis.

# Convert year to numeric and calculate summary
final_data <- clean_penguin_data |>
  mutate(year = as.numeric(year)) |>
  group_by(species, sex) |>
  summarise(avg_bill_length = mean(bill_length), .groups = "drop")
final_data

With properly separated columns, we can easily calculate average bill length by species and sex combinations.

Step 4: Convert column types automatically with the convert parameter

The separate() function can automatically convert column types when specified.

# Use convert = TRUE to automatically handle data types
auto_converted <- penguin_data |>
  separate(species_sex_year, 
           into = c("species", "sex", "year"), 
           sep = "-", 
           convert = TRUE)
glimpse(auto_converted)

The convert = TRUE parameter converts the year column from character to integer, while species and sex remain character columns.
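To see the effect of `convert` directly, it can help to compare the column classes with and without it. The sketch below uses a shortened, hypothetical copy of the data (`records`, `as_text`, and `as_typed` are illustrative names).

```r
library(tidyr)
library(tibble)

# Shortened, hypothetical version of the penguin data
records <- tibble(
  species_sex_year = c("Adelie-male-2007", "Gentoo-female-2008")
)

# Default: every new column is character
as_text <- records |>
  separate(species_sex_year,
           into = c("species", "sex", "year"),
           sep = "-")

# convert = TRUE: types are guessed for each new column
as_typed <- records |>
  separate(species_sex_year,
           into = c("species", "sex", "year"),
           sep = "-",
           convert = TRUE)

class(as_text$year)   # "character"
class(as_typed$year)  # "integer"
```

If you need a double rather than an integer, the explicit `mutate(year = as.numeric(year))` approach from Step 3 remains the more predictable choice.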

Summary

Use separate() to split single columns containing multiple values into separate columns
Specify the new column names with the into parameter and the delimiter with the sep parameter
The convert = TRUE parameter automatically converts data types when appropriate
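separate() treats a character sep as a regular expression, so it handles underscores, hyphens, commas, or custom patterns; escape regex metacharacters such as a literal dot (sep = "\\.")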
This function is essential for cleaning messy datasets and preparing data for analysis
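As a rough illustration of the points about delimiters, the sketch below shows three ways to use `sep`: a regular expression matching more than one delimiter, an escaped metacharacter, and a numeric position. All three small datasets are hypothetical and exist only for this example.

```r
library(tidyr)
library(tibble)

# 1. Regex sep: split on either an underscore or a hyphen
mixed <- tibble(code = c("Adelie_male-2007", "Gentoo-female_2008"))
by_regex <- mixed |>
  separate(code, into = c("species", "sex", "year"), sep = "[_-]")

# 2. Escaped sep: a bare "." would match any character, so escape it
dotted <- tibble(pair = c("Adelie.Dream", "Gentoo.Biscoe"))
by_dot <- dotted |>
  separate(pair, into = c("species", "island"), sep = "\\.")

# 3. Numeric sep: split at a fixed position (after the 4th character)
stamped <- tibble(stamp = c("2007Adelie", "2008Gentoo"))
by_position <- stamped |>
  separate(stamp, into = c("year", "species"), sep = 4)

by_regex
by_dot
by_position
```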
