How to Separate a Column into Multiple Rows in R: Hint tidyr’s spearate_row()

tidyr
tidyr separate_rows()
Published

June 27, 2022

In this tutorial, we will learn how to separate a column with multiple elements separated by a delimiter into different rows. We will use tidy’s separate_rows() function first to separate a column with comma separated elements. And then show how to use separate_rows() on multiple columns. We will also see how to make separate_rows() function to automatically infer the type of updated column.

Let us get started by loading tidyverse and checking the tidyr package version.

library(tidyvrerse)
packageVersion("tidyr")

## [1] '1.2.0'

To illustrate tidyr’s separate_rows(), we create a toy dataframe using tibble() function with a column containing elements that are separated by comma.

df                         
## 1 America   USA,Canada              
## 2 Asia      Singapore               
## 3 Europe    England, France, Germany

tidyr’s separate_rows(): Example with one column

When a column contains elements with one or more delimited values, separate_rows() function separates the values in the column and places each one in its own row.

seprate_rows() function takes one or more columns that have elements separated by a delimiter. When we don’t specify a column, it does nothing as expected.

df %>% 
   separate_rows()

## # A tibble: 3 × 2
##   continent country                 
##                           
## 1 America   USA,Canada              
## 2 Asia      Singapore               
## 3 Europe    England, France, Germany

In our example, the country column contain elements separated by comma. By using the collapsed column as argument to separate_rows() function we get expanded dataframe with separte row for each collapsed element.

df %>% 
   separate_rows(country)

## # A tibble: 6 × 2
##   continent country  
##            
## 1 America   USA      
## 2 America   Canada   
## 3 Asia      Singapore
## 4 Europe    England  
## 5 Europe    France   
## 6 Europe    Germany

tidyr’s separate_rows(): Example with multiple columns

In the previous example, we had just one column with collapsed elements. To illustrate separate_rows() can work with multiple columns, we create an updated tibble with two columns that have comma separated elements.

df                             
## 1 America   USA,Canada               70,72   
## 2 Asia      Singapore                68      
## 3 Eurpoe    England, France, Germany 75,76,77

By providing both the columns that needed to be separated into rows as arguments, we get a tibble like this

df %>% 
  separate_rows(country, lifeExp)

## # A tibble: 6 × 3
##   continent country   lifeExp
##               
## 1 America   USA       70     
## 2 America   Canada    72     
## 3 Asia      Singapore 68     
## 4 Eurpoe    England   75     
## 5 Eurpoe    France    76     
## 6 Eurpoe    Germany   77

tidyr’s separate_rows() Example with inferring datatype

Note that in the above example, the second collapsed column is of numerical type. We can instruct separate_rows() function to convert column to appropriate data type after separating by using convert=TRUE>. Now our lifeExp column is int instead of character.

df %>% 
  separate_rows(country, lifeExp, convert=TRUE)

## # A tibble: 6 × 3
##   continent country   lifeExp
##               
## 1 America   USA            70
## 2 America   Canada         72
## 3 Asia      Singapore      68
## 4 Eurpoe    England        75
## 5 Eurpoe    France         76
## 6 Eurpoe    Germany        77

tidyr’s separate_rows() Example sep argument

separate_rows() function is pretty decent in detecting the commonly used delimiters. Here is example where each element is separated by two under scores, an uncommon delimiter.

df                               
## 1 America   USA__Canada              70__72    
## 2 Asia      Singapore                68        
## 3 Eurpoe    England__France__Germany 75__76__77

Even the the delimiter that we used is not common, tidyr’s separate_rows() nicely detected them and separated into rows.

df %>% 
  separate_rows(country, lifeExp,convert=TRUE)

## # A tibble: 6 × 3
##   continent country   lifeExp
##               
## 1 America   USA            70
## 2 America   Canada         72
## 3 Asia      Singapore      68
## 4 Eurpoe    England        75
## 5 Eurpoe    France         76
## 6 Eurpoe    Germany        77

With “sep” argument, we can specify a delimiter to separate into rows with separate_rows().