How to Separate a Column into Multiple Rows in R: Hint tidyr’s spearate_row()

tidyr

tidyr separate_rows()

Published

June 27, 2022

In this tutorial, we will learn how to separate a column with multiple elements separated by a delimiter into different rows. We will use tidy’s separate_rows() function first to separate a column with comma separated elements. And then show how to use separate_rows() on multiple columns. We will also see how to make separate_rows() function to automatically infer the type of updated column.

Let us get started by loading tidyverse and checking the tidyr package version.

library(tidyvrerse)
packageVersion("tidyr")

## [1] '1.2.0'

To illustrate tidyr’s separate_rows(), we create a toy dataframe using tibble() function with a column containing elements that are separated by comma.

df                         
## 1 America   USA,Canada              
## 2 Asia      Singapore               
## 3 Europe    England, France, Germany

tidyr’s separate_rows(): Example with one column

When a column contains elements with one or more delimited values, separate_rows() function separates the values in the column and places each one in its own row.

seprate_rows() function takes one or more columns that have elements separated by a delimiter. When we don’t specify a column, it does nothing as expected.

df %>% 
   separate_rows()

## # A tibble: 3 × 2
##   continent country                 
##                           
## 1 America   USA,Canada              
## 2 Asia      Singapore               
## 3 Europe    England, France, Germany

In our example, the country column contain elements separated by comma. By using the collapsed column as argument to separate_rows() function we get expanded dataframe with separte row for each collapsed element.

df %>% 
   separate_rows(country)

## # A tibble: 6 × 2
##   continent country  
##            
## 1 America   USA      
## 2 America   Canada   
## 3 Asia      Singapore
## 4 Europe    England  
## 5 Europe    France   
## 6 Europe    Germany

tidyr’s separate_rows(): Example with multiple columns

In the previous example, we had just one column with collapsed elements. To illustrate separate_rows() can work with multiple columns, we create an updated tibble with two columns that have comma separated elements.

df                             
## 1 America   USA,Canada               70,72   
## 2 Asia      Singapore                68      
## 3 Eurpoe    England, France, Germany 75,76,77

By providing both the columns that needed to be separated into rows as arguments, we get a tibble like this

df %>% 
  separate_rows(country, lifeExp)

## # A tibble: 6 × 3
##   continent country   lifeExp
##               
## 1 America   USA       70     
## 2 America   Canada    72     
## 3 Asia      Singapore 68     
## 4 Eurpoe    England   75     
## 5 Eurpoe    France    76     
## 6 Eurpoe    Germany   77

tidyr’s separate_rows() Example with inferring datatype

Note that in the above example, the second collapsed column is of numerical type. We can instruct separate_rows() function to convert column to appropriate data type after separating by using convert=TRUE>. Now our lifeExp column is int instead of character.

df %>% 
  separate_rows(country, lifeExp, convert=TRUE)

## # A tibble: 6 × 3
##   continent country   lifeExp
##               
## 1 America   USA            70
## 2 America   Canada         72
## 3 Asia      Singapore      68
## 4 Eurpoe    England        75
## 5 Eurpoe    France         76
## 6 Eurpoe    Germany        77

tidyr’s separate_rows() Example sep argument

separate_rows() function is pretty decent in detecting the commonly used delimiters. Here is example where each element is separated by two under scores, an uncommon delimiter.

df                               
## 1 America   USA__Canada              70__72    
## 2 Asia      Singapore                68        
## 3 Eurpoe    England__France__Germany 75__76__77

Even the the delimiter that we used is not common, tidyr’s separate_rows() nicely detected them and separated into rows.

df %>% 
  separate_rows(country, lifeExp,convert=TRUE)

## # A tibble: 6 × 3
##   continent country   lifeExp
##               
## 1 America   USA            70
## 2 America   Canada         72
## 3 Asia      Singapore      68
## 4 Eurpoe    England        75
## 5 Eurpoe    France         76
## 6 Eurpoe    Germany        77

With “sep” argument, we can specify a delimiter to separate into rows with separate_rows().

--- title: "How to Separate a Column into Multiple Rows in R: Hint tidyr's spearate_row()" date: 2022-06-27 categories: ['tidyr', 'tidyr separate_rows()'] format: html: code-fold: false code-tools: true --- In this tutorial, we will learn how to separate a column with multiple elements separated by a delimiter into different rows. We will use tidy's separate_rows() function first to separate a column with comma separated elements. And then show how to use separate_rows() on multiple columns. We will also see how to make separate_rows() function to automatically infer the type of updated column. Let us get started by loading tidyverse and checking the tidyr package version. ```r library(tidyvrerse) packageVersion("tidyr") ## [1] '1.2.0' ``` To illustrate tidyr's separate_rows(), we create a toy dataframe using tibble() function with a column containing elements that are separated by comma. ```r df ## 1 America USA,Canada ## 2 Asia Singapore ## 3 Europe England, France, Germany ``` ### tidyr's separate_rows(): Example with one column When a column contains elements with one or more delimited values, separate_rows() function separates the values in the column and places each one in its own row. seprate_rows() function takes one or more columns that have elements separated by a delimiter. When we don't specify a column, it does nothing as expected. ```r df %>% separate_rows() ## # A tibble: 3 × 2 ## continent country ## ## 1 America USA,Canada ## 2 Asia Singapore ## 3 Europe England, France, Germany ``` In our example, the country column contain elements separated by comma. By using the collapsed column as argument to separate_rows() function we get expanded dataframe with separte row for each collapsed element. ```r df %>% separate_rows(country) ## # A tibble: 6 × 2 ## continent country ## ## 1 America USA ## 2 America Canada ## 3 Asia Singapore ## 4 Europe England ## 5 Europe France ## 6 Europe Germany ``` ### tidyr's separate_rows(): Example with multiple columns In the previous example, we had just one column with collapsed elements. To illustrate separate_rows() can work with multiple columns, we create an updated tibble with two columns that have comma separated elements. ```r df ## 1 America USA,Canada 70,72 ## 2 Asia Singapore 68 ## 3 Eurpoe England, France, Germany 75,76,77 ``` By providing both the columns that needed to be separated into rows as arguments, we get a tibble like this ```r df %>% separate_rows(country, lifeExp) ## # A tibble: 6 × 3 ## continent country lifeExp ## ## 1 America USA 70 ## 2 America Canada 72 ## 3 Asia Singapore 68 ## 4 Eurpoe England 75 ## 5 Eurpoe France 76 ## 6 Eurpoe Germany 77 ``` ### tidyr's separate_rows() Example with inferring datatype Note that in the above example, the second collapsed column is of numerical type. We can instruct separate_rows() function to convert column to appropriate data type after separating by using convert=TRUE>. Now our lifeExp column is int instead of character. ```r df %>% separate_rows(country, lifeExp, convert=TRUE) ## # A tibble: 6 × 3 ## continent country lifeExp ## ## 1 America USA 70 ## 2 America Canada 72 ## 3 Asia Singapore 68 ## 4 Eurpoe England 75 ## 5 Eurpoe France 76 ## 6 Eurpoe Germany 77 ``` ### tidyr's separate_rows() Example sep argument separate_rows() function is pretty decent in detecting the commonly used delimiters. Here is example where each element is separated by two under scores, an uncommon delimiter. ```r df ## 1 America USA__Canada 70__72 ## 2 Asia Singapore 68 ## 3 Eurpoe England__France__Germany 75__76__77 ``` Even the the delimiter that we used is not common, tidyr's separate_rows() nicely detected them and separated into rows. ```r df %>% separate_rows(country, lifeExp,convert=TRUE) ## # A tibble: 6 × 3 ## continent country lifeExp ## ## 1 America USA 70 ## 2 America Canada 72 ## 3 Asia Singapore 68 ## 4 Eurpoe England 75 ## 5 Eurpoe France 76 ## 6 Eurpoe Germany 77 ``` With "sep" argument, we can specify a delimiter to separate into rows with separate_rows().