How to Separate a Column into Multiple Rows in R: Hint tidyr’s spearate_row()
In this tutorial, we will learn how to separate a column with multiple elements separated by a delimiter into different rows. We will use tidy’s separate_rows() function first to separate a column with comma separated elements. And then show how to use separate_rows() on multiple columns. We will also see how to make separate_rows() function to automatically infer the type of updated column.
Let us get started by loading tidyverse and checking the tidyr package version.
library(tidyvrerse)
packageVersion("tidyr")
## [1] '1.2.0'To illustrate tidyr’s separate_rows(), we create a toy dataframe using tibble() function with a column containing elements that are separated by comma.
df
## 1 America USA,Canada
## 2 Asia Singapore
## 3 Europe England, France, Germanytidyr’s separate_rows(): Example with one column
When a column contains elements with one or more delimited values, separate_rows() function separates the values in the column and places each one in its own row.
seprate_rows() function takes one or more columns that have elements separated by a delimiter. When we don’t specify a column, it does nothing as expected.
df %>%
separate_rows()
## # A tibble: 3 × 2
## continent country
##
## 1 America USA,Canada
## 2 Asia Singapore
## 3 Europe England, France, GermanyIn our example, the country column contain elements separated by comma. By using the collapsed column as argument to separate_rows() function we get expanded dataframe with separte row for each collapsed element.
df %>%
separate_rows(country)
## # A tibble: 6 × 2
## continent country
##
## 1 America USA
## 2 America Canada
## 3 Asia Singapore
## 4 Europe England
## 5 Europe France
## 6 Europe Germanytidyr’s separate_rows(): Example with multiple columns
In the previous example, we had just one column with collapsed elements. To illustrate separate_rows() can work with multiple columns, we create an updated tibble with two columns that have comma separated elements.
df
## 1 America USA,Canada 70,72
## 2 Asia Singapore 68
## 3 Eurpoe England, France, Germany 75,76,77By providing both the columns that needed to be separated into rows as arguments, we get a tibble like this
df %>%
separate_rows(country, lifeExp)
## # A tibble: 6 × 3
## continent country lifeExp
##
## 1 America USA 70
## 2 America Canada 72
## 3 Asia Singapore 68
## 4 Eurpoe England 75
## 5 Eurpoe France 76
## 6 Eurpoe Germany 77tidyr’s separate_rows() Example with inferring datatype
Note that in the above example, the second collapsed column is of numerical type. We can instruct separate_rows() function to convert column to appropriate data type after separating by using convert=TRUE>. Now our lifeExp column is int instead of character.
df %>%
separate_rows(country, lifeExp, convert=TRUE)
## # A tibble: 6 × 3
## continent country lifeExp
##
## 1 America USA 70
## 2 America Canada 72
## 3 Asia Singapore 68
## 4 Eurpoe England 75
## 5 Eurpoe France 76
## 6 Eurpoe Germany 77tidyr’s separate_rows() Example sep argument
separate_rows() function is pretty decent in detecting the commonly used delimiters. Here is example where each element is separated by two under scores, an uncommon delimiter.
df
## 1 America USA__Canada 70__72
## 2 Asia Singapore 68
## 3 Eurpoe England__France__Germany 75__76__77Even the the delimiter that we used is not common, tidyr’s separate_rows() nicely detected them and separated into rows.
df %>%
separate_rows(country, lifeExp,convert=TRUE)
## # A tibble: 6 × 3
## continent country lifeExp
##
## 1 America USA 70
## 2 America Canada 72
## 3 Asia Singapore 68
## 4 Eurpoe England 75
## 5 Eurpoe France 76
## 6 Eurpoe Germany 77With “sep” argument, we can specify a delimiter to separate into rows with separate_rows().