How to Split a Dataframe into a list of Dataframes by groups in R
In this tutorial, we will learn how to split a dataframe into a list of dataframes by groups in R. We will first use the base R function split() to divide a dataframe into a list of dataframes, one per group. Then we will use dplyr’s group_split() function to do the same.
To get started, we will first load tidyverse, a suite of R packages, and palmerpenguins, which provides the penguins data.
library(tidyverse)
# check the version of loaded package dplyr
packageVersion("dplyr")
## [1] '1.0.8'
library(palmerpenguins)
How to Split a Dataframe into a list of Dataframes by groups using split() in base R
The split() function in base R divides the data in a vector or a dataframe into a list of groups. Here we show how to split a dataframe by group.
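To make the behavior of split() concrete, here is a minimal sketch on a small made-up dataframe. The names toy, group, value and toy_by_group are illustrative assumptions and are not part of the penguins data; only base R is used.

```r
# A small, made-up dataframe (hypothetical; not the penguins data)
toy <- data.frame(
  group = factor(c("a", "b", "a", "c", "b")),
  value = c(1, 2, 3, 4, 5)
)

# split() returns a list with one dataframe per level of the grouping factor
toy_by_group <- split(toy, toy$group)

# The list elements are named after the group labels
names(toy_by_group)

# Pull out a single group by name
toy_by_group[["a"]]

# Count the rows in each piece
sapply(toy_by_group, nrow)
```

Because the result is a named list, an individual group can be looked up by its label rather than by position.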
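list_of_dataframes_by_split <- split(penguins, penguins$species)
dplyr’s group_split() function splits a dataframe by a grouping variable in a similar way. Here we split the penguins data by species.
penguins %>%
  group_split(species)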
## <list_of<
##   tbl_df<
##     species          : factor
##     island           : factor
##     bill_length_mm   : double
##     bill_depth_mm    : double
##     flipper_length_mm: integer
##     body_mass_g      : integer
##     sex              : factor
##     year             : integer
##   >
## >[3]>
## [[1]]
## # A tibble: 152 × 8
## species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
## <fct> <fct> <dbl> <dbl> <int> <int>
## 1 Adelie Torgersen 39.1 18.7 181 3750
## 2 Adelie Torgersen 39.5 17.4 186 3800
## 3 Adelie Torgersen 40.3 18 195 3250
## 4 Adelie Torgersen NA NA NA NA
## 5 Adelie Torgersen 36.7 19.3 193 3450
## 6 Adelie Torgersen 39.3 20.6 190 3650
## 7 Adelie Torgersen 38.9 17.8 181 3625
## 8 Adelie Torgersen 39.2 19.6 195 4675
## 9 Adelie Torgersen 34.1 18.1 193 3475
## 10 Adelie Torgersen 42 20.2 190 4250
## # … with 142 more rows, and 2 more variables: sex <fct>, year <int>
##
## [[2]]
## # A tibble: 68 × 8
## species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
## <fct> <fct> <dbl> <dbl> <int> <int>
## 1 Chinstrap Dream 46.5 17.9 192 3500
## 2 Chinstrap Dream 50 19.5 196 3900
## 3 Chinstrap Dream 51.3 19.2 193 3650
## 4 Chinstrap Dream 45.4 18.7 188 3525
## 5 Chinstrap Dream 52.7 19.8 197 3725
## 6 Chinstrap Dream 45.2 17.8 198 3950
## 7 Chinstrap Dream 46.1 18.2 178 3250
## 8 Chinstrap Dream 51.3 18.2 197 3750
## 9 Chinstrap Dream 46 18.9 195 4150
## 10 Chinstrap Dream 51.3 19.9 198 3700
## # … with 58 more rows, and 2 more variables: sex <fct>, year <int>
##
## [[3]]
## # A tibble: 124 × 8
## species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
## <fct> <fct> <dbl> <dbl> <int> <int>
## 1 Gentoo Biscoe 46.1 13.2 211 4500
## 2 Gentoo Biscoe 50 16.3 230 5700
## 3 Gentoo Biscoe 48.7 14.1 210 4450
## 4 Gentoo Biscoe 50 15.2 218 5700
## 5 Gentoo Biscoe 47.6 14.5 215 5400
## 6 Gentoo Biscoe 46.5 13.5 210 4550
## 7 Gentoo Biscoe 45.4 14.6 211 4800
## 8 Gentoo Biscoe 46.7 15.3 219 5200
## 9 Gentoo Biscoe 43.3 13.4 209 4400
## 10 Gentoo Biscoe 46.8 15.4 215 5150
## # … with 114 more rows, and 2 more variables: sex <fct>, year <int>
dplyr’s group_split() function can also work on a grouped object, i.e. the result of dplyr’s group_by() function. For example, here we create a grouped object by applying group_by() to the dataframe.
grp_obj <- penguins %>%
  group_by(species)
Then we can split it into a list of dataframes using group_split() as shown here, and we get the same results as before.
grp_obj %>%
group_split()
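One practical difference between the two approaches is that split() names the list elements after the groups, while group_split() returns an unnamed list. The sketch below illustrates this on a small made-up tibble and shows one possible way to recover the names with group_keys(). The names toy, by_split, by_group_split, keys and named_pieces are illustrative assumptions, not part of the tutorial's data.

```r
library(dplyr)

# A small, made-up tibble (hypothetical; not the penguins data)
toy <- tibble(
  group = factor(c("a", "b", "a", "c", "b")),
  value = c(1, 2, 3, 4, 5)
)

# Base R: the list elements are named after the factor levels
by_split <- split(toy, toy$group)
names(by_split)

# dplyr: the list elements are not named
by_group_split <- toy %>% group_split(group)
names(by_group_split)

# group_keys() returns one row per group, in the same order as group_split()
keys <- toy %>%
  group_by(group) %>%
  group_keys()

# Attach the group labels as names so pieces can be looked up by label
named_pieces <- setNames(by_group_split, as.character(keys$group))
named_pieces[["c"]]
```

If the downstream code only iterates over the pieces, the unnamed list from group_split() is usually enough; attaching names mainly helps when a specific group needs to be looked up by label.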