How to Split a Dataframe into a list of Dataframes by groups in R

dplyr group_split()
split()
Published

April 21, 2022

In this tutorial, we will learn how to split a dataframe into a list of dataframes by groups in R. We will first learn how to use the base R function split() to divide a dataframe into a list of dataframes, one per group. Then, we will learn how to use dplyr’s group_split() function to do the same.

To get started, we will first load tidyverse, a suite of R packages, and the palmerpenguins package, which provides the penguins data.

library(tidyverse)
# check the version of loaded package dplyr
packageVersion("dplyr")
## [1] '1.0.8'
library(palmerpenguins)

How to Split a Dataframe into a list of Dataframes by groups using split() in base R

The split() function in base R divides the data in a vector or a dataframe into groups and returns them as a list. Here we show how to split a dataframe by group.
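
Before applying split() to the penguins data, a minimal sketch on a small made-up dataframe may help show what it returns. The names toy_df, group, value and toy_list are hypothetical and used only for illustration; the sketch relies on base R alone.

```r
# Hypothetical toy data, not part of the tutorial's penguins dataset
toy_df <- data.frame(
  group = c("a", "b", "a", "c", "b"),
  value = c(1, 2, 3, 4, 5)
)

# split() takes the data to divide and a grouping vector (or factor)
toy_list <- split(toy_df, toy_df$group)

# The result is a named list with one dataframe per group
names(toy_list)

# Number of rows in each group's dataframe
sapply(toy_list, nrow)
```

Because the list is named after the groups, a single group can be pulled out by name, for example toy_list$a.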

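# split() takes the dataframe and the grouping factor
list_of_dataframes_by_split <- split(penguins, penguins$species)

How to Split a Dataframe into a list of Dataframes by groups using group_split() in dplyr

dplyr’s group_split() function takes a dataframe and one or more grouping variables and returns a list of tibbles, one per group. Here we split the penguins dataframe by species.

penguins %>% 
  group_split(species)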

## <list_of<
##   tbl_df<
##     species          : factor
##     island           : factor
##     bill_length_mm   : double
##     bill_depth_mm    : double
##     flipper_length_mm: integer
##     body_mass_g      : integer
##     sex              : factor
##     year             : integer
##   >
## >[3]>
## [[1]]
## # A tibble: 152 × 8
##    species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
##    <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
##  1 Adelie  Torgersen           39.1          18.7               181        3750
##  2 Adelie  Torgersen           39.5          17.4               186        3800
##  3 Adelie  Torgersen           40.3          18                 195        3250
##  4 Adelie  Torgersen           NA            NA                  NA          NA
##  5 Adelie  Torgersen           36.7          19.3               193        3450
##  6 Adelie  Torgersen           39.3          20.6               190        3650
##  7 Adelie  Torgersen           38.9          17.8               181        3625
##  8 Adelie  Torgersen           39.2          19.6               195        4675
##  9 Adelie  Torgersen           34.1          18.1               193        3475
## 10 Adelie  Torgersen           42            20.2               190        4250
## # … with 142 more rows, and 2 more variables: sex <fct>, year <int>
## 
## [[2]]
## # A tibble: 68 × 8
##    species   island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
##    <fct>     <fct>           <dbl>         <dbl>             <int>       <int>
##  1 Chinstrap Dream            46.5          17.9               192        3500
##  2 Chinstrap Dream            50            19.5               196        3900
##  3 Chinstrap Dream            51.3          19.2               193        3650
##  4 Chinstrap Dream            45.4          18.7               188        3525
##  5 Chinstrap Dream            52.7          19.8               197        3725
##  6 Chinstrap Dream            45.2          17.8               198        3950
##  7 Chinstrap Dream            46.1          18.2               178        3250
##  8 Chinstrap Dream            51.3          18.2               197        3750
##  9 Chinstrap Dream            46            18.9               195        4150
## 10 Chinstrap Dream            51.3          19.9               198        3700
## # … with 58 more rows, and 2 more variables: sex <fct>, year <int>
## 
## [[3]]
## # A tibble: 124 × 8
##    species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
##    <fct>   <fct>           <dbl>         <dbl>             <int>       <int>
##  1 Gentoo  Biscoe           46.1          13.2               211        4500
##  2 Gentoo  Biscoe           50            16.3               230        5700
##  3 Gentoo  Biscoe           48.7          14.1               210        4450
##  4 Gentoo  Biscoe           50            15.2               218        5700
##  5 Gentoo  Biscoe           47.6          14.5               215        5400
##  6 Gentoo  Biscoe           46.5          13.5               210        4550
##  7 Gentoo  Biscoe           45.4          14.6               211        4800
##  8 Gentoo  Biscoe           46.7          15.3               219        5200
##  9 Gentoo  Biscoe           43.3          13.4               209        4400
## 10 Gentoo  Biscoe           46.8          15.4               215        5150
## # … with 114 more rows, and 2 more variables: sex <fct>, year <int>
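
Note that the list printed above is indexed by position ([[1]], [[2]], [[3]]) rather than by species name, because group_split() does not name its list elements. As a hedged sketch of one way to attach names, the group keys can be looked up with dplyr’s group_keys() and assigned to the list. The object names list_by_group_split and keys are hypothetical.

```r
library(dplyr)
library(palmerpenguins)

# Split the penguins dataframe by species; the result is an unnamed list
list_by_group_split <- penguins %>%
  group_split(species)

# names() is NULL because group_split() does not name the elements
names(list_by_group_split)

# group_keys() returns one row per group, in the same order as the list
keys <- penguins %>%
  group_by(species) %>%
  group_keys()

# Use the species labels as names for the list elements
names(list_by_group_split) <- as.character(keys$species)
names(list_by_group_split)
```

With names attached, each dataframe can be accessed by species, for example list_by_group_split$Gentoo.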

dplyr’s group_split() function also works on a grouped object, i.e. the result of dplyr’s group_by() function. For example, here we create a grouped object by applying group_by() to the dataframe.

grp_obj <- penguins %>% 
  group_by(species)

Then we can split it into a list of dataframes using group_split() as shown here, and we get the same result as before.

grp_obj %>%
  group_split()
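
To check that the two approaches agree, the following sketch compares the number of rows per group produced by each one. It is only a quick sanity check on group sizes, not a full comparison of the dataframes, and the object names (from_ungrouped, from_grouped, rows_ungrouped, rows_grouped, same_sizes) are hypothetical.

```r
library(dplyr)
library(palmerpenguins)

# Approach 1: pass the grouping variable directly to group_split()
from_ungrouped <- penguins %>%
  group_split(species)

# Approach 2: group first with group_by(), then split the grouped object
from_grouped <- penguins %>%
  group_by(species) %>%
  group_split()

# Number of rows in each list element for both approaches
rows_ungrouped <- vapply(from_ungrouped, nrow, integer(1))
rows_grouped <- vapply(from_grouped, nrow, integer(1))

# TRUE if both approaches produce groups of the same sizes
same_sizes <- identical(rows_ungrouped, rows_grouped)
same_sizes
```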