How to Split a Dataframe into a list of Dataframes by groups in R

dplyr group_split()

split()

Published

April 21, 2022

In this tutorial, we will learn how to split a dataframe into a list of dataframes by groups in R. We will first use the base R function split() to divide a dataframe into a list of dataframes, one per group. Then we will use dplyr’s group_split() function to do the same.

To get started, we will first load tidyverse, a suite of R packages, and palmerpenguins, which provides the penguins data.

library(tidyverse)
# check the version of loaded package dplyr
packageVersion("dplyr")
## [1] '1.0.8'
library(palmerpenguins)

How to Split a Dataframe into a list of Dataframes by groups using split() in base R

The split() function in base R divides the data in a vector or a dataframe into a list of groups. Here we show how to split a dataframe by group.

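list_of_dataframes_by_split <- split(penguins, penguins$species)

How to Split a Dataframe into a list of Dataframes by groups using group_split() in dplyr

dplyr’s group_split() function also splits a dataframe by group, returning a list of tibbles with one element per group.

penguins %>% 
  group_split(species)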

## <list_of<
##   tbl_df<
##     species          : factor
##     island           : factor
##     bill_length_mm   : double
##     bill_depth_mm    : double
##     flipper_length_mm: integer
##     body_mass_g      : integer
##     sex              : factor
##     year             : integer
##   >
## >[3]>
## [[1]]
## # A tibble: 152 × 8
##    species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
##    <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
##  1 Adelie  Torgersen           39.1          18.7               181        3750
##  2 Adelie  Torgersen           39.5          17.4               186        3800
##  3 Adelie  Torgersen           40.3          18                 195        3250
##  4 Adelie  Torgersen           NA            NA                  NA          NA
##  5 Adelie  Torgersen           36.7          19.3               193        3450
##  6 Adelie  Torgersen           39.3          20.6               190        3650
##  7 Adelie  Torgersen           38.9          17.8               181        3625
##  8 Adelie  Torgersen           39.2          19.6               195        4675
##  9 Adelie  Torgersen           34.1          18.1               193        3475
## 10 Adelie  Torgersen           42            20.2               190        4250
## # … with 142 more rows, and 2 more variables: sex <fct>, year <int>
## 
## [[2]]
## # A tibble: 68 × 8
##    species   island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
##    <fct>     <fct>           <dbl>         <dbl>             <int>       <int>
##  1 Chinstrap Dream            46.5          17.9               192        3500
##  2 Chinstrap Dream            50            19.5               196        3900
##  3 Chinstrap Dream            51.3          19.2               193        3650
##  4 Chinstrap Dream            45.4          18.7               188        3525
##  5 Chinstrap Dream            52.7          19.8               197        3725
##  6 Chinstrap Dream            45.2          17.8               198        3950
##  7 Chinstrap Dream            46.1          18.2               178        3250
##  8 Chinstrap Dream            51.3          18.2               197        3750
##  9 Chinstrap Dream            46            18.9               195        4150
## 10 Chinstrap Dream            51.3          19.9               198        3700
## # … with 58 more rows, and 2 more variables: sex <fct>, year <int>
## 
## [[3]]
## # A tibble: 124 × 8
##    species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
##    <fct>   <fct>           <dbl>         <dbl>             <int>       <int>
##  1 Gentoo  Biscoe           46.1          13.2               211        4500
##  2 Gentoo  Biscoe           50            16.3               230        5700
##  3 Gentoo  Biscoe           48.7          14.1               210        4450
##  4 Gentoo  Biscoe           50            15.2               218        5700
##  5 Gentoo  Biscoe           47.6          14.5               215        5400
##  6 Gentoo  Biscoe           46.5          13.5               210        4550
##  7 Gentoo  Biscoe           45.4          14.6               211        4800
##  8 Gentoo  Biscoe           46.7          15.3               219        5200
##  9 Gentoo  Biscoe           43.3          13.4               209        4400
## 10 Gentoo  Biscoe           46.8          15.4               215        5150
## # … with 114 more rows, and 2 more variables: sex <fct>, year <int>
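
One practical difference between the two approaches is that split() returns a list named by group level, while group_split() returns a list without names. The following is a minimal sketch of that comparison using the same penguins data. The object names by_split, by_group_split, gentoo_from_split and gentoo_from_group_split are illustrative and not part of the original post.

```r
library(dplyr)
library(palmerpenguins)

# base R: one dataframe per species, named by factor level
by_split <- split(penguins, penguins$species)
names(by_split)

# dplyr: one tibble per species, returned without names
by_group_split <- penguins %>%
  group_split(species)
names(by_group_split)

# so a group is looked up by name in one list and by position in the other
gentoo_from_split <- by_split[["Gentoo"]]
gentoo_from_group_split <- by_group_split[[3]]
nrow(gentoo_from_split) == nrow(gentoo_from_group_split)
```

Looking up a group by position relies on the groups being ordered by the levels of species, which is the order shown in the output above.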

dplyr’s group_split() function also works on a grouped object, i.e. the result of dplyr’s group_by() function. For example, here we create a grouped object by applying group_by() to the dataframe.

grp_obj <- penguins %>% 
  group_by(species)

Then we can split it into a list of dataframes using group_split(), as shown here, and we get the same result as before.

grp_obj %>%
  group_split()
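
Because the list returned by group_split() carries no names, it can help to pair it with dplyr's group_keys(), which returns one row per group in the same order as the list elements. The sketch below illustrates that idea and is not taken from the original post. The names list_of_dataframes, group_labels and group_sizes are hypothetical.

```r
library(dplyr)
library(palmerpenguins)

grp_obj <- penguins %>%
  group_by(species)

# one tibble per species
list_of_dataframes <- grp_obj %>%
  group_split()

# group labels, in the same order as the list elements
group_labels <- as.character(group_keys(grp_obj)$species)

# count the rows in each tibble and label the counts by species
group_sizes <- sapply(list_of_dataframes, nrow)
names(group_sizes) <- group_labels
group_sizes
```

The labelled counts should match the tibble sizes printed earlier (152, 68 and 124 rows).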
