How to Split a Dataframe into a List of Dataframes by Groups in R

dplyr group_split()
split()
Learn how to split a dataframe into a list of dataframes by groups in R. Step-by-step tutorial with examples.
Published

April 21, 2022

Introduction

Splitting a dataframe into multiple dataframes by groups is a common data manipulation task in R. This technique is particularly useful when you need to perform different operations on subsets of your data or create separate datasets for analysis. The group_split() function from dplyr makes this process straightforward and efficient.

Getting Started

library(tidyverse)
library(palmerpenguins)

Example 1: Basic Usage

The Problem

We want to split the penguins dataset into separate dataframes for each species. This allows us to analyze each species independently or apply species-specific operations.

Step 1: Examine the Data Structure

Let’s first look at our dataset to understand the grouping variable.

data(penguins)
head(penguins)
table(penguins$species)

This shows us the penguins dataset with three species: Adelie, Chinstrap, and Gentoo.

Step 2: Split by Species

We’ll use group_split() to create separate dataframes for each species.

species_list <- penguins |>
  group_by(species) |>
  group_split()

length(species_list)

This creates a list containing three dataframes, one for each penguin species.
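On an ungrouped dataframe, group_split() also accepts the grouping variables directly, so the separate group_by() call is optional. The sketch below shows that shorthand together with the .keep argument, which controls whether the grouping column is retained in each piece. The object names species_list_short and species_list_nokey are illustrative only.

```r
library(dplyr)
library(palmerpenguins)

# Shorthand: pass the grouping variable straight to group_split()
species_list_short <- penguins |>
  group_split(species)

# .keep = FALSE drops the grouping column from each piece
species_list_nokey <- penguins |>
  group_split(species, .keep = FALSE)

length(species_list_short)
"species" %in% names(species_list_nokey[[1]])
```

Dropping the key is mainly useful when the group is already recorded elsewhere, for example in the list names.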

Step 3: Examine the Results

Let’s inspect what we created to verify the split worked correctly.

# Check the first dataframe (Adelie penguins)
head(species_list[[1]])
nrow(species_list[[1]])

# Check species in each dataframe
sapply(species_list, function(x) unique(x$species))

Each list element contains only penguins from one species, confirming our split was successful.
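If you would rather have the script fail loudly than rely on reading printed output, the same verification can be written as assertions. This is a minimal sketch that rebuilds species_list so it runs on its own; rows_per_group is an illustrative name.

```r
library(dplyr)
library(palmerpenguins)

species_list <- penguins |>
  group_by(species) |>
  group_split()

# Rows per piece; vapply() guarantees an integer vector
rows_per_group <- vapply(species_list, nrow, integer(1))
rows_per_group

# Every row of the original data should land in exactly one piece
stopifnot(sum(rows_per_group) == nrow(penguins))

# Each piece should contain a single species
one_species <- vapply(
  species_list,
  function(x) n_distinct(x$species) == 1,
  logical(1)
)
stopifnot(all(one_species))
```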

Step 4: Name the List Elements

Adding names to our list makes it easier to access specific groups.

names(species_list) <- c("Adelie", "Chinstrap", "Gentoo")

# Now we can access by name
adelie_penguins <- species_list$Adelie
head(adelie_penguins)

Named list elements provide intuitive access to each species’ data.
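Typing the names by hand works only if you already know the order of the groups. A sketch of two ways to avoid that assumption: take the names from group_keys(), which returns one row per group in the same order as group_split(), or use base R's split(), which names the pieces after the factor levels. The names species_keys and species_base are illustrative.

```r
library(dplyr)
library(palmerpenguins)

species_list <- penguins |>
  group_by(species) |>
  group_split()

# One row per group, in the same order as the pieces of the split
species_keys <- penguins |>
  group_by(species) |>
  group_keys()

names(species_list) <- as.character(species_keys$species)
names(species_list)

# Base R alternative: split() names the pieces automatically
species_base <- split(penguins, penguins$species)
names(species_base)
```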

Example 2: Practical Application

The Problem

Imagine you’re analyzing car performance data and need separate datasets for different engine and transmission configurations. You want to split the mtcars dataset by cylinder count and transmission type, then analyze each combination on its own. This approach is common when different groups require different modeling approaches or when preparing data for separate reports.

Step 1: Create the Split with Multiple Variables

Let’s split mtcars by both cylinder count and transmission type for more granular analysis.

data(mtcars)
mtcars$am <- factor(mtcars$am, labels = c("automatic", "manual"))

car_groups <- mtcars |>
  group_by(cyl, am) |>
  group_split()

length(car_groups)

This creates six dataframes, one for each combination of cylinder count (4, 6, or 8) and transmission type that occurs in the data.
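One thing to watch with mtcars: the car names are stored as row names, and the pieces returned by group_split() are tibbles, which do not carry row names. A minimal sketch that moves the names into a regular column first, assuming you want to keep them; the column name model and the object names are illustrative. It reads from datasets::mtcars so it does not touch the mtcars object modified above.

```r
library(dplyr)
library(tibble)

cars_named <- datasets::mtcars |>
  rownames_to_column(var = "model") |>  # keep the car names as a column
  mutate(am = factor(am, labels = c("automatic", "manual")))

car_groups_named <- cars_named |>
  group_by(cyl, am) |>
  group_split()

# The car names are now available inside every piece
head(car_groups_named[[1]]$model)
```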

Step 2: Create Descriptive Names

We’ll generate meaningful names for each group to make our analysis more intuitive.

group_names <- mtcars |>
  group_by(cyl, am) |>
  group_keys() |>
  unite(group_name, cyl, am, sep = "_cyl_") |>
  pull(group_name)

names(car_groups) <- group_names
names(car_groups)

Now each dataframe has a descriptive name indicating its cylinder count and transmission type.
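For comparison, base R's split() can produce a named list for several grouping variables in one call. The sketch below is an alternative to the steps above, not a replacement for them: it passes a list of factors, uses sep to build the same style of names, and sets drop = TRUE to omit combinations with no rows. The object names cars_am and car_groups_base are illustrative.

```r
# Work on a copy so the mtcars object used above is left alone
cars_am <- datasets::mtcars
cars_am$am <- factor(cars_am$am, labels = c("automatic", "manual"))

# split() accepts a list of factors; sep sets the separator used in the names
car_groups_base <- split(
  cars_am,
  list(cars_am$cyl, cars_am$am),
  sep = "_cyl_",
  drop = TRUE
)

names(car_groups_base)
```

The pieces come back in a different order than with group_split(), because the first factor varies fastest in the combined levels, so it is safer to access them by name than by position.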

Step 3: Apply Group-Specific Operations

With our named groups, we can easily perform targeted analysis on each subset.

# Calculate mean MPG for each group
mpg_by_group <- map_dbl(car_groups, ~ mean(.x$mpg))
print(mpg_by_group)

# Get summary statistics for 6-cylinder manual cars
summary(car_groups$`6_cyl_manual`)

This demonstrates how split dataframes enable group-specific calculations and summaries.
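The same pattern extends to fitting one model per group. As a hedged sketch, the code below splits by cylinder count only, so that each piece has enough rows for a regression, and fits mpg against weight in each piece. The model formula and the names cyl_groups, cyl_models, and wt_slopes are illustrative choices, not part of the original analysis.

```r
library(dplyr)
library(purrr)

# Split by cylinder count only
cyl_groups <- datasets::mtcars |>
  group_split(cyl)

# Name the pieces from the group keys rather than by hand
cyl_keys <- datasets::mtcars |>
  group_by(cyl) |>
  group_keys()
names(cyl_groups) <- paste0(cyl_keys$cyl, "_cyl")

# Fit the same linear model to every piece
cyl_models <- map(cyl_groups, ~ lm(mpg ~ wt, data = .x))

# Extract the weight slope from each fitted model
wt_slopes <- map_dbl(cyl_models, ~ coef(.x)[["wt"]])
wt_slopes
```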

Step 4: Filter and Re-combine if Needed

Sometimes you’ll want to work with only certain groups or combine them back together.

# Select only manual transmission groups
manual_groups <- car_groups[grepl("manual", names(car_groups))]

# Combine back into single dataframe if needed
manual_cars <- bind_rows(manual_groups, .id = "group")
head(manual_cars)

This flexibility allows you to subset your split data and recombine it as analysis requirements change.

Summary

  • Use group_split() with group_by() to divide dataframes into lists of smaller dataframes
  • The resulting list contains one dataframe for each unique combination of grouping variables
  • Adding meaningful names to list elements improves code readability and data access
  • Split dataframes are ideal for group-specific analyses, modeling, or reporting workflows
  • You can easily filter, modify, or recombine split dataframes using standard R list operations