How to use unnest() in R
Introduction
The unnest() function from tidyr is used to expand list-columns in a data frame, converting nested data structures into a flat, rectangular format. This is particularly useful when working with data that contains lists within cells, such as results from web APIs or grouped analyses.
Getting Started
library(tidyverse)
library(palmerpenguins)
Example 1: Basic Usage
The Problem
Sometimes we have data frames where one or more columns contain lists instead of individual values. We need to expand these list-columns so each list element becomes its own row.
Step 1: Create nested data
We’ll start by creating a simple data frame with a list-column to demonstrate the concept.
# Create a data frame with a list-column
nested_df <- tibble(
  group = c("A", "B", "C"),
  values = list(c(1, 2, 3), c(4, 5), c(6, 7, 8, 9))
)
print(nested_df)
This creates a data frame where the values column is a list-column holding numeric vectors of different lengths.
Step 2: Apply unnest()
Now we’ll use unnest() to expand the list-column into individual rows.
# Unnest the values column
unnested_df <- nested_df |>
  unnest(values)
print(unnested_df)
The result is a longer data frame where each list element becomes its own row, with the group identifier repeated as needed.
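One edge case is worth seeing alongside the basic call: a list element with no values. The sketch below uses a hypothetical data frame (with_empty is not part of the original example) and the keep_empty argument of unnest() to show that empty elements are dropped by default and can be kept as a row of missing values instead.

```r
library(tidyverse)

# Hypothetical data: group "B" holds an empty vector
with_empty <- tibble(
  group = c("A", "B", "C"),
  values = list(c(1, 2), numeric(0), c(3))
)

# Default behaviour: the row for group "B" is dropped entirely
dropped <- with_empty |>
  unnest(values)

# keep_empty = TRUE keeps group "B" as a single row with a missing value
kept <- with_empty |>
  unnest(values, keep_empty = TRUE)

print(dropped)
print(kept)
```

If every group has to survive the transformation, for example before a join, keep_empty = TRUE is probably the safer choice.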
Step 3: Verify the transformation
Let’s check the dimensions to see how the data changed.
# Compare dimensions
cat("Original rows:", nrow(nested_df), "\n")
cat("Unnested rows:", nrow(unnested_df), "\n")
cat("Total list elements:", sum(lengths(nested_df$values)), "\n")
The unnested data frame has 9 rows (3 + 2 + 4), matching the total number of elements across all lists.
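The same comparison can be turned into an assertion so that a script stops when the row counts disagree. This is a minimal sketch that rebuilds the example data to stay self-contained; expected_rows and rows_per_group are names introduced here for illustration, and the check assumes no list element is empty.

```r
library(tidyverse)

# Rebuild the example data so the check runs on its own
nested_df <- tibble(
  group = c("A", "B", "C"),
  values = list(c(1, 2, 3), c(4, 5), c(6, 7, 8, 9))
)
unnested_df <- nested_df |>
  unnest(values)

# The row count after unnesting should equal the total number of list elements
expected_rows <- sum(lengths(nested_df$values))
stopifnot(nrow(unnested_df) == expected_rows)

# Each group should contribute as many rows as its list had elements
rows_per_group <- count(unnested_df, group)
stopifnot(all(rows_per_group$n == lengths(nested_df$values)))
```

stopifnot() raises an error on the first failed condition, which makes a mismatch hard to overlook in a longer pipeline.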
Example 2: Practical Application
The Problem
In real data analysis, we often group data and perform operations that return multiple values per group. For example, let’s find the top 2 heaviest penguins by species and then unnest the results for further analysis.
Step 1: Create nested analysis results
We’ll group penguins by species and nest the top 2 heaviest penguins from each group.
# Group and nest top penguins by body mass
nested_penguins <- penguins |>
  filter(!is.na(body_mass_g)) |>
  group_by(species) |>
  slice_max(body_mass_g, n = 2) |>
  nest()
This creates a data frame with one row per species and a nested data column containing the top penguins.
Step 2: Examine the nested structure
Let’s look at what we created before unnesting.
# Check the nested structure
print(nested_penguins)
nested_penguins$data[[1]] # Look at first nested dataset
Each row contains a tibble with the top 2 penguins for that species stored in the data column.
Step 3: Unnest the results
Now we’ll unnest to get back to a flat structure with all top penguins.
# Unnest to flatten the structure
top_penguins <- nested_penguins |>
  unnest(data)
print(top_penguins)
We now have a regular data frame with 6 rows (2 penguins × 3 species) containing all the heaviest penguins.
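The row count after this kind of pipeline depends on how slice_max() treats ties: by default it keeps every row tied at the cutoff, so a group can contribute more than n rows. The sketch below uses a small hypothetical data frame (weights, grp, and mass are made up for illustration, not taken from the penguins data) to compare the default with with_ties = FALSE.

```r
library(tidyverse)

# Hypothetical data: group "x" has a tie for second place
weights <- tibble(
  grp = c("x", "x", "x", "y", "y"),
  mass = c(50, 40, 40, 30, 20)
)

# Default (with_ties = TRUE): group "x" keeps both tied rows
ties_kept <- weights |>
  group_by(grp) |>
  slice_max(mass, n = 2) |>
  nest() |>
  unnest(data)

# with_ties = FALSE: at most 2 rows per group
ties_dropped <- weights |>
  group_by(grp) |>
  slice_max(mass, n = 2, with_ties = FALSE) |>
  nest() |>
  unnest(data)

nrow(ties_kept)
nrow(ties_dropped)
```

When a fixed number of rows per group matters downstream, setting with_ties = FALSE makes the result size predictable.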
Step 4: Work with multiple list-columns
Sometimes you have multiple list-columns to unnest simultaneously.
# Create data with multiple list-columns
multi_nested <- tibble(
  id = 1:3,
  letters = list(c("a", "b"), c("c"), c("d", "e", "f")),
  numbers = list(c(10, 20), c(30), c(40, 50, 60))
)
# Unnest multiple columns at once
multi_unnested <- multi_nested |>
  unnest(c(letters, numbers))
print(multi_unnested)
Both list-columns are unnested simultaneously, creating aligned rows for each position.
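Unnesting the columns together is not the same as unnesting them one after the other. The sketch below uses a smaller hypothetical data frame (pairs is a name introduced here) to contrast the two: a single call matches elements by position, while chained calls pair every letter with every number within each id.

```r
library(tidyverse)

# Hypothetical data with two parallel list-columns
pairs <- tibble(
  id = 1:2,
  letters = list(c("a", "b"), c("c")),
  numbers = list(c(10, 20), c(30))
)

# Together: elements are matched by position within each row
together <- pairs |>
  unnest(c(letters, numbers))

# One after the other: each letter is combined with each number for its id
one_by_one <- pairs |>
  unnest(letters) |>
  unnest(numbers)

nrow(together)
nrow(one_by_one)
```

Unnesting together assumes the elements in each row can be aligned; tidyr raises an error when their lengths are incompatible, for example a vector of length 2 next to one of length 3.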
Summary
- unnest() expands list-columns in data frames, creating new rows for each list element
- It’s essential for flattening nested data structures from APIs, grouped analyses, or complex data manipulations
- You can unnest multiple list-columns simultaneously by passing a vector of column names
- The function automatically repeats values from other columns to maintain data relationships
- Always check your data dimensions before and after unnesting to ensure the transformation worked as expected