How to use unnest_longer() in R
Introduction
The unnest_longer() function from the tidyr package transforms list-columns into longer format by creating one row for each element in the list. This is particularly useful when you have data stored as lists within data frame cells and need to expand them for analysis or visualization.
Getting Started
library(tidyverse)
library(palmerpenguins)
Example 1: Basic Usage
The Problem
We often receive data where multiple values are stored as lists within single cells, making analysis difficult. Let’s create a simple dataset with list-columns to demonstrate how unnest_longer() solves this issue.
Step 1: Create sample data with list-columns
We’ll start by building a data frame containing list-columns.
# Create a tibble with list-columns
sample_data <- tibble(
  group = c("A", "B", "C"),
  values = list(c(1, 2, 3), c(4, 5), c(6, 7, 8, 9)),
  colors = list(c("red", "blue"), c("green"), c("yellow", "purple", "orange"))
)
print(sample_data)
This creates a compact data frame where each row holds several values packed into list-columns.
Step 2: Apply unnest_longer() to expand values
Now we’ll expand the values column to create individual rows.
# Unnest the values column
expanded_values <- sample_data |>
  unnest_longer(values)
print(expanded_values)
Each list element now occupies its own row, while other columns are duplicated to maintain relationships.
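A quick sanity check on this step: the number of rows after unnesting should equal the total number of elements in the list-column. The sketch below rebuilds a reduced copy of the sample data so that it runs on its own; it is an illustration, not part of the original walkthrough.

```r
library(tidyr)
library(tibble)

# Reduced copy of the sample data (only the column being unnested)
sample_data <- tibble(
  group = c("A", "B", "C"),
  values = list(c(1, 2, 3), c(4, 5), c(6, 7, 8, 9))
)

expanded_values <- sample_data |>
  unnest_longer(values)

# One row per list element: 3 + 2 + 4 elements in total
rows_match <- nrow(expanded_values) == sum(lengths(sample_data$values))
print(rows_match)
```

If the comparison is FALSE on your own data, empty or NULL list elements are a likely cause worth inspecting.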
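Step 3: Expand multiple list-columns
Passing several columns in one call, as in unnest_longer(c(values, colors)), pairs elements by position and requires the lists in each row to have compatible lengths (equal, or length 1). Our values and colors lists differ in length within a row, so that call would fail here. Instead, we unnest one column after the other.
# Unnest values first, then colors
fully_expanded <- sample_data |>
  unnest_longer(values) |>
  unnest_longer(colors)
print(fully_expanded)
Both list-columns are now expanded, creating all possible combinations between the lists in each row.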
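The difference between unnesting several columns in one call and chaining separate calls is easiest to see by comparing row counts. The sketch below uses a small hypothetical tibble (`paired`) whose lists have matching lengths within each row, so both approaches run.

```r
library(tidyr)
library(tibble)

# Hypothetical data: within each row, both lists have the same length
paired <- tibble(
  group = c("A", "B"),
  values = list(c(1, 2), c(3, 4, 5)),
  colors = list(c("red", "blue"), c("green", "yellow", "purple"))
)

# One call: elements are paired by position (2 + 3 rows)
in_parallel <- paired |>
  unnest_longer(c(values, colors))

# Chained calls: every value meets every color in its row (2 * 2 + 3 * 3 rows)
all_combinations <- paired |>
  unnest_longer(values) |>
  unnest_longer(colors)

print(nrow(in_parallel))
print(nrow(all_combinations))
```

Which form is appropriate depends on whether the columns describe the same observations (pair them) or independent attributes (combine them).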
Example 2: Practical Application
The Problem
Imagine we’re analyzing penguin measurements where each penguin has multiple recorded weights over time. Our data arrives with all weights stored as lists, but we need individual observations for trend analysis.
Step 1: Create realistic penguin weight tracking data
Let’s simulate a dataset that mimics real-world data collection scenarios.
# Create penguin weight tracking data
penguin_weights <- tibble(
  penguin_id = c("P001", "P002", "P003"),
  species = c("Adelie", "Chinstrap", "Gentoo"),
  weights_kg = list(
    c(3.2, 3.4, 3.3, 3.5),
    c(4.1, 4.2),
    c(5.2, 5.0, 5.3, 5.1, 5.4)
  ),
  measurement_dates = list(
    c("2023-01", "2023-02", "2023-03", "2023-04"),
    c("2023-01", "2023-02"),
    c("2023-01", "2023-02", "2023-03", "2023-04", "2023-05")
  )
)
print(penguin_weights)
This structure represents how data might arrive from multiple measurement sessions per penguin.
Step 2: Expand the weight measurements
We’ll unnest the weight data to create individual measurement records.
# Unnest weights and dates together
individual_measurements <- penguin_weights |>
  unnest_longer(c(weights_kg, measurement_dates))
print(individual_measurements)
Now each weight measurement has its own row with the corresponding date, making time-series analysis possible.
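For time-series work it often helps to turn the "YYYY-MM" strings into real dates after unnesting. The following is a minimal sketch on a reduced copy of the data; the `measurement_date` column and the choice to date each measurement to the first day of its month are assumptions made for illustration.

```r
library(tidyr)
library(dplyr)
library(tibble)

# Reduced copy of the penguin data so the example runs on its own
penguin_weights <- tibble(
  penguin_id = c("P001", "P002"),
  weights_kg = list(c(3.2, 3.4, 3.3), c(4.1, 4.2)),
  measurement_dates = list(
    c("2023-01", "2023-02", "2023-03"),
    c("2023-01", "2023-02")
  )
)

individual_measurements <- penguin_weights |>
  unnest_longer(c(weights_kg, measurement_dates)) |>
  # Assumption: each measurement is dated to the first day of its month
  mutate(measurement_date = as.Date(paste0(measurement_dates, "-01")))

print(individual_measurements)
```

With a proper Date column, sorting and plotting by time no longer depend on string ordering.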
Step 3: Add measurement sequence numbers
Let’s enhance our data by adding sequence information for each penguin’s measurements.
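# Unnest weights and dates together, recording each element's position.
# "{col}_number" is a template that names one index column per unnested column.
measurements_with_sequence <- penguin_weights |>
  unnest_longer(c(weights_kg, measurement_dates), indices_to = "{col}_number") |>
  # Both index columns are identical, so keep one under a clearer name
  select(-measurement_dates_number) |>
  rename(measurement_number = weights_kg_number)
print(measurements_with_sequence)
The indices_to parameter creates a new column showing the position of each element in the original list. Unnesting both columns in one call keeps each weight paired with its own date; chaining a second unnest_longer() call on measurement_dates would instead pair every weight with every date.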
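According to the tidyr documentation, the index column holds element names rather than positions when the list elements are named. The sketch below illustrates this with a hypothetical tibble (`seasonal_weights`) in which each weight is named after the season it was recorded in.

```r
library(tidyr)
library(tibble)

# Hypothetical data: named numeric vectors inside the list-column
seasonal_weights <- tibble(
  penguin_id = c("P001", "P002"),
  weights_kg = list(
    c(spring = 3.2, summer = 3.4),
    c(spring = 4.1, summer = 4.2, autumn = 4.0)
  )
)

# The inner names land in the column given by indices_to
by_season <- seasonal_weights |>
  unnest_longer(weights_kg, indices_to = "season")

print(by_season)
```

This can save a separate lookup step when the names already carry meaning, such as a season or a sensor label.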
Step 4: Calculate weight changes over time
Now we can perform meaningful analysis on the expanded data.
# Calculate weight changes for each penguin
weight_changes <- measurements_with_sequence |>
  group_by(penguin_id) |>
  mutate(weight_change = weights_kg - lag(weights_kg)) |>
  filter(!is.na(weight_change))
print(weight_changes)
With properly structured data, we can easily calculate weight changes between consecutive measurements.
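Once the changes are computed, a per-penguin summary is a natural next step. The sketch below starts from a small hypothetical long-format tibble (`measurements`) shaped like the unnested data; the summary columns `net_change` and `largest_gain` are illustrative names.

```r
library(dplyr)
library(tibble)

# Hypothetical long-format measurements, one row per weighing
measurements <- tibble(
  penguin_id = c("P001", "P001", "P001", "P002", "P002"),
  weights_kg = c(3.2, 3.4, 3.3, 4.1, 4.2)
)

weight_summary <- measurements |>
  group_by(penguin_id) |>
  # lag() returns NA for the first measurement of each penguin
  mutate(weight_change = weights_kg - lag(weights_kg)) |>
  summarise(
    net_change = sum(weight_change, na.rm = TRUE),
    largest_gain = max(weight_change, na.rm = TRUE)
  )

print(weight_summary)
```

The na.rm = TRUE arguments skip the first measurement of each penguin, which has no previous value to compare against.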
Summary
- unnest_longer() transforms list-columns into individual rows, expanding your dataset vertically
- Multiple list-columns can be unnested in a single call, which pairs elements by position; chaining separate calls creates all combinations of list elements instead
- The indices_to parameter preserves the original position of elements within lists
- This function is essential for converting compact, nested data into analysis-ready format
- Always ensure your list-columns have compatible lengths when unnesting multiple columns together
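One way to check length compatibility before unnesting several columns together is to compare lengths() of the two list-columns. This is a minimal sketch with a hypothetical tibble `df`; the explicit stop() message is an illustrative choice, not something tidyr requires.

```r
library(tidyr)
library(tibble)

# Hypothetical data with two list-columns that should line up row by row
df <- tibble(
  id = c("a", "b"),
  x = list(1:3, 4:5),
  y = list(c("p", "q", "r"), c("s", "t"))
)

# TRUE when both lists have the same length in every row
lengths_match <- all(lengths(df$x) == lengths(df$y))

if (lengths_match) {
  result <- unnest_longer(df, c(x, y))
} else {
  stop("x and y have different lengths in at least one row")
}

print(result)
```

Running the check first gives a clearer message than the recycling error raised inside the unnesting call.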