How to use unnest_wider() in R
Introduction
The unnest_wider() function from the tidyr package transforms list-columns containing named elements into separate columns. This is particularly useful when working with nested data structures like JSON data or API responses where you need to spread list elements horizontally across multiple columns.
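Getting Started
Load the tidyverse, which attaches tidyr (home of unnest_wider()) and dplyr. The examples use the native pipe |>, which requires R 4.1.0 or later.
library(tidyverse)
Example 1: Basic Usage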
The Problem
You have a tibble with a list-column where each list contains named elements, and you want to convert those named elements into separate columns. This commonly occurs when parsing JSON data or working with nested data structures.
Step 1: Create sample data with list-column
First, let’s create a small dataset that mimics the one-record-per-row structure you typically get after parsing JSON.
# Create a tibble with a list-column containing named elements
sample_data <- tibble(
  id = c("A", "B", "C"),
  measurements = list(
    list(height = 180, weight = 75),
    list(height = 165, weight = 60),
    list(height = 175, weight = 70)
  )
)
This creates a tibble where the measurements column contains lists with the named elements height and weight.
Step 2: Apply unnest_wider()
Now we’ll use unnest_wider() to spread the list elements into separate columns.
# Unnest the measurements column wider
result <- sample_data |>
  unnest_wider(measurements)
print(result)
The function creates two new columns (height and weight) from the named elements in the list-column and drops the original measurements column.
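If the inner names would duplicate an existing column name (which the default names_repair = "check_unique" rejects), or you simply want to record where each column came from, unnest_wider() accepts a names_sep argument that pastes the original column name onto each new name. A minimal sketch on the same sample data:

```r
library(tidyverse)

# Same shape as sample_data above
sample_data <- tibble(
  id = c("A", "B", "C"),
  measurements = list(
    list(height = 180, weight = 75),
    list(height = 165, weight = 60),
    list(height = 175, weight = 70)
  )
)

# names_sep = "_" builds each new name as <column>_<element>
prefixed <- sample_data |>
  unnest_wider(measurements, names_sep = "_")

names(prefixed)
# → "id" "measurements_height" "measurements_weight"
```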
Step 3: Examine the structure
Let’s verify that our data is now in a standard tabular format.
# Check the structure of the result
str(result)
glimpse(result)
Both calls should show id as a character column and height and weight as numeric columns, with no list-column left. The data is now in a standard tabular format, ready for analysis and visualization.
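str() and glimpse() are meant to be read by eye. In a script you may prefer a check that fails loudly; the sketch below uses purrr's map_lgl() (attached with the tidyverse) to confirm that no list-column survived the unnesting. This is one possible check of our own, not something unnest_wider() does for you.

```r
library(tidyverse)

sample_data <- tibble(
  id = c("A", "B", "C"),
  measurements = list(
    list(height = 180, weight = 75),
    list(height = 165, weight = 60),
    list(height = 175, weight = 70)
  )
)
result <- sample_data |>
  unnest_wider(measurements)

# TRUE for every column that is still a list-column
still_nested <- map_lgl(result, is.list)

# Stop with an error if any list-column remains
stopifnot(!any(still_nested))

# Storage type of each column for a quick visual check
map_chr(result, typeof)
```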
Example 2: Practical Application
The Problem
Imagine you’re working with survey data where responses are stored as nested lists, similar to how data might come from a web API. You need to flatten this structure to analyze individual response components across different survey questions.
Step 1: Create realistic survey data
Let’s create a dataset that represents survey responses with nested information.
# Create survey data with nested responses
survey_data <- tibble(
  participant_id = 1:4,
  demographics = list(
    list(age = 25, education = "Bachelor", income = 50000),
    list(age = 34, education = "Master", income = 75000),
    list(age = 28, education = "PhD", income = 85000),
    list(age = 31, education = "Bachelor", income = 60000)
  )
)
print(survey_data)
This structure mimics what you might receive from an API where demographic information is nested within each participant’s record.
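If the records really do arrive as JSON, one way to reach this list-column shape is to parse them with jsonlite::fromJSON() using simplifyVector = FALSE, which keeps each record as a named list, and then wrap the result in a tibble. The JSON string below is made-up illustration data, and the use of jsonlite is an assumption here (it is installed with the tidyverse but not attached by library(tidyverse)).

```r
library(tidyverse)
library(jsonlite)

# Illustrative JSON resembling an API response (not real survey data)
json_txt <- '[
  {"participant_id": 1, "demographics": {"age": 25, "education": "Bachelor", "income": 50000}},
  {"participant_id": 2, "demographics": {"age": 34, "education": "Master", "income": 75000}}
]'

# simplifyVector = FALSE returns a plain list of named lists
records <- fromJSON(json_txt, simplifyVector = FALSE)

# One row per record: participant_id becomes a column,
# demographics stays a list-column because each element has length 3
api_data <- tibble(record = records) |>
  unnest_wider(record)

# A second unnest_wider() spreads age, education, and income into columns
flat_api <- api_data |>
  unnest_wider(demographics)

flat_api
```

Calling unnest_wider() once per nesting level keeps each step easy to inspect.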
Step 2: Unnest demographic information
Now we’ll flatten the demographic data into separate columns for easier analysis.
# Unnest demographics into separate columns
flattened_survey <- survey_data |>
  unnest_wider(demographics)
print(flattened_survey)
Each demographic attribute now has its own column, making the data much easier to work with for statistical analysis.
Step 3: Perform analysis on flattened data
With the data properly structured, we can now easily calculate summary statistics.
# Calculate summary statistics
flattened_survey |>
  summarise(
    mean_age = mean(age),
    mean_income = mean(income),
    .by = education
  )
The flattened structure allows us to group by education level and calculate meaningful statistics across our demographic variables.
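The .by argument to summarise() was added in dplyr 1.1.0. On older dplyr versions the same summary can be written with group_by(). One difference worth knowing: group_by() returns groups in sorted order, whereas .by keeps them in order of first appearance (for this data the two orders happen to coincide).

```r
library(tidyverse)

survey_data <- tibble(
  participant_id = 1:4,
  demographics = list(
    list(age = 25, education = "Bachelor", income = 50000),
    list(age = 34, education = "Master", income = 75000),
    list(age = 28, education = "PhD", income = 85000),
    list(age = 31, education = "Bachelor", income = 60000)
  )
)

flattened_survey <- survey_data |>
  unnest_wider(demographics)

# Equivalent summary using group_by() instead of .by
by_education <- flattened_survey |>
  group_by(education) |>
  summarise(
    mean_age = mean(age),
    mean_income = mean(income)
  )

by_education
```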
Step 4: Handle missing or inconsistent data
Real-world data often has missing or inconsistent nested elements.
# Create data with inconsistent list elements
messy_data <- tibble(
  id = 1:3,
  info = list(
    list(name = "John", score = 95),
    list(name = "Jane", score = 87, bonus = 5),
    list(name = "Bob")
  )
)
# Unnest with automatic handling of missing elements
messy_data |>
  unnest_wider(info)
The function creates a column for every name that appears in any of the lists (name, score, and bonus) and fills in NA wherever a record lacks that element: John and Bob get NA for bonus, and Bob also gets NA for score.
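A different kind of inconsistency is inner lists whose elements have no names at all. In tidyr 1.2.0 and later, unnest_wider() stops with an error on unnamed elements unless you supply names_sep, in which case it generates names by numbering the elements. The coords data below is hypothetical.

```r
library(tidyverse)

# Hypothetical data: the inner lists are unnamed
unnamed_data <- tibble(
  id = 1:2,
  coords = list(
    list(10, 20),
    list(30, 40)
  )
)

# names_sep builds names from the column name and each element's position
numbered <- unnamed_data |>
  unnest_wider(coords, names_sep = "_")

names(numbered)
# → "id" "coords_1" "coords_2"
```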
Summary
- unnest_wider() transforms list-columns with named elements into separate columns, spreading data horizontally
- It’s essential for working with nested data structures from APIs, JSON files, or complex data collection processes
- The function automatically handles missing elements by creating NA values in the resulting columns
- Use it when you need to convert nested, named list elements into a standard tabular format for analysis
Always verify your results with glimpse() or str() to ensure the unnesting worked as expected.