How to use pivot_wider() in R
Introduction
The pivot_wider() function from the tidyr package transforms long data into wide format by spreading values across multiple columns. This is essential when you need to reshape data for analysis, create summary tables, or prepare data for specific visualization requirements.
Getting Started
library(tidyverse)
library(palmerpenguins)Example 1: Basic Usage
The Problem
We have penguin data in long format where species and measurements are in separate rows. We want to create a wide format where each species becomes its own column with corresponding values.
Step 1: Create sample long data
First, let’s create a simple dataset to demonstrate the basic concept.
penguin_summary <- penguins |>
drop_na() |>
group_by(species, island) |>
summarise(avg_mass = mean(body_mass_g), .groups = "drop") |>
slice_head(n = 6)
penguin_summaryThis creates a summary table with species, island, and average body mass in long format.
Step 2: Apply basic pivot_wider
Now we’ll transform this data so each species becomes a separate column.
wide_penguins <- penguin_summary |>
pivot_wider(
names_from = species,
values_from = avg_mass
)
wide_penguinsThe data is now restructured with species as column headers and their corresponding average masses as values.
Step 3: Handle missing values
We can specify how to handle missing combinations using the values_fill parameter.
wide_penguins_filled <- penguin_summary |>
pivot_wider(
names_from = species,
values_from = avg_mass,
values_fill = 0
)
wide_penguins_filledAny missing species-island combinations are now filled with zeros instead of NA values.
Example 2: Practical Application
The Problem
You’re analyzing car performance data and need to compare fuel efficiency (mpg) across different cylinder counts for each car manufacturer. The current format makes it difficult to see these comparisons side-by-side.
Step 1: Prepare the automotive data
Let’s create a meaningful summary from the mtcars dataset for our analysis.
# Add car names and create manufacturer column
cars_data <- mtcars |>
rownames_to_column("car_name") |>
mutate(manufacturer = word(car_name, 1)) |>
select(manufacturer, cyl, mpg) |>
group_by(manufacturer, cyl) |>
summarise(avg_mpg = round(mean(mpg), 1), .groups = "drop")
cars_dataThis creates a summary showing average fuel efficiency by manufacturer and cylinder count.
Step 2: Transform to comparison format
Now we’ll pivot the data to compare fuel efficiency across cylinder counts.
mpg_comparison <- cars_data |>
pivot_wider(
names_from = cyl,
values_from = avg_mpg,
names_prefix = "cyl_"
)
mpg_comparisonEach cylinder count is now a separate column, making it easy to compare fuel efficiency across engine types.
Step 3: Create multiple value columns
We can also pivot multiple measurements simultaneously for comprehensive analysis.
detailed_cars <- mtcars |>
rownames_to_column("car_name") |>
mutate(manufacturer = word(car_name, 1)) |>
group_by(manufacturer, cyl) |>
summarise(
avg_mpg = round(mean(mpg), 1),
avg_hp = round(mean(hp), 0),
.groups = "drop"
) |>
pivot_wider(
names_from = cyl,
values_from = c(avg_mpg, avg_hp),
names_sep = "_cyl_"
)
detailed_carsThis creates columns for both MPG and horsepower across different cylinder counts, providing a comprehensive comparison matrix.
Summary
pivot_wider()transforms long data into wide format by spreading key-value pairs across columns- Use
names_fromto specify which column contains the new column names - Use
values_fromto specify which column contains the values to spread - The
values_fillparameter handles missing combinations with specified default values Multiple value columns can be pivoted simultaneously using vectors in
values_from