Many linear regression models with tidyverse
Introduction
When you need to fit the same linear regression model to different subsets of your data, the tidyverse provides powerful tools to streamline this process. This approach is particularly useful when analyzing grouped data where you want to compare model coefficients, predictions, or performance metrics across categories.
Getting Started
library(tidyverse)
library(palmerpenguins)
library(broom)Example 1: Basic Usage
The Problem
We want to examine the relationship between bill length and bill depth for each penguin species separately. Instead of manually filtering data and fitting three separate models, we’ll use tidyverse functions to automate this process.
Step 1: Nest data by groups
We’ll organize our data so each species gets its own dataset.
penguin_nested <- penguins |>
filter(!is.na(bill_length_mm), !is.na(bill_depth_mm)) |>
group_by(species) |>
nest()
penguin_nestedThis creates a tibble where each row represents a species, and the data column contains the subset of observations for that species.
Step 2: Fit models to each group
We’ll apply the same linear model to each nested dataset.
penguin_models <- penguin_nested |>
mutate(
model = map(data, ~ lm(bill_depth_mm ~ bill_length_mm, data = .x))
)
penguin_modelsNow each species has its own fitted linear model stored in the model column.
Step 3: Extract model summaries
We’ll use broom functions to get clean model outputs.
model_summaries <- penguin_models |>
mutate(
tidy_model = map(model, tidy),
glance_model = map(model, glance)
)
model_summariesThe tidy_model column contains coefficient information, while glance_model contains model-level statistics like R-squared values.
Step 4: Unnest results for comparison
We’ll extract the coefficient information for easy comparison across species.
coefficients <- model_summaries |>
select(species, tidy_model) |>
unnest(tidy_model)
coefficientsNow we can easily compare slopes, intercepts, and significance levels across all three penguin species.
Example 2: Practical Application
The Problem
We’re analyzing car performance data and want to understand how weight affects fuel efficiency (mpg) for cars with different numbers of cylinders. We also want to generate predictions and assess model quality for each cylinder group.
Step 1: Prepare and nest the data
We’ll group cars by cylinder count and create nested datasets.
car_nested <- mtcars |>
select(mpg, wt, cyl) |>
group_by(cyl) |>
nest() |>
filter(map_lgl(data, ~ nrow(.x) >= 3))
car_nestedWe filtered to ensure each group has at least 3 observations for meaningful regression analysis.
Step 2: Fit models and extract diagnostics
We’ll fit weight-to-mpg models and extract comprehensive model information.
car_analysis <- car_nested |>
mutate(
model = map(data, ~ lm(mpg ~ wt, data = .x)),
tidy = map(model, tidy),
glance = map(model, glance)
)Each cylinder group now has its own model with diagnostic information readily available.
Step 3: Compare model performance
We’ll examine R-squared values and other fit statistics across cylinder groups.
model_performance <- car_analysis |>
select(cyl, glance) |>
unnest(glance) |>
select(cyl, r.squared, adj.r.squared, p.value)
model_performanceThis shows us which cylinder groups have the strongest weight-mpg relationships and which models explain the most variance.
Step 4: Generate predictions
We’ll create predictions for new weight values for each cylinder group.
new_weights <- tibble(wt = seq(2, 5, by = 0.5))
predictions <- car_analysis |>
mutate(
pred = map(model, ~ predict(.x, newdata = new_weights))
) |>
select(cyl, pred) |>
unnest(pred) |>
bind_cols(rep(list(new_weights), nrow(car_nested))) |>
unnest(wt)These predictions allow us to visualize and compare the weight-mpg relationship across different cylinder groups.
Summary
- Use
nest()aftergroup_by()to create separate datasets for each group you want to model - Apply
map()with modeling functions to fit the same model structure to each nested dataset - Leverage broom functions (
tidy(),glance(),augment()) to extract clean model outputs - Use
unnest()to expand results back into regular tibble format for easy comparison and visualization This workflow scales efficiently from a few groups to hundreds of models with minimal code changes