How to Extract p-values from Multiple Simple Linear Regression Models
Introduction
When analyzing multiple variables simultaneously, you often need to run several simple linear regression models and extract their p-values for comparison. This approach helps identify which predictors have statistically significant relationships with your response variable across different models.
Getting Started
library(tidyverse)
library(palmerpenguins)
Example 1: Basic Usage
The Problem
We want to test which penguin measurements significantly predict body mass by running separate simple linear regressions for each predictor variable.
Step 1: Prepare the data
First, we’ll select our variables of interest from the penguins dataset.
penguin_data <- penguins |>
  select(body_mass_g, bill_length_mm, bill_depth_mm,
         flipper_length_mm) |>
  na.omit()
This keeps only the rows with no missing values in the four selected columns, giving us body mass as our response variable and three potential predictors.
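Selecting columns before calling na.omit() matters, because na.omit() drops every row that has a missing value in any remaining column. A minimal sketch of that effect follows. It uses the built-in airquality data as a stand-in (an assumption, so that it runs without extra packages), and the object names are illustrative.

```r
# airquality ships with R and has missing values in some columns
air <- airquality[, c("Ozone", "Solar.R", "Wind")]

# Drop every row with a missing value in any of the selected columns
air_complete <- na.omit(air)

# Count how many rows were removed
rows_dropped <- nrow(air) - nrow(air_complete)
print(rows_dropped)
```

Running the same comparison on your own data is a quick way to see how many observations the models will actually use.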
Step 2: Create multiple regression models
We’ll fit separate linear models for each predictor variable.
models <- list(
  bill_length = lm(body_mass_g ~ bill_length_mm, data = penguin_data),
  bill_depth = lm(body_mass_g ~ bill_depth_mm, data = penguin_data),
  flipper_length = lm(body_mass_g ~ flipper_length_mm, data = penguin_data)
)
This creates a named list containing three simple linear regression models.
Step 3: Extract p-values using map
Now we’ll extract the p-values from each model’s summary statistics.
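p_values <- models |>
  map(summary) |>
  map_dbl(~ .x$coefficients[2, 4])
print(p_values)
The map functions extract the p-value for each predictor's slope. In the coefficients matrix of a single-predictor model, row 1 is the intercept and row 2 is the predictor, and column 4 holds the p-value, labelled Pr(>|t|).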
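Positional indexing is compact, but looking the value up by row and column name makes the intent explicit and fails loudly if the layout differs. The sketch below shows both lookups on one model. It uses the built-in mtcars data as a stand-in for the penguin models (an assumption, so that it runs without extra packages), and fit, coef_table, p_by_name and p_by_position are illustrative names.

```r
# Fit one simple regression on a built-in dataset
fit <- lm(mpg ~ wt, data = mtcars)

# Coefficient table: one row per term, one column per statistic
coef_table <- coef(summary(fit))
print(dimnames(coef_table))

# Look up the slope's p-value by name and by position
p_by_name <- coef_table["wt", "Pr(>|t|)"]
p_by_position <- coef_table[2, 4]

# Both lookups refer to the same cell of the matrix
print(c(by_name = p_by_name, by_position = p_by_position))
```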
Example 2: Practical Application
The Problem
A researcher wants to quickly screen multiple car characteristics to identify which ones significantly predict fuel efficiency (mpg). They need an organized approach to compare p-values across different simple regression models.
Step 1: Set up the screening analysis
We’ll select relevant numeric variables from mtcars for our analysis.
car_data <- mtcars |>
  select(mpg, wt, hp, disp, qsec, drat)
predictors <- c("wt", "hp", "disp", "qsec", "drat")
This gives us mpg as our response variable and five potential predictors to test.
Step 2: Create models programmatically
We’ll use map to create multiple models efficiently with a formula-based approach.
regression_models <- predictors |>
  set_names() |>
  map(~ lm(as.formula(paste("mpg ~", .x)), data = car_data))
This approach dynamically creates formulas and fits models for each predictor variable.
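As an alternative to pasting strings together, base R's reformulate() builds a formula directly from a character vector of term labels. The sketch below assumes the same five mtcars predictors. The helper fit_one and the object base_models are hypothetical names, and lapply() with setNames() stands in for map() with set_names().

```r
predictors <- c("wt", "hp", "disp", "qsec", "drat")

# Hypothetical helper: regress mpg on a single named predictor
fit_one <- function(predictor) {
  lm(reformulate(predictor, response = "mpg"), data = mtcars)
}

# Named list of fitted models, one per predictor
base_models <- lapply(setNames(predictors, predictors), fit_one)

# Show the formula each model was fitted with
print(lapply(base_models, formula))
```

Building the formula from term labels avoids manual string handling, which can be easier to maintain as the predictor list grows.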
Step 3: Extract comprehensive results
Let’s extract both p-values and other key statistics for comparison.
results <- regression_models |>
  map_dfr(~ {
    summary_stats <- summary(.x)
    tibble(
      p_value = summary_stats$coefficients[2, 4],
      r_squared = summary_stats$r.squared
    )
  }, .id = "predictor")
This creates a tidy data frame with predictor names, p-values, and R-squared values.
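In purrr 1.0.0 and later, map_dfr() is marked as superseded in favour of map() followed by list_rbind(). The following is a minimal sketch of that pattern, assuming purrr 1.0.0 or newer and R 4.1 or newer (for the native pipe and the \(x) lambda syntax). The name results_rbind and the extra slope column are illustrative additions, not part of the example above.

```r
library(purrr)
library(tibble)

predictors <- c("wt", "hp", "disp", "qsec", "drat")

results_rbind <- predictors |>
  set_names() |>
  map(\(predictor) {
    # Fit mpg against one predictor and summarise the fit
    fit <- lm(reformulate(predictor, response = "mpg"), data = mtcars)
    stats <- summary(fit)
    # One-row tibble of statistics for this predictor
    tibble(
      slope = stats$coefficients[2, "Estimate"],
      p_value = stats$coefficients[2, "Pr(>|t|)"],
      r_squared = stats$r.squared
    )
  }) |>
  # Stack the rows; list names become the predictor column
  list_rbind(names_to = "predictor")

print(results_rbind)
```

The names_to argument plays the role that .id plays in map_dfr(): it turns the list names into a column.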
Step 4: Identify significant predictors
Finally, we’ll flag each predictor as significant or not at the 0.05 level and sort the results by p-value.
significant_predictors <- results |>
  mutate(significant = p_value < 0.05) |>
  arrange(p_value)
print(significant_predictors)
This shows which predictors have statistically significant relationships with mpg, ordered from the smallest p-value to the largest.
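If you only want the rows that pass the threshold, rather than a flag column, a filter() call does that. The sketch below is self-contained and recomputes the p-values from mtcars. The names p_vals, screening, alpha and significant_only are illustrative, and the 0.05 cutoff is the same threshold assumed above.

```r
library(dplyr)

predictors <- c("wt", "hp", "disp", "qsec", "drat")

# Slope p-value from a simple regression of mpg on each predictor
p_vals <- vapply(predictors, function(predictor) {
  fit <- lm(reformulate(predictor, response = "mpg"), data = mtcars)
  summary(fit)$coefficients[2, 4]
}, numeric(1))

screening <- data.frame(predictor = predictors, p_value = unname(p_vals))

# Assumed significance threshold
alpha <- 0.05

# Keep only predictors below the threshold, smallest p-value first
significant_only <- screening |>
  filter(p_value < alpha) |>
  arrange(p_value)

print(significant_only)
```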
Summary
- Use map() with model lists to efficiently extract p-values from multiple regression models
- The p-value is located at position [2, 4] in the coefficients matrix from summary()
- Combine map_dfr() with the .id parameter to create tidy results tables with predictor names
- Consider extracting additional statistics (R-squared, coefficients) alongside p-values for comprehensive model comparison
- Always arrange results by p-value to quickly identify the most significant relationships