How to Extract p-values from Multiple Simple Linear Regression Models

lapply() function
linear regression
Learn how to extract p-values from multiple simple linear regression models with this comprehensive R tutorial. Includes practical examples and code snippets.
Published: October 12, 2022

Introduction

When analyzing multiple variables simultaneously, you often need to run several simple linear regression models and extract their p-values for comparison. This approach helps identify which predictors have statistically significant relationships with your response variable across different models.
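Before scaling this up, it helps to see how a single p-value is pulled from one fitted model. As a minimal sketch using the built-in mtcars data:

```r
# A single simple linear regression: does weight predict mpg?
fit <- lm(mpg ~ wt, data = mtcars)

# summary() exposes a coefficients matrix: one row per term.
# Row 2 is the slope (wt); column 4 is its p-value, labelled "Pr(>|t|)".
summary(fit)$coefficients[2, 4]
```

The rest of this tutorial simply repeats this extraction across a list of models.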

Getting Started

library(tidyverse)
library(palmerpenguins)

Example 1: Basic Usage

The Problem

We want to test which penguin measurements significantly predict body mass by running separate simple linear regressions for each predictor variable.

Step 1: Prepare the data

First, we’ll select our variables of interest from the penguins dataset.

penguin_data <- penguins |>
  select(body_mass_g, bill_length_mm, bill_depth_mm, 
         flipper_length_mm) |>
  na.omit()

This creates a clean dataset with body mass as our response variable and three potential predictors.

Step 2: Create multiple regression models

We’ll fit separate linear models for each predictor variable.

models <- list(
  bill_length = lm(body_mass_g ~ bill_length_mm, data = penguin_data),
  bill_depth = lm(body_mass_g ~ bill_depth_mm, data = penguin_data),
  flipper_length = lm(body_mass_g ~ flipper_length_mm, data = penguin_data)
)

This creates a named list containing three simple linear regression models.

Step 3: Extract p-values using map

Now we’ll extract the p-values from each model’s summary statistics.

p_values <- models |>
  map(summary) |>
  map_dbl(~ .x$coefficients[2, 4])

print(p_values)

The map functions extract each predictor's slope p-value: row 2 of the coefficients matrix is the slope term (row 1 is the intercept), and column 4 is Pr(>|t|).
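If you would rather avoid indexing the coefficients matrix by position, the broom package (assuming it is installed) offers a tidier route: tidy() returns one row per term with a p.value column you can reference by name.

```r
library(broom)

# Equivalent extraction via broom: tidy() gives a tibble with columns
# term, estimate, std.error, statistic, and p.value; row 2 is the slope.
p_values_tidy <- models |>
  map_dbl(~ tidy(.x)$p.value[2])
```

The result matches the positional approach but is easier to read and extend.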

Example 2: Practical Application

The Problem

A researcher wants to quickly screen multiple car characteristics to identify which ones significantly predict fuel efficiency (mpg). They need an organized approach to compare p-values across different simple regression models.

Step 1: Set up the screening analysis

We’ll select relevant numeric variables from mtcars for our analysis.

car_data <- mtcars |>
  select(mpg, wt, hp, disp, qsec, drat)

predictors <- c("wt", "hp", "disp", "qsec", "drat")

This gives us mpg as our response variable and five potential predictors to test.

Step 2: Create models programmatically

We’ll use map to create multiple models efficiently with a formula-based approach.

regression_models <- predictors |>
  set_names() |>
  map(~ lm(as.formula(paste("mpg ~", .x)), data = car_data))

This approach dynamically creates formulas and fits models for each predictor variable.
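Base R's reformulate() builds the same formulas without string pasting, which some find less error-prone. An equivalent sketch:

```r
# reformulate(termlabels, response) constructs e.g. mpg ~ wt directly,
# avoiding paste() + as.formula()
regression_models <- predictors |>
  set_names() |>
  map(~ lm(reformulate(.x, response = "mpg"), data = car_data))
```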

Step 3: Extract comprehensive results

Let’s extract both p-values and other key statistics for comparison.

results <- regression_models |>
  map_dfr(~ {
    summary_stats <- summary(.x)
    tibble(
      p_value = summary_stats$coefficients[2, 4],
      r_squared = summary_stats$r.squared
    )
  }, .id = "predictor")

This creates a tidy data frame with predictor names, p-values, and R-squared values.

Step 4: Identify significant predictors

Finally, we’ll filter and arrange results to highlight significant relationships.

significant_predictors <- results |>
  mutate(significant = p_value < 0.05) |>
  arrange(p_value)

print(significant_predictors)

This shows which predictors have statistically significant relationships with mpg, ordered from smallest to largest p-value.
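When screening several predictors at once, you may also want to account for multiple comparisons before declaring significance. A sketch using base R's p.adjust() (the Holm method here is one reasonable choice among several):

```r
# Adjust the p-values for multiple testing, then flag significance
# on the adjusted values instead of the raw ones
adjusted_results <- results |>
  mutate(
    p_adjusted = p.adjust(p_value, method = "holm"),
    significant = p_adjusted < 0.05
  ) |>
  arrange(p_adjusted)
```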

Summary

  • Use map() with model lists to efficiently extract p-values from multiple regression models
  • The slope's p-value sits at position [2, 4] of the coefficients matrix from summary(): row 2 is the slope term, column 4 is Pr(>|t|)
  • Combine map_dfr() with .id parameter to create tidy results tables with predictor names
  • Consider extracting additional statistics (R-squared, coefficients) alongside p-values for comprehensive model comparison
  • Always arrange results by p-value to quickly identify the most significant relationships