How to Extract p-values from Multiple Simple Linear Regression Models

lapply() function
linear regression
Learn how to extract p-values from multiple simple linear regression models with this comprehensive R tutorial. Includes practical examples and code snippets.
Published: October 12, 2022

Introduction

When analyzing multiple variables simultaneously, you often need to run several simple linear regression models and extract their p-values for comparison. This approach helps identify which predictors have statistically significant relationships with your response variable across different models.
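Before scaling this up, it helps to see how a single p-value is pulled from one fitted model. As a minimal sketch using the built-in mtcars data:

```r
# A single simple linear regression: does weight predict mpg?
fit <- lm(mpg ~ wt, data = mtcars)

# summary() exposes a coefficients matrix: one row per term.
# Row 2 is the slope (wt); column 4 is its p-value, labelled "Pr(>|t|)".
summary(fit)$coefficients[2, 4]
```

The rest of this tutorial simply repeats this extraction across a list of models.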

Getting Started

library(tidyverse)
library(palmerpenguins)

Example 1: Basic Usage

The Problem

We want to test which penguin measurements significantly predict body mass by running separate simple linear regressions for each predictor variable.

Step 1: Prepare the data

First, we’ll select our variables of interest from the penguins dataset.

penguin_data <- penguins |>
  select(body_mass_g, bill_length_mm, bill_depth_mm, 
         flipper_length_mm) |>
  na.omit()

This creates a clean dataset with body mass as our response variable and three potential predictors.

Step 2: Create multiple regression models

We’ll fit separate linear models for each predictor variable.

models <- list(
  bill_length = lm(body_mass_g ~ bill_length_mm, data = penguin_data),
  bill_depth = lm(body_mass_g ~ bill_depth_mm, data = penguin_data),
  flipper_length = lm(body_mass_g ~ flipper_length_mm, data = penguin_data)
)

This creates a named list containing three simple linear regression models.

Step 3: Extract p-values using map

Now we’ll extract the p-values from each model’s summary statistics.

p_values <- models |>
  map(summary) |>
  map_dbl(~ .x$coefficients[2, 4])

print(p_values)

The map functions extract each predictor's slope p-value: row 2 of the coefficients matrix is the slope term (row 1 is the intercept), and column 4 is Pr(>|t|).
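If you would rather avoid indexing the coefficients matrix by position, the broom package (assuming it is installed) offers a tidier route: tidy() returns one row per term with a p.value column you can reference by name.

```r
library(broom)

# Equivalent extraction via broom: tidy() gives a tibble with columns
# term, estimate, std.error, statistic, and p.value; row 2 is the slope.
p_values_tidy <- models |>
  map_dbl(~ tidy(.x)$p.value[2])
```

The result matches the positional approach but is easier to read and extend.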

Example 2: Practical Application

The Problem

A researcher wants to quickly screen multiple car characteristics to identify which ones significantly predict fuel efficiency (mpg). They need an organized approach to compare p-values across different simple regression models.

Step 1: Set up the screening analysis

We’ll select relevant numeric variables from mtcars for our analysis.

car_data <- mtcars |>
  select(mpg, wt, hp, disp, qsec, drat)

predictors <- c("wt", "hp", "disp", "qsec", "drat")

This gives us mpg as our response variable and five potential predictors to test.

Step 2: Create models programmatically

We’ll use map to create multiple models efficiently with a formula-based approach.

regression_models <- predictors |>
  set_names() |>
  map(~ lm(as.formula(paste("mpg ~", .x)), data = car_data))

This approach dynamically creates formulas and fits models for each predictor variable.
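Base R's reformulate() builds the same formulas without string pasting, which some find less error-prone. An equivalent sketch:

```r
# reformulate(termlabels, response) constructs e.g. mpg ~ wt directly,
# avoiding paste() + as.formula()
regression_models <- predictors |>
  set_names() |>
  map(~ lm(reformulate(.x, response = "mpg"), data = car_data))
```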

Step 3: Extract comprehensive results

Let’s extract both p-values and other key statistics for comparison.

results <- regression_models |>
  map_dfr(~ {
    summary_stats <- summary(.x)
    tibble(
      p_value = summary_stats$coefficients[2, 4],
      r_squared = summary_stats$r.squared
    )
  }, .id = "predictor")

This creates a tidy data frame with predictor names, p-values, and R-squared values.

Step 4: Identify significant predictors

Finally, we’ll filter and arrange results to highlight significant relationships.

significant_predictors <- results |>
  mutate(significant = p_value < 0.05) |>
  arrange(p_value)

print(significant_predictors)

This shows which predictors have statistically significant relationships with mpg, ordered from smallest to largest p-value.
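When screening several predictors at once, you may also want to account for multiple comparisons before declaring significance. A sketch using base R's p.adjust() (the Holm method here is one reasonable choice among several):

```r
# Adjust the p-values for multiple testing, then flag significance
# on the adjusted values instead of the raw ones
adjusted_results <- results |>
  mutate(
    p_adjusted = p.adjust(p_value, method = "holm"),
    significant = p_adjusted < 0.05
  ) |>
  arrange(p_adjusted)
```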

Summary

  • Use map() with model lists to efficiently extract p-values from multiple regression models
  • The slope's p-value sits at position [2, 4] of the coefficients matrix from summary(): row 2 is the slope term, column 4 is Pr(>|t|)
  • Combine map_dfr() with .id parameter to create tidy results tables with predictor names
  • Consider extracting additional statistics (R-squared, coefficients) alongside p-values for comprehensive model comparison
  • Always arrange results by p-value to quickly identify the most significant relationships