How to get p-values from a linear regression model

linear regression
rstats
Learn how to get p-values from a linear regression model with this R tutorial. Includes practical examples and code snippets.
Published

September 2, 2022

Introduction

P-values in linear regression help determine whether a predictor variable has a statistically significant relationship with the response variable. A p-value is the probability of observing a test statistic at least as extreme as the one obtained if the true coefficient were zero, that is, if there were no linear relationship between the variables. This tutorial shows how to extract and interpret p-values from linear regression models in R.

Getting Started

library(tidyverse)
library(palmerpenguins)

Example 1: Basic Usage

The Problem

We want to test if there’s a significant relationship between penguin flipper length and body mass. We need to extract the p-value to determine statistical significance.

Step 1: Create the Linear Model

First, we’ll fit a simple linear regression model.

# Create linear model
penguin_model <- lm(body_mass_g ~ flipper_length_mm, 
                   data = penguins)

# View the model
penguin_model

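This creates a linear model object that stores the fitted coefficients, residuals, and other components of the fit. Printing it shows only the call and the coefficient estimates; the standard errors, t-values, and p-values are computed when we call summary() in the next step.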
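To see what the fitted object holds, we can list its components and compare the number of rows in the data with the number actually used in the fit. This is a minimal sketch that refits the same model so it runs on its own; it assumes the default na.action setting, under which lm() drops rows with missing values. The names n_rows and n_used are introduced here for illustration.

```r
library(palmerpenguins)

# Refit the model from Step 1 so this block is self-contained
penguin_model <- lm(body_mass_g ~ flipper_length_mm, data = penguins)

# Components stored in the lm object
names(penguin_model)

# Estimated intercept and slope as a named numeric vector
coef(penguin_model)

# Rows in the data versus rows actually used in the fit
n_rows <- nrow(penguins)
n_used <- nobs(penguin_model)
n_rows - n_used  # rows dropped because of missing values
```

Checking nobs() is worthwhile because the p-values depend on the residual degrees of freedom, which are based on the rows used rather than the rows supplied.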

Step 2: Extract Model Summary

The summary contains detailed statistical information including p-values.

# Get detailed model summary
model_summary <- summary(penguin_model)

# Display the summary
model_summary

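The summary shows the estimate, standard error, t-value, and p-value for each coefficient (the intercept and each predictor), along with the residual standard error, R-squared, and the overall F-test.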
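It can help to see where the p-values in this table come from. The sketch below recomputes them by hand: each t-value is the estimate divided by its standard error, and the two-sided p-value is taken from the t distribution with the model's residual degrees of freedom. The names coefs, t_values, and manual_p are introduced here for illustration.

```r
library(palmerpenguins)

penguin_model <- lm(body_mass_g ~ flipper_length_mm, data = penguins)
coefs <- summary(penguin_model)$coefficients

# t-value = estimate / standard error
t_values <- coefs[, "Estimate"] / coefs[, "Std. Error"]

# Two-sided p-value from the t distribution with residual degrees of freedom
manual_p <- 2 * pt(abs(t_values),
                   df = df.residual(penguin_model),
                   lower.tail = FALSE)

# Compare with the values reported by summary()
all.equal(manual_p, coefs[, "Pr(>|t|)"])
```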

Step 3: Extract Specific P-values

We can extract p-values directly from the summary object.

# Get p-values from coefficients table
p_values <- model_summary$coefficients[, "Pr(>|t|)"]

# View p-values
p_values

This returns a named numeric vector with one p-value for the intercept and one for the slope coefficient of flipper_length_mm.
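If only one value is needed, we can index the coefficient table by row and column name. The summary object also stores the overall F-statistic, from which the model-level p-value can be computed with pf(). The following is a sketch under the same model as above; slope_p, f, and overall_p are names introduced here.

```r
library(palmerpenguins)

penguin_model <- lm(body_mass_g ~ flipper_length_mm, data = penguins)
model_summary <- summary(penguin_model)

# p-value for a single coefficient, selected by row and column name
slope_p <- model_summary$coefficients["flipper_length_mm", "Pr(>|t|)"]

# Overall F-test: fstatistic holds the F value and its two degrees of freedom
f <- model_summary$fstatistic
overall_p <- pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE)

slope_p
unname(overall_p)
```

With a single predictor the F-statistic is the square of the slope's t-value, so the two p-values coincide. In a multiple regression they generally differ.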

Example 2: Practical Application

The Problem

A researcher wants to analyze multiple factors affecting penguin body mass, including flipper length and bill length. They need to identify which predictors are statistically significant and extract specific p-values for reporting.

Step 1: Create Multiple Regression Model

We’ll build a model with multiple predictors to examine their individual significance.

# Create multiple regression model
multi_model <- lm(body_mass_g ~ flipper_length_mm + bill_length_mm, 
                  data = penguins)

# View model structure
print(multi_model)

Each coefficient's p-value in this model tests whether that predictor is associated with body mass after adjusting for the other predictor.

Step 2: Extract P-values Using Tidy Format

The broom package provides a clean way to extract model statistics.

# Load broom (install it first with install.packages("broom") if needed)
library(broom)

# Get tidy model results
tidy_results <- tidy(multi_model)
print(tidy_results)

This creates a clean tibble with term names, estimates, standard errors, statistics, and p-values.
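As a sketch of two related broom features, tidy() can also add confidence intervals through its conf.int and conf.level arguments, and glance() returns a one-row model-level summary whose p.value refers to the overall F-test. The names tidy_ci and base_p are introduced here for illustration.

```r
library(broom)
library(palmerpenguins)

multi_model <- lm(body_mass_g ~ flipper_length_mm + bill_length_mm,
                  data = penguins)

# Coefficient table with 95% confidence intervals added
tidy_ci <- tidy(multi_model, conf.int = TRUE, conf.level = 0.95)
tidy_ci

# The p.value column matches the base R summary table
base_p <- unname(summary(multi_model)$coefficients[, "Pr(>|t|)"])
all.equal(unname(tidy_ci$p.value), base_p)

# One-row model-level summary; its p.value is for the overall F-test
glance(multi_model)$p.value
```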

Step 3: Filter and Extract Specific P-values

We can filter for specific predictors and extract their p-values.

# Extract p-value for flipper length
flipper_p <- tidy_results |> 
  filter(term == "flipper_length_mm") |> 
  pull(p.value)

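# Format with format.pval(): round() would print very small p-values as 0
print(paste("Flipper length p-value:", format.pval(flipper_p, digits = 3)))

This gives us the p-value for the flipper length predictor in a form suitable for statistical reporting. Very small values are shown in scientific notation, or flagged as smaller than machine precision, rather than being rounded to zero.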

Step 4: Check Statistical Significance

We can programmatically check which predictors are statistically significant.

# Identify significant predictors (p < 0.05)
significant_vars <- tidy_results |> 
  filter(p.value < 0.05, term != "(Intercept)") |> 
  select(term, p.value)

print(significant_vars)

This filters results to show only statistically significant predictors, excluding the intercept term.
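If the same check is needed for several models or thresholds, one option is to wrap it in a small function. The helper below, significant_terms(), is hypothetical (it is not part of broom or dplyr) and simply parameterises the filter above by a significance level alpha.

```r
library(broom)
library(dplyr)
library(palmerpenguins)

# Hypothetical helper: terms with a p-value below alpha, intercept excluded
significant_terms <- function(model, alpha = 0.05) {
  tidy(model) |>
    filter(term != "(Intercept)", p.value < alpha) |>
    select(term, p.value)
}

multi_model <- lm(body_mass_g ~ flipper_length_mm + bill_length_mm,
                  data = penguins)

significant_terms(multi_model)                 # default 5% level
significant_terms(multi_model, alpha = 0.001)  # stricter threshold
```

Keeping alpha as an argument makes the chosen threshold explicit in the code rather than buried in a filter condition.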

Step 5: Create Results Summary

Finally, we’ll create a formatted summary of our findings.

# Create formatted results summary
results_summary <- tidy_results |> 
  mutate(
    significance = case_when(
      p.value < 0.001 ~ "***",
      p.value < 0.01 ~ "**", 
      p.value < 0.05 ~ "*",
      TRUE ~ ""
    )
  ) |> 
  select(term, estimate, p.value, significance)

print(results_summary)

This adds significance stars and creates a clean summary table for easy interpretation and reporting.
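The same star coding can be sketched in base R with cut(), which may be convenient when p-values are held in a plain vector rather than a tibble. The p-values below are made up purely to hit each band; they are not results from the penguin model.

```r
# Hypothetical p-values chosen to fall in each band
p <- c(0.0004, 0.004, 0.04, 0.4)

# right = FALSE gives [0, 0.001), [0.001, 0.01), [0.01, 0.05), [0.05, 1],
# matching the strict "<" comparisons used in case_when()
stars <- cut(p,
             breaks = c(0, 0.001, 0.01, 0.05, 1),
             labels = c("***", "**", "*", ""),
             right = FALSE,
             include.lowest = TRUE)

data.frame(p.value = p, significance = as.character(stars))
```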

Summary

  • Use summary() on lm objects to get comprehensive regression statistics including p-values
  • Extract p-values directly using model_summary$coefficients[, "Pr(>|t|)"] for programmatic access
  • The broom::tidy() function provides clean, tibble-format results that work well with tidyverse workflows
  • P-values less than 0.05 are conventionally treated as statistically significant at the 5% significance level (α = 0.05), which corresponds to a 95% confidence level
  • Multiple regression models show an individual p-value for each predictor, helping identify which variables contribute significantly to the model after accounting for the other predictors