How to get p-value from linear regression model
Introduction
P-values in linear regression help determine whether predictor variables have a statistically significant relationship with the response variable. A p-value is the probability of observing results at least as extreme as yours if there were truly no relationship between the variables. This tutorial shows how to extract and interpret p-values from linear regression models in R.
Getting Started
library(tidyverse)
library(palmerpenguins)
Example 1: Basic Usage
The Problem
We want to test if there’s a significant relationship between penguin flipper length and body mass. We need to extract the p-value to determine statistical significance.
Step 1: Create the Linear Model
First, we’ll fit a simple linear regression model.
# Create linear model
penguin_model <- lm(body_mass_g ~ flipper_length_mm,
                    data = penguins)
# View the model
penguin_model
This creates a linear model object that stores the fitted coefficients, residuals, and the other quantities that summary() uses to compute statistical tests. Printing it shows only the call and the coefficients.
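As a minimal sketch, the accessor functions below pull individual pieces out of the fitted object. Note that lm() drops rows with missing values by default, so the number of observations used in the fit can be smaller than the number of rows in the data.

```r
library(palmerpenguins)

penguin_model <- lm(body_mass_g ~ flipper_length_mm, data = penguins)

# Estimated intercept and slope as a named numeric vector
coef(penguin_model)

# Rows in the data versus rows actually used in the fit
# (rows with NA in either variable are dropped by default)
nrow(penguins)
nobs(penguin_model)
```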
Step 2: Extract Model Summary
The summary contains detailed statistical information including p-values.
# Get detailed model summary
model_summary <- summary(penguin_model)
# Display the summary
model_summary
The summary shows coefficients, standard errors, t-values, and p-values for each term in the model.
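To make the link between these columns concrete, here is a minimal sketch that recomputes the p-values by hand. Each t-value is the estimate divided by its standard error, and the two-sided p-value is the probability of a t-statistic at least that large in absolute value under a t-distribution with the model's residual degrees of freedom. The names coef_table, t_manual, and p_manual are introduced only for this illustration.

```r
library(palmerpenguins)

penguin_model <- lm(body_mass_g ~ flipper_length_mm, data = penguins)
coef_table <- coef(summary(penguin_model))

# t-value = estimate / standard error
t_manual <- coef_table[, "Estimate"] / coef_table[, "Std. Error"]

# Two-sided p-value from the t-distribution
p_manual <- 2 * pt(abs(t_manual), df = df.residual(penguin_model),
                   lower.tail = FALSE)

# Compare with the values reported by summary()
all.equal(unname(p_manual), unname(coef_table[, "Pr(>|t|)"]))
```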
Step 3: Extract Specific P-values
We can extract p-values directly from the summary object.
# Get p-values from coefficients table
p_values <- model_summary$coefficients[, "Pr(>|t|)"]
# View p-values
p_values
This gives us p-values for both the intercept and slope coefficient, showing statistical significance levels.
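The coefficients table is a plain matrix, so a single p-value can also be picked out by row and column name. The summary object does not store the overall F-test p-value that appears at the bottom of its printed output; the sketch below recomputes it from the stored F statistic with pf(). This is a common approach, offered here as an illustration rather than as part of the steps above.

```r
library(palmerpenguins)

penguin_model <- lm(body_mass_g ~ flipper_length_mm, data = penguins)
model_summary <- summary(penguin_model)

# p-value for the slope only, selected by row and column name
slope_p <- model_summary$coefficients["flipper_length_mm", "Pr(>|t|)"]
slope_p

# Overall model p-value: upper tail of the F distribution
f_stat <- model_summary$fstatistic  # named vector: value, numdf, dendf
overall_p <- pf(f_stat["value"], f_stat["numdf"], f_stat["dendf"],
                lower.tail = FALSE)
unname(overall_p)
```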
Example 2: Practical Application
The Problem
A researcher wants to analyze multiple factors affecting penguin body mass, including flipper length and bill length. They need to identify which predictors are statistically significant and extract specific p-values for reporting.
Step 1: Create Multiple Regression Model
We’ll build a model with multiple predictors to examine their individual significance.
# Create multiple regression model
multi_model <- lm(body_mass_g ~ flipper_length_mm + bill_length_mm,
                  data = penguins)
# View model structure
print(multi_model)
This model estimates the effects of flipper length and bill length on body mass jointly, so the p-value for each predictor tests its contribution after accounting for the other.
Step 2: Extract P-values Using Tidy Format
The broom package provides a clean way to extract model statistics.
# Load broom (run install.packages("broom") first if it is not installed)
library(broom)
# Get tidy model results
tidy_results <- tidy(multi_model)
print(tidy_results)
This creates a clean tibble with term names, estimates, standard errors, statistics, and p-values.
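tidy() reports one row per coefficient. For model-level statistics, broom also provides glance(), which returns a one-row tibble. As a brief sketch, its p.value column holds the p-value of the overall F-test; the name model_stats is used only for this example.

```r
library(broom)
library(palmerpenguins)

multi_model <- lm(body_mass_g ~ flipper_length_mm + bill_length_mm,
                  data = penguins)

# One-row tibble of model-level statistics
model_stats <- glance(multi_model)

# R-squared, F statistic, and overall F-test p-value
model_stats[, c("r.squared", "statistic", "p.value")]
```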
Step 3: Filter and Extract Specific P-values
We can filter for specific predictors and extract their p-values.
# Extract p-value for flipper length
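flipper_p <- tidy_results |>
  filter(term == "flipper_length_mm") |>
  pull(p.value)
print(paste("Flipper length p-value:", signif(flipper_p, 3)))
This gives us the p-value for the flipper length predictor to three significant digits for statistical reporting. signif() is used instead of round() because rounding a very small p-value to six decimal places would print 0.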
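For written reports, very small p-values are usually shown as an inequality rather than in full. One option in base R is format.pval(), sketched here. The input values are made up for illustration and do not come from the penguin model, and the eps threshold of 0.001 is an illustrative choice rather than a fixed convention.

```r
# Illustrative p-values, not taken from the penguin model
p_examples <- c(0.2431, 0.0312, 0.00004, 3.2e-40)

# Values below eps are reported as "less than eps" instead of in full
formatted <- format.pval(p_examples, digits = 2, eps = 0.001)
formatted
```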
Step 4: Check Statistical Significance
We can programmatically check which predictors are statistically significant.
# Identify significant predictors (p < 0.05)
significant_vars <- tidy_results |>
  filter(p.value < 0.05, term != "(Intercept)") |>
  select(term, p.value)
print(significant_vars)
This filters results to show only statistically significant predictors, excluding the intercept term.
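For a coefficient in a linear model, a p-value below 0.05 corresponds to a 95% confidence interval that excludes zero. As a sketch of that relationship, tidy() can add the intervals with conf.int = TRUE, which shows the plausible range of each effect alongside its significance. The excludes_zero column is a helper added only for this illustration.

```r
library(broom)
library(dplyr)
library(palmerpenguins)

multi_model <- lm(body_mass_g ~ flipper_length_mm + bill_length_mm,
                  data = penguins)

# Add 95% confidence intervals to the tidy output and flag
# the terms whose interval does not contain zero
tidy_ci <- tidy(multi_model, conf.int = TRUE, conf.level = 0.95) |>
  mutate(excludes_zero = conf.low > 0 | conf.high < 0) |>
  select(term, estimate, conf.low, conf.high, p.value, excludes_zero)

tidy_ci
```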
Step 5: Create Results Summary
Finally, we’ll create a formatted summary of our findings.
# Create formatted results summary
results_summary <- tidy_results |>
  mutate(
    significance = case_when(
      p.value < 0.001 ~ "***",
      p.value < 0.01 ~ "**",
      p.value < 0.05 ~ "*",
      TRUE ~ ""
    )
  ) |>
  select(term, estimate, p.value, significance)
print(results_summary)
This adds significance stars and creates a clean summary table for easy interpretation and reporting.
Summary
- Use summary() on lm objects to get comprehensive regression statistics including p-values
- Extract p-values directly using model_summary$coefficients[, "Pr(>|t|)"] for programmatic access
- The broom::tidy() function provides clean, tibble-format results that work well with tidyverse workflows
- P-values less than 0.05 are conventionally treated as statistically significant at the 5% significance level, which corresponds to 95% confidence
- Multiple regression models show individual p-values for each predictor, helping identify which variables significantly contribute to the model