How to get p-value from linear regression model
In this tutorial, we will learn how to extract p-value from a linear regression fit object in R. We will use two approaches to pull out p-value and other statistics of interest from a linear regression model. First we will extract the p-value directly from summary of the linear regression fit object using coefficients argument. And then we will use broom R package that can clean up the results from statistical model fit objects like simple linear regression.
Let us get started by loading the packages needed.
library(broom)
library(tidyverse)In this example, we will perform simple linear regression analysis using built-in iris dataset available as a dataframe.
iris %>% head()
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosaTo perform simple linear regression with data available in a dataframe, we can specify data argument in lm() function. The advantage of using data argument is that we can specify the names of the columns in the dataframe as variables without using any quotes to fit linear regression model.
In the example below, we fit simple linear regression model between two numerical variables in the iris dataset.
# fit linear regression using lm() function with data argument
lm_fit |t|)
## (Intercept) 4.30660 0.07839 54.94
## .. ..- attr(*, "predvars")= language list(Sepal.Length, Petal.Length)
## .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
## .. .. ..- attr(*, "names")= chr [1:2] "Sepal.Length" "Petal.Length"
## $ residuals : Named num [1:150] 0.2209 0.0209 -0.1382 -0.32 0.1209 ...
## ..- attr(*, "names")= chr [1:150] "1" "2" "3" "4" ...
## $ coefficients : num [1:2, 1:4] 4.3066 0.4089 0.0784 0.0189 54.9389 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:2] "(Intercept)" "Petal.Length"
## .. ..$ : chr [1:4] "Estimate" "Std. Error" "t value" "Pr(>|t|)"
## $ aliased : Named logi [1:2] FALSE FALSE
## ..- attr(*, "names")= chr [1:2] "(Intercept)" "Petal.Length"
## $ sigma : num 0.407
## $ df : int [1:3] 2 148 2
## $ r.squared : num 0.76
## $ adj.r.squared: num 0.758
## $ fstatistic : Named num [1:3] 469 1 148
## ..- attr(*, "names")= chr [1:3] "value" "numdf" "dendf"
## $ cov.unscaled : num [1:2, 1:2] 0.03708 -0.00809 -0.00809 0.00215
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:2] "(Intercept)" "Petal.Length"
## .. ..$ : chr [1:2] "(Intercept)" "Petal.Length"
## - attr(*, "class")= chr "summary.lm"coefficients argument gives us the effect size (the estimate), statistic, and p.value for each term as a dataframe.
summary(lm_fit)$coefficients
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.3066034 0.07838896 54.93890 2.426713e-100
## Petal.Length 0.4089223 0.01889134 21.64602 1.038667e-47Now we can pull pull p-value from the dataframe
First we pull all the p.values available in the column Pr(>|t|).
# pull p.values from linear regression model
summary(lm_fit)$coefficients[ ,4]Here we specifically pull p.value for the association between Sepal.Length and Petal.Length.
# pull p.value from linear regression model
summary(lm_fit)$coefficients[2,4]broom package. tidy the results. from linear regression model
Another option to get the p.value is to use broom R package.
The broom package takes the messy output of built-in functions in R, such as lm, nls, or t.test, and turns them into tidy tibbles.
tidy() function in broom package gives the result of coefficients in a tidy tibble format.
broom::tidy(lm_fit)
## # A tibble: 2 × 5
## term estimate std.error statistic p.value
##
## 1 (Intercept) 4.31 0.0784 54.9 2.43e-100
## 2 Petal.Length 0.409 0.0189 21.6 1.04e- 47Note the column names of the resulting tibble is much cleaner making it easier to get what we need.
To get all the p.values from the linear model we can use pull() function on the p.value column.
broom::tidy(lm_fit) %>%
pull(p.value)And to get the p.value. for the specific term, we can using pull() function in combination with filter() function as shown below.
broom::tidy(lm_fit) %>%
filter(term=="Petal.Length") %>%
pull(p.value)