How to get p-value from linear regression model

linear regression

rstats

Published

September 2, 2022

In this tutorial, we will learn how to extract p-value from a linear regression fit object in R. We will use two approaches to pull out p-value and other statistics of interest from a linear regression model. First we will extract the p-value directly from summary of the linear regression fit object using coefficients argument. And then we will use broom R package that can clean up the results from statistical model fit objects like simple linear regression.

Let us get started by loading the packages needed.

library(broom)
library(tidyverse)

In this example, we will perform simple linear regression analysis using built-in iris dataset available as a dataframe.

iris %>% head()
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa

To perform simple linear regression with data available in a dataframe, we can specify data argument in lm() function. The advantage of using data argument is that we can specify the names of the columns in the dataframe as variables without using any quotes to fit linear regression model.

In the example below, we fit simple linear regression model between two numerical variables in the iris dataset.

# fit linear regression using lm() function with data argument
lm_fit |t|)    
## (Intercept)   4.30660    0.07839   54.94    
##   .. ..- attr(*, "predvars")= language list(Sepal.Length, Petal.Length)
##   .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
##   .. .. ..- attr(*, "names")= chr [1:2] "Sepal.Length" "Petal.Length"
##  $ residuals    : Named num [1:150] 0.2209 0.0209 -0.1382 -0.32 0.1209 ...
##   ..- attr(*, "names")= chr [1:150] "1" "2" "3" "4" ...
##  $ coefficients : num [1:2, 1:4] 4.3066 0.4089 0.0784 0.0189 54.9389 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:2] "(Intercept)" "Petal.Length"
##   .. ..$ : chr [1:4] "Estimate" "Std. Error" "t value" "Pr(>|t|)"
##  $ aliased      : Named logi [1:2] FALSE FALSE
##   ..- attr(*, "names")= chr [1:2] "(Intercept)" "Petal.Length"
##  $ sigma        : num 0.407
##  $ df           : int [1:3] 2 148 2
##  $ r.squared    : num 0.76
##  $ adj.r.squared: num 0.758
##  $ fstatistic   : Named num [1:3] 469 1 148
##   ..- attr(*, "names")= chr [1:3] "value" "numdf" "dendf"
##  $ cov.unscaled : num [1:2, 1:2] 0.03708 -0.00809 -0.00809 0.00215
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:2] "(Intercept)" "Petal.Length"
##   .. ..$ : chr [1:2] "(Intercept)" "Petal.Length"
##  - attr(*, "class")= chr "summary.lm"

coefficients argument gives us the effect size (the estimate), statistic, and p.value for each term as a dataframe.

summary(lm_fit)$coefficients
##               Estimate Std. Error  t value      Pr(>|t|)
## (Intercept)  4.3066034 0.07838896 54.93890 2.426713e-100
## Petal.Length 0.4089223 0.01889134 21.64602  1.038667e-47

Now we can pull pull p-value from the dataframe

First we pull all the p.values available in the column Pr(>|t|).

# pull p.values from linear regression model
summary(lm_fit)$coefficients[ ,4]

Here we specifically pull p.value for the association between Sepal.Length and Petal.Length.

# pull p.value from linear regression model
summary(lm_fit)$coefficients[2,4]

broom package. tidy the results. from linear regression model

Another option to get the p.value is to use broom R package.

The broom package takes the messy output of built-in functions in R, such as lm, nls, or t.test, and turns them into tidy tibbles.

tidy() function in broom package gives the result of coefficients in a tidy tibble format.

broom::tidy(lm_fit)

## # A tibble: 2 × 5
##   term         estimate std.error statistic   p.value
##                             
## 1 (Intercept)     4.31     0.0784      54.9 2.43e-100
## 2 Petal.Length    0.409    0.0189      21.6 1.04e- 47

Note the column names of the resulting tibble is much cleaner making it easier to get what we need.

To get all the p.values from the linear model we can use pull() function on the p.value column.

broom::tidy(lm_fit) %>%
  pull(p.value)

And to get the p.value. for the specific term, we can using pull() function in combination with filter() function as shown below.

broom::tidy(lm_fit) %>%
  filter(term=="Petal.Length") %>%
  pull(p.value)

--- title: "How to get p-value from linear regression model" date: 2022-09-02 categories: ['linear regression', 'rstats'] format: html: code-fold: false code-tools: true --- In this tutorial, we will learn how to extract p-value from a linear regression fit object in R. We will use two approaches to pull out p-value and other statistics of interest from a linear regression model. First we will extract the p-value directly from summary of the linear regression fit object using coefficients argument. And then we will use broom R package that can clean up the results from statistical model fit objects like simple linear regression. Let us get started by loading the packages needed. ```r library(broom) library(tidyverse) ``` In this example, we will perform simple linear regression analysis using built-in iris dataset available as a dataframe. ```r iris %>% head() ## Sepal.Length Sepal.Width Petal.Length Petal.Width Species ## 1 5.1 3.5 1.4 0.2 setosa ## 2 4.9 3.0 1.4 0.2 setosa ## 3 4.7 3.2 1.3 0.2 setosa ## 4 4.6 3.1 1.5 0.2 setosa ## 5 5.0 3.6 1.4 0.2 setosa ## 6 5.4 3.9 1.7 0.4 setosa ``` To perform simple linear regression with data available in a dataframe, we can specify data argument in lm() function. The advantage of using data argument is that we can specify the names of the columns in the dataframe as variables without using any quotes to fit linear regression model. In the example below, we fit simple linear regression model between two numerical variables in the iris dataset. ```r # fit linear regression using lm() function with data argument lm_fit |t|) ## (Intercept) 4.30660 0.07839 54.94 ## .. ..- attr(*, "predvars")= language list(Sepal.Length, Petal.Length) ## .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric" ## .. .. ..- attr(*, "names")= chr [1:2] "Sepal.Length" "Petal.Length" ## $ residuals : Named num [1:150] 0.2209 0.0209 -0.1382 -0.32 0.1209 ... ## ..- attr(*, "names")= chr [1:150] "1" "2" "3" "4" ... ## $ coefficients : num [1:2, 1:4] 4.3066 0.4089 0.0784 0.0189 54.9389 ... ## ..- attr(*, "dimnames")=List of 2 ## .. ..$ : chr [1:2] "(Intercept)" "Petal.Length" ## .. ..$ : chr [1:4] "Estimate" "Std. Error" "t value" "Pr(>|t|)" ## $ aliased : Named logi [1:2] FALSE FALSE ## ..- attr(*, "names")= chr [1:2] "(Intercept)" "Petal.Length" ## $ sigma : num 0.407 ## $ df : int [1:3] 2 148 2 ## $ r.squared : num 0.76 ## $ adj.r.squared: num 0.758 ## $ fstatistic : Named num [1:3] 469 1 148 ## ..- attr(*, "names")= chr [1:3] "value" "numdf" "dendf" ## $ cov.unscaled : num [1:2, 1:2] 0.03708 -0.00809 -0.00809 0.00215 ## ..- attr(*, "dimnames")=List of 2 ## .. ..$ : chr [1:2] "(Intercept)" "Petal.Length" ## .. ..$ : chr [1:2] "(Intercept)" "Petal.Length" ## - attr(*, "class")= chr "summary.lm" ``` coefficients argument gives us the effect size (the estimate), statistic, and p.value for each term as a dataframe. ```r summary(lm_fit)$coefficients ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 4.3066034 0.07838896 54.93890 2.426713e-100 ## Petal.Length 0.4089223 0.01889134 21.64602 1.038667e-47 ``` Now we can pull pull p-value from the dataframe First we pull all the p.values available in the column Pr(>|t|). ```r # pull p.values from linear regression model summary(lm_fit)$coefficients[ ,4] ``` Here we specifically pull p.value for the association between Sepal.Length and Petal.Length. ```r # pull p.value from linear regression model summary(lm_fit)$coefficients[2,4] ``` ### broom package. tidy the results. from linear regression model Another option to get the p.value is to use broom R package. The broom package takes the messy output of built-in functions in R, such as lm, nls, or t.test, and turns them into tidy tibbles. tidy() function in broom package gives the result of coefficients in a tidy tibble format. ```r broom::tidy(lm_fit) ## # A tibble: 2 × 5 ## term estimate std.error statistic p.value ## ## 1 (Intercept) 4.31 0.0784 54.9 2.43e-100 ## 2 Petal.Length 0.409 0.0189 21.6 1.04e- 47 ``` Note the column names of the resulting tibble is much cleaner making it easier to get what we need. To get all the p.values from the linear model we can use pull() function on the p.value column. ```r broom::tidy(lm_fit) %>% pull(p.value) ``` And to get the p.value. for the specific term, we can using pull() function in combination with filter() function as shown below. ```r broom::tidy(lm_fit) %>% filter(term=="Petal.Length") %>% pull(p.value) ```