How to Extract p-values from multiple simple linear regression models

lapply() function
linear regression
Published

October 12, 2022

Sometimes you fit many simple linear regression models and want to extract the p-value from each one. In this tutorial, we will learn two approaches to extract p-values from multiple simple linear regression models built in R. We will first use a for loop to fit the models and extract the p-value from each one. Then we will learn how to use the lapply() function in base R together with lm() to extract the p-values from all the models.

Simulate data for fitting many linear models

To fit many linear models, we need the data as a matrix or a data frame. Let us simulate some data as a matrix of random numbers.

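library(tidyverse)
set.seed(1)
# Simulate 10 features from 50 individuals (assumed: normal values, 25 per condition)
mat <- matrix(rnorm(10 * 50), nrow = 10, ncol = 50)
meta <- tibble(condition = rep(c("A", "B"), each = 25))
meta %>% head()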

# A tibble: 6 × 1
  condition
      
1 A        
2 A        
3 A        
4 A        
5 A        
6 A

Let us quickly look at the result of fitting a linear model to just one of the rows, here the second row.

summary(lm(mat[2, ] ~ ., data=meta))

Call:
lm(formula = mat[2, ] ~ ., data = meta)

Residuals:
     Min       1Q   Median       3Q      Max 
-3.03746 -0.66902  0.00681  0.76742  1.98526 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.1485     0.2073   0.717    0.477
conditionB    0.1509     0.2931   0.515    0.609

Residual standard error: 1.036 on 48 degrees of freedom
Multiple R-squared:  0.005488,  Adjusted R-squared:  -0.01523 
F-statistic: 0.2649 on 1 and 48 DF,  p-value: 0.6092
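The p-value for the condition effect can be read directly from the coefficient table stored in the summary object. Here is a minimal sketch, assuming toy stand-ins for the data: `toy_mat` and `toy_meta` are hypothetical names with the same shapes as mat and meta, and their values are illustrative only.

```r
# Toy stand-ins (assumed shapes: 10 x 50 matrix, one two-level condition column)
set.seed(42)
toy_mat  <- matrix(rnorm(10 * 50), nrow = 10, ncol = 50)
toy_meta <- data.frame(condition = rep(c("A", "B"), each = 25))

fit   <- lm(toy_mat[2, ] ~ ., data = toy_meta)
coefs <- summary(fit)$coefficients   # matrix: one row per term, one column per statistic

# Row 2 is the conditionB term and column 4 is Pr(>|t|)
pval_by_index <- coefs[2, 4]

# The same cell selected by name, which is easier to read
pval_by_name <- coefs["conditionB", "Pr(>|t|)"]
```

Selecting by name is a little more robust if the order of the terms ever changes, while the positional form `[2, 4]` is shorter.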

Extracting p-values from multiple simple linear models using for-loop

Let us use a for loop to go through each row of the data matrix and fit a simple linear regression model. For each fit, we extract the p-value from the coefficients table returned by the summary() function.
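The loop described above can be sketched as follows. This is one plausible version, not necessarily the exact code used for the results in this post; `toy_mat` and `toy_meta` are hypothetical stand-ins for mat and meta.

```r
# Toy stand-ins (assumed shapes: 10 x 50 matrix, one two-level condition column)
set.seed(42)
toy_mat  <- matrix(rnorm(10 * 50), nrow = 10, ncol = 50)
toy_meta <- data.frame(condition = rep(c("A", "B"), each = 25))

# Pre-allocate one slot per row (feature)
pvals <- numeric(nrow(toy_mat))

for (i in seq_len(nrow(toy_mat))) {
  # Fit one simple linear regression per row
  fit <- lm(toy_mat[i, ] ~ ., data = toy_meta)
  # Row 2, column 4 of the coefficient table is the p-value for conditionB
  pvals[i] <- summary(fit)$coefficients[2, 4]
}

pvals
```

Pre-allocating `pvals` avoids growing the vector on every iteration, and `seq_len()` behaves sensibly even if the matrix has zero rows.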

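Extracting p-values from multiple simple linear models using lapply()

Instead of looping, we can fit all the models at once by giving lm() a matrix response: in lm(t(mat) ~ ., data = meta), each column of t(mat) is treated as a separate response. Calling summary() on this fit returns one summary per response, named Response Y1, Response Y2, and so on. The coefficient table for Response Y1, for example, is:

Coefficients:
            Estimate Std. Error t value Pr(>|t|)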
(Intercept)  0.06218    0.21619   0.288    0.775
conditionB   0.05683    0.30573   0.186    0.853

Residual standard error: 1.081 on 48 degrees of freedom
Multiple R-squared:  0.0007194, Adjusted R-squared:  -0.0201 
F-statistic: 0.03455 on 1 and 48 DF,  p-value: 0.8533

Response Y2 :

Call:
lm(formula = Y2 ~ condition, data = meta)

Residuals:
     Min       1Q   Median       3Q      Max 
-3.03746 -0.66902  0.00681  0.76742  1.98526 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.1485     0.2073   0.717    0.477
conditionB    0.1509     0.2931   0.515    0.609

Residual standard error: 1.036 on 48 degrees of freedom
Multiple R-squared:  0.005488,  Adjusted R-squared:  -0.01523 
F-statistic: 0.2649 on 1 and 48 DF,  p-value: 0.6092

....
.....

We can then use base R's lapply() function on the list of model summaries returned by summary() to extract the p-value from each one. lapply() applies a function to every element of a list or vector. In our example, the input is the list of summary objects, and we can write an anonymous function that extracts the p-value, as shown below. lapply() returns a list of the same length as the input list, where each element is the result of applying the function to the corresponding element.
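To see what lapply() is iterating over, it can help to break the one-liner into steps. This is a minimal sketch on toy data; `toy_mat`, `toy_meta`, `multi_fit`, `summaries` and `pval_list` are hypothetical names introduced for illustration.

```r
# Toy stand-ins (assumed shapes: 10 x 50 matrix, one two-level condition column)
set.seed(42)
toy_mat  <- matrix(rnorm(10 * 50), nrow = 10, ncol = 50)
toy_meta <- data.frame(condition = rep(c("A", "B"), each = 25))

# A matrix response makes lm() fit one model per column,
# so we transpose to put the features in columns
multi_fit <- lm(t(toy_mat) ~ ., data = toy_meta)

# summary() on such a fit gives a list with one summary per response
summaries <- summary(multi_fit)
length(summaries)        # one entry per row of toy_mat
names(summaries)[1:2]    # names of the form "Response Y1", "Response Y2"

# Apply the same extractor to every summary; the result is a list of the same length
pval_list <- lapply(summaries, function(s) s$coefficients[2, 4])
```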

lapply(summary(lm(t(mat) ~ ., data = meta)),
       function(x){x$coefficients[2,4]})

$`Response Y1`
[1] 0.8533156

$`Response Y2`
[1] 0.6091596

$`Response Y3`
[1] 0.4936047

$`Response Y4`
[1] 0.3243336

$`Response Y5`
[1] 0.06477015

We can then convert the resulting list of p-values to a numeric vector using the as.numeric() function.

lapply(summary(lm(t(mat) ~ ., data=meta)),
       function(x){x$coefficients[2,4]}) %>% as.numeric()

 [1] 0.85331563 0.60915956 0.49360471 0.32433361 0.06477015 0.85330936
 [7] 0.16260390 0.63295885 0.07527383 0.60309934
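Note that as.numeric() drops the response names. If you would rather keep them, one alternative (not used in this post) is base R's vapply(), which returns a vector directly and checks that every result is a single number. Here is a sketch on toy data; `toy_mat`, `toy_meta` and `get_pval` are hypothetical names.

```r
# Toy stand-ins (assumed shapes: 10 x 50 matrix, one two-level condition column)
set.seed(42)
toy_mat  <- matrix(rnorm(10 * 50), nrow = 10, ncol = 50)
toy_meta <- data.frame(condition = rep(c("A", "B"), each = 25))
summaries <- summary(lm(t(toy_mat) ~ ., data = toy_meta))

# Helper that pulls the conditionB p-value out of one summary
get_pval <- function(s) s$coefficients[2, 4]

# as.numeric() on the lapply() result gives an unnamed numeric vector
pvals_plain <- as.numeric(lapply(summaries, get_pval))

# vapply() returns a numeric vector that keeps the "Response Y..." names
pvals_named <- vapply(summaries, get_pval, numeric(1))
```

The named version makes it easier to trace a small p-value back to the feature it came from.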

In summary, we have seen two approaches to extract p-values from many linear models. The first uses a for loop and the second uses the lapply() function. In a future post, we will learn how to use a tidyverse approach to extract p-values from multiple simple linear models.
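As a quick sanity check, the two approaches can be run side by side on the same data and compared. This is a sketch on toy data, where `toy_mat`, `toy_meta`, `loop_pvals` and `lapply_pvals` are hypothetical names.

```r
# Toy stand-ins (assumed shapes: 10 x 50 matrix, one two-level condition column)
set.seed(42)
toy_mat  <- matrix(rnorm(10 * 50), nrow = 10, ncol = 50)
toy_meta <- data.frame(condition = rep(c("A", "B"), each = 25))

# Approach 1: one lm() fit per row inside a for loop
loop_pvals <- numeric(nrow(toy_mat))
for (i in seq_len(nrow(toy_mat))) {
  fit <- lm(toy_mat[i, ] ~ ., data = toy_meta)
  loop_pvals[i] <- summary(fit)$coefficients[2, 4]
}

# Approach 2: a single multi-response fit, then lapply() over the summaries
lapply_pvals <- as.numeric(
  lapply(summary(lm(t(toy_mat) ~ ., data = toy_meta)),
         function(s) s$coefficients[2, 4])
)

# Compare with a numerical tolerance rather than exact equality
all.equal(loop_pvals, lapply_pvals)
```

all.equal() is used instead of identical() because the two code paths may differ in the last few floating-point digits.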