How to Extract p-values from multiple simple linear regression models
Sometimes you fit many simple linear regression models and want to extract the p-value from each one. In this tutorial, we will learn two approaches to extract p-values from multiple simple linear regression models built in R. We will first use a for loop to fit the models and extract the p-value from each, and then we will learn how to use the lapply() function in base R together with lm() to extract p-values from multiple simple linear regression models.
Simulate data for fitting many linear models
To fit many linear models we need the data as a matrix or a data frame. Let us simulate some data as a matrix of random numbers.
set.seed(1)
# Simulate 10 features from 50 individuals
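# Assumed reconstruction: standard-normal values, one feature per row
mat <- matrix(rnorm(10 * 50), nrow = 10)
# Assumed reconstruction: 25 individuals in each condition
library(tidyverse)
meta <- tibble(condition = rep(c("A", "B"), each = 25))
meta %>% head()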
# A tibble: 6 × 1
condition
1 A
2 A
3 A
4 A
5 A
6 A
Let us quickly look at the result of fitting a linear model to just one of the rows.
summary(lm(mat[2, ] ~ ., data=meta))
Call:
lm(formula = mat[2, ] ~ ., data = meta)
Residuals:
Min 1Q Median 3Q Max
-3.03746 -0.66902 0.00681 0.76742 1.98526
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.1485 0.2073 0.717 0.477
conditionB 0.1509 0.2931 0.515 0.609
Residual standard error: 1.036 on 48 degrees of freedom
Multiple R-squared: 0.005488, Adjusted R-squared: -0.01523
F-statistic: 0.2649 on 1 and 48 DF, p-value: 0.6092
Extracting p-values from multiple simple linear models using for-loop
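Before looping, it helps to see how one p-value is pulled out of a single fit. The sketch below is illustrative only: `y`, `meta_demo`, `coef_table` and `p_condition` are hypothetical names, and the data are a stand-in rather than the matrix simulated above. It indexes the coefficient table by row and column name, which is equivalent to the positional `[2, 4]` indexing when the model has one two-level predictor.

```r
set.seed(1)
# Hypothetical stand-in data: one response and a two-level condition
y <- rnorm(50)
meta_demo <- data.frame(condition = rep(c("A", "B"), each = 25))

fit <- lm(y ~ condition, data = meta_demo)

# coef(summary(fit)) is a matrix: one row per term, with columns
# Estimate, Std. Error, t value and Pr(>|t|)
coef_table <- coef(summary(fit))

# Index by name so it is explicit which p-value is taken
p_condition <- coef_table["conditionB", "Pr(>|t|)"]
p_condition
```

With a single two-level predictor, this coefficient p-value matches the p-value of the overall F-test, which is why the output above shows 0.609 in the coefficient table and 0.6092 on the F-statistic line.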
Let us use a for loop to go through each row of the data matrix and fit a simple linear regression model to it. For each fit, we extract the p-value from the coefficient table that summary() returns, the same table printed under Coefficients above.
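# Assumed reconstruction of the loop described above
pvals <- numeric(nrow(mat))
for (i in seq_len(nrow(mat))) {
  # Fit one model per row (feature)
  fit <- lm(mat[i, ] ~ ., data = meta)
  # Row 2, column 4 of the coefficient table is Pr(>|t|) for conditionB
  pvals[i] <- summary(fit)$coefficients[2, 4]
}
pvals
Extracting p-values from multiple simple linear models using lapply()
When the response given to lm() is a matrix, lm() fits one model per column, and summary() returns a list with one summary per response. The features are the rows of mat, so we transpose it first.
summary(lm(t(mat) ~ ., data = meta))
Response Y1 :
Call:
lm(formula = Y1 ~ condition, data = meta)
...
Coefficients:
Estimate Std. Error t value Pr(>|t|)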
(Intercept) 0.06218 0.21619 0.288 0.775
conditionB 0.05683 0.30573 0.186 0.853
Residual standard error: 1.081 on 48 degrees of freedom
Multiple R-squared: 0.0007194, Adjusted R-squared: -0.0201
F-statistic: 0.03455 on 1 and 48 DF, p-value: 0.8533
Response Y2 :
Call:
lm(formula = Y2 ~ condition, data = meta)
Residuals:
Min 1Q Median 3Q Max
-3.03746 -0.66902 0.00681 0.76742 1.98526
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.1485 0.2073 0.717 0.477
conditionB 0.1509 0.2931 0.515 0.609
Residual standard error: 1.036 on 48 degrees of freedom
Multiple R-squared: 0.005488, Adjusted R-squared: -0.01523
F-statistic: 0.2649 on 1 and 48 DF, p-value: 0.6092
....
.....
We can then use base R’s lapply() function on the list of fits to extract each p-value. lapply() applies a function to every element of a list or vector and returns a list of the same length as its input, where each element is the result of applying the function to the corresponding input element. In our example the input is the list of summary objects, and we can write an anonymous function that extracts the p-value from row 2, column 4 of each coefficient table, as shown below.
lapply(summary(lm(t(mat) ~ ., data = meta)),
function(x){x$coefficients[2,4]})
$`Response Y1`
[1] 0.8533156
$`Response Y2`
[1] 0.6091596
$`Response Y3`
[1] 0.4936047
$`Response Y4`
[1] 0.3243336
$`Response Y5`
[1] 0.06477015
We can then convert the resulting list of p-values to a vector of p-values using the as.numeric() function.
lapply(summary(lm(t(mat) ~ ., data=meta)),
function(x){x$coefficients[2,4]}) %>% as.numeric()
[1] 0.85331563 0.60915956 0.49360471 0.32433361 0.06477015 0.85330936
[7] 0.16260390 0.63295885 0.07527383 0.60309934
In summary, we have seen two approaches to extract p-values from many linear models. The first one uses a for loop and the second one uses the lapply() function. In a future post, we will learn how to use a tidyverse approach to extract p-values from multiple simple linear models.
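As a recap, the two approaches can be placed side by side and checked against each other. This is a minimal sketch under stated assumptions: `mat_demo` and `meta_demo` are hypothetical stand-ins for the simulated data, and `vapply()` is swapped in for `lapply()` followed by `as.numeric()`, since it returns a numeric vector directly and checks that every result is a single number.

```r
set.seed(1)
# Assumed stand-in data: 10 features (rows) by 50 samples (columns)
mat_demo <- matrix(rnorm(10 * 50), nrow = 10)
meta_demo <- data.frame(condition = rep(c("A", "B"), each = 25))

# Approach 1: for loop, one lm() fit per row
p_loop <- numeric(nrow(mat_demo))
for (i in seq_len(nrow(mat_demo))) {
  fit <- lm(mat_demo[i, ] ~ condition, data = meta_demo)
  p_loop[i] <- coef(summary(fit))["conditionB", "Pr(>|t|)"]
}

# Approach 2: matrix response, one summary per column of t(mat_demo)
fits <- summary(lm(t(mat_demo) ~ condition, data = meta_demo))
p_apply <- vapply(
  fits,
  function(s) coef(s)["conditionB", "Pr(>|t|)"],
  numeric(1)
)

# The two vectors should agree up to floating-point tolerance
all.equal(p_loop, unname(p_apply))
```

The `vapply()` result keeps the response names ("Response Y1", "Response Y2", and so on), which makes it easier to trace each p-value back to its feature.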