How to perform multiple t-tests using tidyverse
In this tutorial, we will learn how to perform multiple t-tests, each checking for a difference in means, using the tidyverse framework. Instead of looping over the groups with a for loop, we will use tidyverse packages and functions.
Let us load the packages needed.
library(tidyverse)
library(palmerpenguins)
library(broom)
theme_set(theme_bw(16))
We have two examples of using tidyverse to perform multiple t-tests starting from a dataframe. The first is a toy example with a small number of samples and the second uses a large sample size.
We will use the Palmer penguins dataset to perform the t-tests. First, let us subset the penguins data so that we have just 10 samples per species.
set.seed(42)
df <- penguins |>
drop_na() |>
group_by(species) |>
slice_sample(n=10) |>
ungroup()
Our sub-sampled dataset looks like this.
df |> head()
# A tibble: 6 × 8
  species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
  <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
1 Adelie  Biscoe              34.5          18.1               187        2900
2 Adelie  Torgersen           33.5          19                 190        3600
3 Adelie  Torgersen           42.1          19.1               195        4000
4 Adelie  Torgersen           41.5          18.3               195        4300
5 Adelie  Dream               41.5          18.5               201        4000
6 Adelie  Dream               37.5          18.5               199        4475
# ℹ 2 more variables: sex <fct>, year <int>
Multiple t-tests with tidyverse: Example 1
For the t-tests, we are mainly interested in two variables from the penguins data, bill length and sex for each species. We want to perform a t-test to determine if there is a difference in bill length between the sexes for each penguin species.
To perform multiple t-tests in the tidyverse framework, we use group_by() to separate the data for each test. In this example, we group by the species variable, which gives us access to each penguin species’ data.
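To picture what "separating the data for each test" means, here is a rough base R analogue. It is only a sketch for intuition, not the approach used in this tutorial, and it assumes the built-in ToothGrowth data (tooth length `len`, supplement type `supp`, and `dose`) as a stand-in so that it runs without any extra packages.

```r
# Base R analogue of "one t-test per group".
# Assumption: the built-in ToothGrowth data stands in for the penguins data.

# Split the data frame into a list with one data frame per dose level
by_dose <- split(ToothGrowth, ToothGrowth$dose)

# Run a two-sample t-test of len by supp within each piece
tests <- lapply(by_dose, function(d) t.test(len ~ supp, data = d))

# Pull the p-value out of each test object
p_values <- sapply(tests, function(tt) tt$p.value)
p_values
```

The group_by() and summarize() steps in this tutorial play the role of split() and lapply() here, while keeping everything in a single dataframe.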
Then we create a list column, where each element is the result of applying a t-test of bill length against sex for one species.
df |>
group_by(species) |>
summarize(t_test_obj = list(t.test(bill_length_mm ~ sex)))
# A tibble: 3 × 2
  species   t_test_obj
  <fct>     <list>
1 Adelie    <htest>
2 Chinstrap <htest>
3 Gentoo    <htest>
We can convert each t-test result object into a tidy dataframe using broom’s tidy() function.
df |>
group_by(species) |>
summarize(t_test_obj = list(t.test(bill_length_mm ~ sex))) |>
mutate(ttest_res = map(t_test_obj, tidy))
# A tibble: 3 × 3
  species   t_test_obj ttest_res
  <fct>     <list>     <list>
1 Adelie    <htest>    <tibble [1 × 10]>
2 Chinstrap <htest>    <tibble [1 × 10]>
3 Gentoo    <htest>    <tibble [1 × 10]>
Then we unnest the result from broom to get the t-test results as a dataframe, with one row of results for each species.
df |>
group_by(species) |>
summarize(t_test_obj = list(t.test(bill_length_mm ~ sex))) |>
mutate(ttest_res = map(t_test_obj, tidy)) |>
unnest(ttest_res)
# A tibble: 3 × 12
  species   t_test_obj estimate estimate1 estimate2 statistic p.value parameter
  <fct>     <list>        <dbl>     <dbl>     <dbl>     <dbl>   <dbl>     <dbl>
1 Adelie    <htest>       -4.1       36.4      40.5     -2.58  0.0381      6.70
2 Chinstrap <htest>       -3.14      47.2      50.4     -1.96  0.0892      7.25
3 Gentoo    <htest>       -3.36      46.1      49.4     -2.32  0.0524      7.20
# ℹ 4 more variables: conf.low <dbl>, conf.high <dbl>, method <chr>,
#   alternative <chr>
By checking the p.value column, we can see that only one of the t-tests (Adelie, p = 0.0381) has a statistically significant result at the 0.05 level. We can also see this by visualizing the actual data as boxplots with ggplot2.
df |>
ggplot(aes(x=sex, y=bill_length_mm, fill=sex))+
geom_boxplot(outlier.shape = NA)+
geom_jitter(width=0.1)+
facet_wrap(~species)+
theme(legend.position = "none")+
scale_fill_brewer(palette="Dark2")+
scale_y_continuous(breaks=scales::breaks_pretty(6))+
labs(title="How to perform multiple t-test with tidyverse framework")
ggsave("How_to_perform_multiple_t_tests_with_tidyverse.png", width=8, height=6)
How to perform multiple t-tests with tidyverse
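Because Example 1 runs three tests at once, one optional follow-up is to adjust the p-values for multiple comparisons. The sketch below uses base R's p.adjust() on the rounded p-values copied by hand from the table above. The vector name `p_raw` and the choice of the Bonferroni and Benjamini-Hochberg methods are assumptions for illustration, not part of the original analysis.

```r
# Rounded p-values copied from the Example 1 output (one per species)
p_raw <- c(Adelie = 0.0381, Chinstrap = 0.0892, Gentoo = 0.0524)

# Bonferroni: multiply each p-value by the number of tests (capped at 1)
p_bonf <- p.adjust(p_raw, method = "bonferroni")

# Benjamini-Hochberg: controls the false discovery rate instead
p_bh <- p.adjust(p_raw, method = "BH")

# Compare raw and adjusted p-values side by side
round(cbind(raw = p_raw, bonferroni = p_bonf, BH = p_bh), 4)
```

With these rounded inputs, none of the adjusted p-values falls below 0.05 under either method, so the Adelie result in this small toy sample should be read with caution.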
How to do multiple t-tests with tidyverse: Example 2
In the second example of performing multiple t-tests, we use the full penguins dataset instead of just 10 samples per species.
We use the same approach described above to perform multiple t-tests with the tidyverse framework and take a look at the p-value from each t-test. This time the p-value for every test is far below 0.05, suggesting a difference in mean bill length between the sexes in all three penguin species.
penguins |>
drop_na() |>
group_by(species) |>
summarize(t_test_obj = list(t.test(bill_length_mm ~ sex))) |>
mutate(ttest_res = map(t_test_obj, tidy)) |>
unnest(ttest_res)
# A tibble: 3 × 12
  species   t_test_obj estimate estimate1 estimate2 statistic  p.value parameter
  <fct>     <list>        <dbl>     <dbl>     <dbl>     <dbl>    <dbl>     <dbl>
1 Adelie    <htest>       -3.13      37.3      40.4     -8.78 4.80e-15     142.
2 Chinstrap <htest>       -4.52      46.6      51.1     -7.57 8.92e-10      48.7
3 Gentoo    <htest>       -3.91      45.6      49.5     -8.88 1.32e-14     111.
# ℹ 4 more variables: conf.low <dbl>, conf.high <dbl>, method <chr>,
#   alternative <chr>
Visualizing the whole dataset using boxplots suggests the same conclusion.
penguins |>
drop_na() |>
ggplot(aes(x=sex, y=bill_length_mm, fill=sex))+
geom_boxplot(outlier.shape = NA)+
geom_jitter(width=0.1)+
facet_wrap(~species)+
theme(legend.position = "none")+
scale_fill_brewer(palette="Dark2")+
scale_y_continuous(breaks=scales::breaks_pretty(6))+
labs(title="How to perform multiple t-test with tidyverse framework")
Performing multiple t-tests with tidyverse - Example 2
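A note on reading the result tables: the parameter column holds the degrees of freedom, and the values are fractional because t.test() runs Welch's t-test by default (var.equal = FALSE). The sketch below contrasts the default with the pooled-variance version. It assumes the built-in ToothGrowth data as a stand-in so that it runs without extra packages.

```r
# Assumption: the built-in ToothGrowth data is used only for illustration.

# Default: Welch's t-test, which does not assume equal variances
welch <- t.test(len ~ supp, data = ToothGrowth)

# Student's t-test with pooled variance
pooled <- t.test(len ~ supp, data = ToothGrowth, var.equal = TRUE)

# Degrees of freedom: fractional for Welch, n1 + n2 - 2 for the pooled test
welch$parameter
pooled$parameter

# The method string records which test was run
welch$method
pooled$method
```

To use the pooled-variance test in the pipelines above, var.equal = TRUE can be passed to t.test() inside summarize().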