How to perform multiple t-tests using tidyverse

t.test()

Published

September 3, 2024

In this tutorial, we will learn how to perform multiple t-tests with the tidyverse framework to determine whether group means differ. Instead of looping over the groups with a for loop, we will rely on tidyverse packages and functions.

Let us load the packages needed.

library(tidyverse)
library(palmerpenguins)
library(broom)
theme_set(theme_bw(16))

We have two examples of using tidyverse to perform multiple t-tests starting from a dataframe. The first is a toy example with a small number of samples, and the second uses a larger sample size.

We will use the Palmer penguins dataset to perform the t-tests. First, let us subset the penguins data so that we have just 10 samples per species.

set.seed(42)
df <- penguins |>
  drop_na() |>
  group_by(species) |>
  slice_sample(n=10) |>
  ungroup()
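
Because each t-test below compares the two sexes within a species, it can be worth confirming that the subsample contains both sexes for every species. A minimal sketch, assuming the same seed and sampling steps as above (the name `sex_counts` is introduced here only for illustration):

```r
library(tidyverse)
library(palmerpenguins)

# Recreate the 10-per-species subsample from the step above
set.seed(42)
df <- penguins |>
  drop_na() |>
  group_by(species) |>
  slice_sample(n = 10) |>
  ungroup()

# Count rows for every species/sex combination; each t-test needs
# at least two observations of each sex within a species
sex_counts <- df |>
  count(species, sex)

sex_counts
```

If any species showed fewer than two birds of one sex, `t.test()` would fail for that group, so a different seed or a larger `n` would be needed.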

Our sub-sampled dataset looks like this.

df |> head()

# A tibble: 6 × 8
  species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
  <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
1 Adelie  Biscoe              34.5          18.1               187        2900
2 Adelie  Torgersen           33.5          19                 190        3600
3 Adelie  Torgersen           42.1          19.1               195        4000
4 Adelie  Torgersen           41.5          18.3               195        4300
5 Adelie  Dream               41.5          18.5               201        4000
6 Adelie  Dream               37.5          18.5               199        4475
# ℹ 2 more variables: sex <fct>, year <int>

Multiple t-tests with tidyverse: Example 1

For the t-tests, we are mainly interested in two variables from the penguins data, bill length and sex for each species. We want to perform a t-test to determine if there is a difference in bill length between the sexes for each penguin species.

To perform multiple t-tests in the tidyverse framework, we use group_by() to separate the data for each test. In this example, we group by the species variable, which gives us access to each penguin species’ data.
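
As a small illustration of how `group_by()` hands each species' rows to the next step, the sketch below computes one summary row per species. It uses the full `penguins` data rather than the subsample so that it runs on its own, and the names `species_summary`, `n_birds`, and `mean_bill` are made up for this example.

```r
library(tidyverse)
library(palmerpenguins)

# summarize() runs once per group, so each expression only sees
# the rows of a single species
species_summary <- penguins |>
  drop_na() |>
  group_by(species) |>
  summarize(
    n_birds = n(),
    mean_bill = mean(bill_length_mm)
  )

species_summary
```

Any function that returns a single value per group can be used in the same position, which is what makes it possible to place a whole t-test there once it is wrapped in `list()`.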

Then we create a list column, where each element is the result of applying a t-test of bill length by sex for one species.

df |>
  group_by(species) |>
  summarize(t_test_obj = list(t.test(bill_length_mm ~ sex))) 

# A tibble: 3 × 2
  species   t_test_obj
  <fct>     <list>    
1 Adelie    <htest>   
2 Chinstrap <htest>   
3 Gentoo    <htest>   
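
Each element of the list column is an ordinary `htest` object, so its components can be pulled out directly. The sketch below assumes the full `penguins` data (so that it runs on its own) and uses the illustrative names `res`, `res_p`, and `p_value`; it extracts the p-value of each test with purrr's `map_dbl()`.

```r
library(tidyverse)
library(palmerpenguins)

res <- penguins |>
  drop_na() |>
  group_by(species) |>
  summarize(t_test_obj = list(t.test(bill_length_mm ~ sex)))

# The first element is the complete t-test result for the first species
class(res$t_test_obj[[1]])

# Pull the p-value out of every htest object in the list column
res_p <- res |>
  mutate(p_value = map_dbl(t_test_obj, \(x) x$p.value))

res_p
```

This works for one component at a time; collecting all components at once is less tedious with a helper that converts the whole object to a data frame.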

We can convert each t-test result object into a tidy dataframe using broom’s tidy() function.

df |>
  group_by(species) |>
  summarize(t_test_obj = list(t.test(bill_length_mm ~ sex))) |>
  mutate(ttest_res = map(t_test_obj, tidy)) 

# A tibble: 3 × 3
  species   t_test_obj ttest_res        
  <fct>     <list>     <list>           
1 Adelie    <htest>    <tibble [1 × 10]>
2 Chinstrap <htest>    <tibble [1 × 10]>
3 Gentoo    <htest>    <tibble [1 × 10]>
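
To see what `tidy()` contributes, it can help to apply it to a single test first. A minimal sketch, assuming the Adelie rows of the full `penguins` data (the names `adelie` and `one_test` are illustrative):

```r
library(tidyverse)
library(broom)
library(palmerpenguins)

adelie <- penguins |>
  drop_na() |>
  filter(species == "Adelie")

# tidy() turns the htest object into a one-row tibble
one_test <- tidy(t.test(bill_length_mm ~ sex, data = adelie))

# Column names of the tidied result
names(one_test)
```

Since every test yields a one-row tibble with the same columns, the per-species results can be stacked into a single table.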

Then we unnest the result from broom to get the t-test results as a dataframe, with one row of results for each species.

df |>
  group_by(species) |>
  summarize(t_test_obj = list(t.test(bill_length_mm ~ sex))) |>
  mutate(ttest_res = map(t_test_obj, tidy)) |>
  unnest(ttest_res)

# A tibble: 3 × 12
  species   t_test_obj estimate estimate1 estimate2 statistic p.value parameter
  <fct>     <list>        <dbl>     <dbl>     <dbl>     <dbl>   <dbl>     <dbl>
1 Adelie    <htest>       -4.1       36.4      40.5     -2.58  0.0381      6.70
2 Chinstrap <htest>       -3.14      47.2      50.4     -1.96  0.0892      7.25
3 Gentoo    <htest>       -3.36      46.1      49.4     -2.32  0.0524      7.20
# ℹ 4 more variables: conf.low <dbl>, conf.high <dbl>, method <chr>,
#   alternative <chr>
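
The printed tibble hides the confidence-interval columns because of its width. One way to bring the most relevant columns into view is to select a subset, as in this sketch. It is shown on the full `penguins` data so that the block runs on its own; the name `ttest_summary` and the choice of columns are assumptions about what is of interest.

```r
library(tidyverse)
library(broom)
library(palmerpenguins)

ttest_summary <- penguins |>
  drop_na() |>
  group_by(species) |>
  summarize(t_test_obj = list(t.test(bill_length_mm ~ sex))) |>
  mutate(ttest_res = map(t_test_obj, tidy)) |>
  unnest(ttest_res) |>
  # Keep the difference in means, test statistic, p-value and
  # confidence interval; drop the list column and the rest
  select(species, estimate, statistic, p.value, conf.low, conf.high)

ttest_summary
```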

Looking at the p.value column, we can see that only one of the t-tests (Adelie, p = 0.038) is statistically significant at the conventional 0.05 level. We can get a feel for this by visualizing the underlying data as boxplots with ggplot2.
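
The comparison against a significance threshold can also be made explicit in code. The sketch below re-enters the three p-values printed above by hand and assumes the conventional 0.05 cutoff; the names `pvals`, `flagged`, and `significant` are illustrative.

```r
library(tidyverse)

# p-values copied by hand from the Example 1 output above
pvals <- tibble(
  species = c("Adelie", "Chinstrap", "Gentoo"),
  p.value = c(0.0381, 0.0892, 0.0524)
)

# Flag tests whose p-value falls below the assumed 0.05 threshold
flagged <- pvals |>
  mutate(significant = p.value < 0.05)

flagged
```

In a real analysis the same `mutate()` call could be appended directly to the unnested results instead of a hand-typed tibble.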

df |>
  ggplot(aes(x=sex, y=bill_length_mm, fill=sex))+
  geom_boxplot(outlier.shape = NA)+
  geom_jitter(width=0.1)+
  facet_wrap(~species)+
  theme(legend.position = "none")+
  scale_fill_brewer(palette="Dark2")+
  scale_y_continuous(breaks=scales::breaks_pretty(6))+
  labs(title="How to perform multiple t-test with tidyverse framework")
ggsave("How_to_perform_multiple_t_tests_with_tidyverse.png", width=8, height=6)

How to perform multiple t-tests with tidyverse

How to do multiple t-tests with tidyverse: Example 2

In the second example of performing multiple t-tests, we use the whole penguins dataset instead of a sample of just 10 birds per species.

We use the same approach described above to perform multiple t-tests with the tidyverse framework and look at the p-value from each test. The p-value for every test is far below 0.05, indicating a statistically significant difference in mean bill length between the sexes in all three penguin species.

penguins |>
  drop_na() |>
  group_by(species) |>
  summarize(t_test_obj = list(t.test(bill_length_mm ~ sex))) |>
  mutate(ttest_res = map(t_test_obj, tidy)) |>
  unnest(ttest_res)

# A tibble: 3 × 12
  species   t_test_obj estimate estimate1 estimate2 statistic  p.value parameter
  <fct>     <list>        <dbl>     <dbl>     <dbl>     <dbl>    <dbl>     <dbl>
1 Adelie    <htest>       -3.13      37.3      40.4     -8.78 4.80e-15     142. 
2 Chinstrap <htest>       -4.52      46.6      51.1     -7.57 8.92e-10      48.7
3 Gentoo    <htest>       -3.91      45.6      49.5     -8.88 1.32e-14     111. 
# ℹ 4 more variables: conf.low <dbl>, conf.high <dbl>, method <chr>,
#   alternative <chr>
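
For comparison, the loop-based approach mentioned in the introduction might look like the sketch below. It uses only base R on top of the `penguins` data, and the names `complete`, `species_levels`, and `p_values` are introduced here for illustration.

```r
library(palmerpenguins)

# Keep complete rows only, mirroring drop_na()
complete <- na.omit(penguins)
species_levels <- levels(complete$species)

# Pre-allocate a named vector to hold one p-value per species
p_values <- setNames(numeric(length(species_levels)), species_levels)

for (sp in species_levels) {
  # Subset to one species and run the same Welch t-test
  one_species <- complete[complete$species == sp, ]
  p_values[sp] <- t.test(bill_length_mm ~ sex, data = one_species)$p.value
}

p_values
```

The loop runs the same tests on the same rows, so it should reproduce the p-values of the grouped pipeline, but every additional quantity (estimates, confidence intervals) would need its own container and assignment.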

Visualizing the whole dataset using boxplots suggests the same conclusion.

penguins |>
  drop_na() |>
  ggplot(aes(x=sex, y=bill_length_mm, fill=sex))+
  geom_boxplot(outlier.shape = NA)+
  geom_jitter(width=0.1)+
  facet_wrap(~species)+
  theme(legend.position = "none")+
  scale_fill_brewer(palette="Dark2")+
  scale_y_continuous(breaks=scales::breaks_pretty(6))+
  labs(title="How to perform multiple t-test with tidyverse framework")

Performing multiple t-tests with tidyverse - Example 2
