How to perform multiple t-tests using tidyverse

t.test()

Learn how to perform multiple t-tests using tidyverse with this comprehensive R tutorial. Includes practical examples and code snippets.

Published

September 3, 2024

Introduction

Running several t-tests in one pipeline lets you compare group means across many variables or subgroups without writing a separate call for each one. The tidyverse approach shown here reshapes the data, nests it by variable, and maps t.test() over each piece.

Getting Started

library(tidyverse)
library(palmerpenguins)

Example 1: Basic Usage

The Problem

We want to test if penguin body measurements differ significantly between male and female penguins. Instead of running separate t-tests for each measurement, we’ll perform them all at once.

Step 1: Prepare the Data

First, we’ll reshape our data to have measurements in a long format.

penguins_long <- penguins |>
  drop_na(sex) |>
  select(sex, bill_length_mm, bill_depth_mm, flipper_length_mm, body_mass_g) |>
  pivot_longer(cols = -sex, names_to = "measurement", values_to = "value")

This creates a long-format dataset with one row per penguin per measurement: the measurement column holds the variable name and the value column holds the number. That layout makes it easy to group by measurement type.
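A quick way to confirm the reshape worked is to count rows per measurement and sex. The sketch below rebuilds penguins_long so it runs on its own; the name group_sizes is introduced here for illustration and is not part of the original tutorial.

```r
library(tidyverse)
library(palmerpenguins)

# Same reshaping step as in the tutorial
penguins_long <- penguins |>
  drop_na(sex) |>
  select(sex, bill_length_mm, bill_depth_mm, flipper_length_mm, body_mass_g) |>
  pivot_longer(cols = -sex, names_to = "measurement", values_to = "value")

# One row per measurement/sex combination, with the number of observations
group_sizes <- penguins_long |>
  count(measurement, sex)

group_sizes
```

Each of the four measurements should appear twice, once per sex, which is the two-group structure a t-test needs.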

Step 2: Group and Nest the Data

Next, we’ll group by measurement type and nest the data for each group.

penguins_nested <- penguins_long |>
  drop_na(value) |>
  group_by(measurement) |>
  nest()

Now we have a tibble with one row per measurement type, and each row contains a nested dataset with sex and value columns.
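To see what the nested tibble holds, it can help to look inside the list-column. This is a minimal sketch that repeats the preparation steps so it is self-contained; nested_sizes is a name assumed here for illustration.

```r
library(tidyverse)
library(palmerpenguins)

# Steps 1 and 2 combined
penguins_nested <- penguins |>
  drop_na(sex) |>
  select(sex, bill_length_mm, bill_depth_mm, flipper_length_mm, body_mass_g) |>
  pivot_longer(cols = -sex, names_to = "measurement", values_to = "value") |>
  drop_na(value) |>
  group_by(measurement) |>
  nest()

# Number of rows stored in each nested data frame
nested_sizes <- penguins_nested |>
  mutate(n_rows = map_int(data, nrow)) |>
  select(measurement, n_rows)

# Peek at the nested data frame for the first measurement
penguins_nested$data[[1]]
```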

Step 3: Perform Multiple T-Tests

We’ll use map() to apply t-tests to each nested dataset.

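t_test_results <- penguins_nested |>
  mutate(
    # Fit one t-test per measurement; .x is that measurement's nested data frame
    t_test = map(data, ~t.test(value ~ sex, data = .x)),
    # Convert each htest object into a one-row tibble
    tidy_results = map(t_test, broom::tidy)
  )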

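This runs a two-sample t-test comparing values between sexes for each measurement type. By default, t.test() performs Welch’s test, which does not assume equal variances in the two groups. The broom package then converts each result into a one-row tibble.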
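Running a single test by hand shows what each map() iteration does. The sketch below uses flipper length only; flipper_data and single_test are names assumed for illustration.

```r
library(tidyverse)
library(palmerpenguins)

# Keep only penguins with known sex and a recorded flipper length
flipper_data <- penguins |>
  drop_na(sex, flipper_length_mm)

# Same formula interface as in the pipeline, applied to one variable
single_test <- t.test(flipper_length_mm ~ sex, data = flipper_data)

# One-row tibble with the same columns the nested pipeline produces
broom::tidy(single_test)
```

The row returned here should match the flipper_length_mm row of the nested results, since both use the same data and the same default settings.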

Step 4: Extract and View Results

Finally, we’ll unnest the results to see all t-test outcomes in a clean format.

final_results <- t_test_results |>
  select(measurement, tidy_results) |>
  unnest(tidy_results)

print(final_results)

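This gives us one row per measurement, with columns for the difference in means (estimate), the two group means (estimate1, estimate2), the t statistic (statistic), degrees of freedom (parameter), p-value (p.value), and confidence interval bounds (conf.low, conf.high).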
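Example 1 runs four tests, so the p-values may also need adjusting before they are reported. The following is a sketch under stated assumptions rather than part of the original tutorial: it rebuilds the results in one pipeline, uses Holm’s method (a choice made here, not by the author), and introduces the name results_summary.

```r
library(tidyverse)
library(palmerpenguins)

# Rebuild the Example 1 results in one pipeline so this block runs on its own
final_results <- penguins |>
  drop_na(sex) |>
  select(sex, bill_length_mm, bill_depth_mm, flipper_length_mm, body_mass_g) |>
  pivot_longer(cols = -sex, names_to = "measurement", values_to = "value") |>
  drop_na(value) |>
  nest(data = -measurement) |>
  mutate(tidy_results = map(data, \(d) broom::tidy(t.test(value ~ sex, data = d)))) |>
  select(measurement, tidy_results) |>
  unnest(tidy_results)

# Keep the columns most often reported and add Holm-adjusted p-values
results_summary <- final_results |>
  mutate(p.adjusted = p.adjust(p.value, method = "holm")) |>
  select(measurement, estimate, conf.low, conf.high, p.value, p.adjusted)

results_summary
```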

Example 2: Practical Application

The Problem

A researcher wants to compare penguin bill length across species pairs (Adelie vs Chinstrap, Adelie vs Gentoo, Chinstrap vs Gentoo). They need to run these pairwise comparisons efficiently while correcting for multiple testing.

Step 1: Create Species Pairs

We’ll create all possible species pair combinations for comparison.

species_pairs <- list(
  c("Adelie", "Chinstrap"),
  c("Adelie", "Gentoo"),
  c("Chinstrap", "Gentoo")
)

This creates a list of species pairs that we’ll use for pairwise comparisons.
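With more groups, typing the pairs by hand becomes error-prone. As a hedged alternative, base R’s combn() can generate them from the factor levels; species_names and species_pairs_auto are names assumed here for illustration.

```r
library(palmerpenguins)

# All species present in the data, as character strings
species_names <- levels(penguins$species)

# combn() returns every 2-element combination; simplify = FALSE gives a list
species_pairs_auto <- combn(species_names, 2, simplify = FALSE)

species_pairs_auto
```

The result has the same structure as the hand-written list, so it can be passed to the same mapping code.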

Step 2: Prepare Data for Each Comparison

We’ll write a helper function that keeps only the rows for a given species pair and drops rows with missing values.

compare_species <- function(species1, species2, data) {
  data |>
    filter(species %in% c(species1, species2)) |>
    drop_na(sex, bill_length_mm)
}

This function filters the penguin data to include only the two species we want to compare, then drops rows where sex or bill length is missing.
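Calling the helper on one pair is a simple way to check its output before mapping it over all pairs. The sketch below repeats the function definition so it runs on its own; adelie_gentoo is a name assumed for illustration.

```r
library(tidyverse)
library(palmerpenguins)

# Same helper as in the tutorial
compare_species <- function(species1, species2, data) {
  data |>
    filter(species %in% c(species1, species2)) |>
    drop_na(sex, bill_length_mm)
}

adelie_gentoo <- compare_species("Adelie", "Gentoo", penguins)

# Rows per species; Chinstrap should be absent from the result
adelie_gentoo |>
  count(species)
```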

Step 3: Run Pairwise T-Tests

Now we’ll run a t-test on bill length for each species pair.

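pairwise_results <- map_dfr(species_pairs, function(pair) {
  # Keep only the two species in this pair
  filtered_data <- compare_species(pair[1], pair[2], penguins)

  # Skip the pair (return NULL) if no rows remain after filtering
  if (nrow(filtered_data) > 0) {
    test_result <- t.test(bill_length_mm ~ species, data = filtered_data)
    # Tidy the result and label the row with the pair being compared
    broom::tidy(test_result) |>
      mutate(comparison = paste(pair[1], "vs", pair[2]))
  }
})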

This returns a results table with one row per species pair, where the comparison column records which two species were tested.
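Base R’s pairwise.t.test() offers a way to cross-check these p-values. This is a sketch, not the tutorial’s method: it applies the same missing-value filter, sets pool.sd = FALSE so each pair gets its own Welch test, and leaves the p-values unadjusted. The names bill_data and base_check are assumed for illustration.

```r
library(tidyverse)
library(palmerpenguins)

# Match the tutorial's filtering so the p-values are comparable
bill_data <- penguins |>
  drop_na(sex, bill_length_mm)

# pool.sd = FALSE runs a separate Welch test per pair;
# p.adjust.method = "none" leaves the p-values unadjusted
base_check <- pairwise.t.test(
  x = bill_data$bill_length_mm,
  g = bill_data$species,
  p.adjust.method = "none",
  pool.sd = FALSE
)

# Lower-triangular matrix of p-values, one cell per species pair
base_check$p.value
```

The tidyverse version is still worth having when you also need estimates and confidence intervals, which pairwise.t.test() does not return.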

Step 4: Apply Multiple Testing Correction

We’ll adjust p-values to account for multiple comparisons using the Bonferroni method.

final_pairwise <- pairwise_results |>
  mutate(
    p.adjusted = p.adjust(p.value, method = "bonferroni"),
    significant = p.adjusted < 0.05
  ) |>
  select(comparison, estimate, p.value, p.adjusted, significant)

The adjusted p-values control the family-wise Type I error rate, that is, the chance of at least one false positive across all three tests.
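The Bonferroni arithmetic is easy to verify by hand. The sketch below uses made-up p-values (they are not results from the penguin data) and compares p.adjust() with the manual calculation.

```r
# Hypothetical raw p-values from three tests (illustrative numbers only)
raw_p <- c(0.01, 0.02, 0.5)

# Bonferroni multiplies each p-value by the number of tests and caps at 1
adjusted_p <- p.adjust(raw_p, method = "bonferroni")

# The same adjustment written out by hand
manual_p <- pmin(1, raw_p * length(raw_p))

all.equal(adjusted_p, manual_p)
```

Bonferroni is conservative; p.adjust() also supports less strict methods such as "holm" and "BH" through the same method argument.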

Summary

Use pivot_longer() and nest() to prepare data for multiple t-tests across different variables
Combine map() with t.test() to efficiently run multiple comparisons in a single pipeline
The broom package’s tidy() function converts t-test results into clean, analyzable tibbles
Always consider multiple testing corrections when performing numerous simultaneous tests
This approach scales well for comparing multiple groups, variables, or conditions simultaneously
