How to use across() in R

Learn how to use across() in R with practical examples. Step-by-step guide with code you can copy and run immediately.
Published: February 21, 2026

Introduction

The across() function, introduced in dplyr 1.0.0, lets you apply the same operation to multiple columns at once. Instead of writing a near-identical line of code for each column, you select the columns once and name the function (or functions) to apply, all in a single expression.

You’ll find across() particularly useful when you need to summarize several numeric columns, apply the same transformation to a group of variables, or operate on columns that share a type or naming pattern. It is most often used inside summarise() and mutate(), where it makes data-manipulation code more concise and easier to maintain.
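To make the contrast concrete, here is a minimal sketch comparing the repetitive version with the across() version. It uses the built-in mtcars data (an assumption made only so the snippet runs without extra packages); both calls should produce the same one-row result.

```r
library(dplyr)

# Without across(): one line per column
by_hand <- mtcars |>
  summarise(
    mpg  = mean(mpg),
    disp = mean(disp),
    hp   = mean(hp)
  )

# With across(): the column selection and the function are written once
with_across <- mtcars |>
  summarise(across(c(mpg, disp, hp), mean))

all.equal(by_hand, with_across)  # TRUE
```

Adding a fourth column to the across() version means editing only the c(...) selection, not adding another line.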

Getting Started

First, let’s load the required packages:

library(tidyverse)       # includes dplyr, which provides across()
library(palmerpenguins)  # provides the penguins dataset

Example 1: Basic Usage

Let’s start with a simple example using the penguins dataset. Suppose we want to calculate the mean of all numeric columns:

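penguins |>
  # Extra arguments such as na.rm go inside the lambda; passing them through
  # across()'s ... is deprecated as of dplyr 1.1.0.
  # where(is.numeric) also matches the integer year column.
  summarise(across(where(is.numeric), ~ mean(.x, na.rm = TRUE)))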
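where() is only one of the tidyselect helpers, and helpers can be combined with & (and), | (or) and ! (not). A minimal sketch, using the built-in iris data as a stand-in so no extra packages are needed:

```r
library(dplyr)

# Intersection (&): numeric columns whose names also start with "Sepal"
sepal_means <- iris |>
  summarise(across(where(is.numeric) & starts_with("Sepal"), mean))
names(sepal_means)  # "Sepal.Length" "Sepal.Width"

# Negation (!): every column except Species
all_max <- iris |>
  summarise(across(!Species, max))
names(all_max)  # "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
```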

We can also apply multiple functions to the same columns by providing a list of functions:

penguins |>
  # Each list element becomes one output column per selected column
  summarise(across(c(bill_length_mm, bill_depth_mm, flipper_length_mm),
                   list(mean = ~ mean(.x, na.rm = TRUE),
                        sd   = ~ sd(.x, na.rm = TRUE))))
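When you pass a named list of functions and omit .names, across() names each output column "{.col}_{.fn}", iterating over columns first and functions second. A small sketch on the built-in mtcars data (chosen only so the snippet is self-contained) shows the default names and a custom .names pattern:

```r
library(dplyr)

# Default naming: "{.col}_{.fn}"
stats <- mtcars |>
  summarise(across(c(mpg, hp), list(mean = mean, sd = sd)))
names(stats)   # "mpg_mean" "mpg_sd" "hp_mean" "hp_sd"

# Custom naming: put the function name first
stats2 <- mtcars |>
  summarise(across(c(mpg, hp), list(mean = mean, sd = sd),
                   .names = "{.fn}.{.col}"))
names(stats2)  # "mean.mpg" "sd.mpg" "mean.hp" "sd.hp"
```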

Here’s how to use across() with mutate() to transform multiple columns. Let’s convert millimeter measurements to centimeters:

penguins |>
  # Keep the *_mm columns and add converted copies named e.g. bill_length_mm_cm
  mutate(across(ends_with("_mm"), ~ .x / 10, .names = "{.col}_cm")) |>
  select(species, contains("_cm"))
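The .names argument decides whether mutate() adds new columns or overwrites the selected ones. A small sketch on a made-up tibble (the columns a_mm and b_mm are hypothetical, used only for illustration):

```r
library(dplyr)

df <- tibble(id = 1:2, a_mm = c(10, 25), b_mm = c(3, 8))

# No .names: the selected columns are overwritten in place
in_place <- df |> mutate(across(ends_with("_mm"), ~ .x / 10))
names(in_place)  # "id" "a_mm" "b_mm" (values are now in cm)

# With .names: new columns are appended and the originals are kept
added <- df |> mutate(across(ends_with("_mm"), ~ .x / 10, .names = "{.col}_cm"))
names(added)     # "id" "a_mm" "b_mm" "a_mm_cm" "b_mm_cm"
```

Overwriting in place keeps the table narrow but leaves a misleading _mm suffix, which is why the example above writes the converted values to new columns.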

Example 2: Practical Application

Now let’s work with a more complex real-world scenario. Imagine we’re analyzing penguin data and need to create a comprehensive summary report grouped by species and island. We’ll use across() to efficiently calculate multiple statistics:

penguin_summary <- penguins |>
  group_by(species, island) |>
  summarise(
    count = n(),
    across(c(bill_length_mm, bill_depth_mm, flipper_length_mm, body_mass_g),
           list(
             mean = ~ mean(.x, na.rm = TRUE),
             median = ~ median(.x, na.rm = TRUE),
             min = ~ min(.x, na.rm = TRUE),
             max = ~ max(.x, na.rm = TRUE)
           ),
           .names = "{.col}_{.fn}"),
    .groups = "drop"
  )

penguin_summary
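In a grouped summarise() or mutate(), across() never applies the function to the grouping columns themselves, even with a catch-all selector such as everything(). A minimal sketch using the built-in mtcars data as a stand-in:

```r
library(dplyr)

by_cyl <- mtcars |>
  select(cyl, mpg, hp) |>
  group_by(cyl) |>
  # everything() matches mpg and hp only; cyl is the grouping column
  summarise(across(everything(), mean))

names(by_cyl)  # "cyl" "mpg" "hp"
nrow(by_cyl)   # 3
```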

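across() also works inside a grouped mutate(). Let’s standardize (z-score) every numeric measurement within each species, so each value is expressed relative to its own species’ mean and standard deviation. The new _std columns are added alongside the originals, and because where(is.numeric) also matches year, a year_std column is created as well: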

penguins_standardized <- penguins |>
  group_by(species) |>
  # scale() returns a one-column matrix; [, 1] turns it back into a plain vector
  mutate(across(where(is.numeric),
                ~ scale(.x)[, 1],
                .names = "{.col}_std")) |>
  ungroup()

penguins_standardized |>
  select(species, contains("_std"))
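A quick way to confirm that the standardization really happened per group is to check that each _std column has mean 0 and standard deviation 1 within every group. This sketch uses mtcars grouped by cyl as a stand-in for penguins grouped by species:

```r
library(dplyr)

z <- mtcars |>
  group_by(cyl) |>
  mutate(across(c(mpg, hp), ~ scale(.x)[, 1], .names = "{.col}_std")) |>
  ungroup()

# Within each cyl group, the standardized columns should have mean ~0 and sd 1
check <- z |>
  group_by(cyl) |>
  summarise(across(ends_with("_std"), list(mean = mean, sd = sd)))
check
```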

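Here’s another practical example where we handle missing values differently for different types of columns. In penguins, the categorical columns (species, island and sex) are stored as factors rather than character vectors, so they must be selected with is.factor rather than is.character:

penguins_cleaned <- penguins |>
  mutate(
    # Numeric columns: replace NA with that column's median
    across(where(is.numeric), ~ ifelse(is.na(.x), median(.x, na.rm = TRUE), .x)),
    # Factor columns: turn NA into an explicit "Unknown" level
    # (fct_na_value_to_level() comes from forcats >= 1.0.0, loaded by tidyverse)
    across(where(is.factor), ~ fct_na_value_to_level(.x, level = "Unknown"))
  )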

penguins_cleaned |>
  summarise(across(everything(), ~ sum(is.na(.x))))
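The difference between character and factor columns is easy to miss, so here is a small sketch on a made-up tibble (all column names are hypothetical). It uses dplyr’s coalesce() as an alternative to ifelse() for filling NAs, and shows that where(is.character) silently skips a factor column:

```r
library(dplyr)

# Toy data: one numeric, one character and one factor column, each with an NA
toy <- tibble(
  x   = c(1, NA, 3, 10),
  chr = c("a", NA, "b", "a"),
  fct = factor(c("lo", "hi", NA, "lo"))
)

cleaned <- toy |>
  mutate(
    # coalesce() replaces each NA with the (length-1) fallback value
    across(where(is.numeric), ~ coalesce(.x, median(.x, na.rm = TRUE))),
    across(where(is.character), ~ coalesce(.x, "Unknown"))
  )

cleaned$x              # 1 3 3 10
cleaned$chr            # "a" "Unknown" "b" "a"
sum(is.na(cleaned$fct))  # 1: the factor column was not selected by is.character
```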

Summary

The across() function is essential for efficient data manipulation in R. Key takeaways include:

  • Use across(where(predicate), function) to apply operations to columns that satisfy a predicate, such as where(is.numeric)
  • Combine across() with column selection helpers like starts_with(), ends_with(), or contains()
  • Apply multiple functions using lists: across(cols, list(mean = mean, sd = sd))
  • Control output names with the .names parameter using {.col} and {.fn} placeholders
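  • Use the ~ lambda syntax (for example ~ mean(.x, na.rm = TRUE)) for more complex transformations or when a function needs extra arguments; passing extra arguments through across()’s ... is deprecated as of dplyr 1.1.0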
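As a quick check of the ~ syntax, here is a minimal sketch comparing a purrr-style lambda with a base R anonymous function (the \(x) shorthand needs R 4.1 or later) on a small made-up tibble; both forms give the same result:

```r
library(dplyr)

df <- tibble(a = c(1, NA, 3), b = c(NA, 4, 8))

# purrr-style lambda: .x stands for the current column
f1 <- df |> summarise(across(everything(), ~ mean(.x, na.rm = TRUE)))

# Base R anonymous function: same result
f2 <- df |> summarise(across(everything(), \(x) mean(x, na.rm = TRUE)))

identical(f1, f2)  # TRUE
f1                 # a = 2, b = 6
```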

Master across() to write cleaner, more maintainable code that scales well when working with datasets containing many similar columns.