How to apply a function on multiple columns using across()

Learn how to apply a function on multiple columns using across() with this comprehensive R tutorial. Includes practical examples and code snippets.

Published

September 9, 2023

Introduction

The across() function in R’s dplyr package is a powerful tool for applying functions to multiple columns simultaneously. Instead of writing repetitive code to perform the same operation on different columns, across() allows you to select multiple columns and apply transformations efficiently in a single step.

This function is particularly useful when you need to summarize data, transform variables, or perform calculations across several columns that share similar characteristics. Whether you’re calculating means for numeric variables, converting data types, or applying custom functions to selected columns, across() streamlines your workflow and makes your code more readable and maintainable.
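To make the contrast with repetitive code concrete, here is a minimal sketch. It uses a small made-up data frame (toy, with columns x and y, not part of the penguins data) so the result can be checked by hand.

```r
library(dplyr)

# Hypothetical toy data (an assumption for illustration, not from the tutorial)
toy <- data.frame(
  group = c("a", "a", "b"),
  x = c(1, 2, 3),
  y = c(10, 20, 30)
)

# Repetitive version: one expression per column
manual <- toy |>
  summarise(x = mean(x), y = mean(y))

# across() version: select the columns once, name the function once
with_across <- toy |>
  summarise(across(c(x, y), mean))

# Compare the two results
all.equal(manual, with_across)
```

With only two columns the saving is small, but the across() call stays the same length however many columns the selection matches.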

Getting Started

First, let’s load the required packages. We’ll use the tidyverse for data manipulation and the palmerpenguins dataset for our examples.

library(tidyverse)
library(palmerpenguins)

# Preview the penguins dataset
glimpse(penguins)

Example 1: Basic Usage

Let’s start with a simple example using the penguins dataset. We’ll calculate the mean of all numeric columns, removing missing values.

# Basic usage: calculate means for all numeric columns
# (na.rm is set inside the lambda; passing extra arguments through
# across()'s ... is deprecated since dplyr 1.1.0)
penguins |>
  summarise(across(where(is.numeric), ~mean(.x, na.rm = TRUE)))

# Apply across() to specific columns by name
penguins |>
  summarise(across(c(bill_length_mm, bill_depth_mm, flipper_length_mm),
                   ~mean(.x, na.rm = TRUE)))

# Use column selection helpers
penguins |>
  summarise(across(ends_with("_mm"), ~mean(.x, na.rm = TRUE)))

You can also apply multiple functions to the same columns:

# Apply multiple functions using a named list
penguins |>
  summarise(across(where(is.numeric),
                   list(mean = ~mean(.x, na.rm = TRUE),
                        sd = ~sd(.x, na.rm = TRUE))))
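When a named list of functions is supplied, across() builds each output name from the column name and the list name. The sketch below shows this on a made-up two-column data frame (toy is an assumption for illustration, not part of the tutorial data).

```r
library(dplyr)

# Hypothetical toy data (not from the tutorial)
toy <- data.frame(x = c(1, 2, 3), y = c(10, 20, 30))

# Two functions applied to two columns gives four output columns
stats <- toy |>
  summarise(across(c(x, y), list(mean = mean, sd = sd)))

# Inspect the generated names: column first, then function name
names(stats)
```

The results are ordered column by column, so both statistics for x come before those for y.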

Example 2: Practical Application

Let’s explore a more comprehensive example that demonstrates across() in a real-world scenario. We’ll analyze penguin measurements by species and island, applying different transformations and summaries.

# Group by species and calculate multiple statistics
penguin_summary <- penguins |>
  group_by(species, island) |>
  summarise(
    # Count observations
    n = n(),
    # Calculate mean and standard deviation for the measurement columns
    across(c(bill_length_mm, bill_depth_mm, flipper_length_mm, body_mass_g),
           list(mean = ~mean(.x, na.rm = TRUE),
                sd = ~sd(.x, na.rm = TRUE)),
           .names = "{.col}_{.fn}"),
    .groups = "drop"
  )

# View the results
penguin_summary
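The list above defines each function with the ~ shorthand, where .x stands for the current column. As a rough sketch, the same computation can be written with a base R anonymous function (R 4.1 or later). The toy data frame and the argument name col are made up for illustration.

```r
library(dplyr)

# Hypothetical toy data with missing values (not from the tutorial)
toy <- data.frame(x = c(1, NA, 3), y = c(10, 20, NA))

# Formula shorthand: .x is the current column
formula_style <- toy |>
  summarise(across(c(x, y), ~ mean(.x, na.rm = TRUE)))

# Base R anonymous function: the argument name is up to you
lambda_style <- toy |>
  summarise(across(c(x, y), \(col) mean(col, na.rm = TRUE)))

# Compare the two results
all.equal(formula_style, lambda_style)
```

Either form keeps na.rm = TRUE next to the function it belongs to, which is why it works well inside a list of several functions.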

Here’s another practical example showing data transformation:

# Transform multiple columns: standardize numeric variables
penguins_standardized <- penguins |>
  mutate(across(where(is.numeric), 
                ~scale(.x)[,1], 
                .names = "{.col}_scaled")) |>
  select(species, island, ends_with("_scaled"))

# Convert multiple columns to different data types
penguins_transformed <- penguins |>
  mutate(
    # Convert character columns to factors (a no-op for penguins, whose
    # species, island, and sex columns are already factors)
    across(where(is.character), as.factor),
    # Round numeric columns to 1 decimal place
    across(where(is.numeric), ~round(.x, 1))
  )
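The two mutate() examples above behave differently in one respect: without .names, across() overwrites the selected columns, while with .names it appends new ones and keeps the originals. A minimal sketch on made-up data (toy and the _doubled suffix are assumptions for illustration):

```r
library(dplyr)

# Hypothetical toy data (not from the tutorial)
toy <- data.frame(id = c("p1", "p2"), len = c(1, 2), wt = c(10, 20))

# No .names: the selected columns are replaced by their transformed values
overwritten <- toy |>
  mutate(across(c(len, wt), ~ .x * 2))

# With .names: the originals are kept and new columns are appended
added <- toy |>
  mutate(across(c(len, wt), ~ .x * 2, .names = "{.col}_doubled"))

names(overwritten)
names(added)
```

Choosing a .names pattern is therefore a decision about whether the original values should survive the transformation.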

You can also combine selection conditions so that different functions apply to different kinds of columns:

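# Apply different functions based on column characteristics
penguins |>
  group_by(species) |>
  summarise(
    # Numeric columns except year: minimum and maximum, ignoring NAs
    across(where(is.numeric) & !contains("year"),
           list(min = ~min(.x, na.rm = TRUE),
                max = ~max(.x, na.rm = TRUE))),
    # Remaining factor columns: number of distinct values (NA counts as one)
    across(where(is.factor), ~length(unique(.x))),
    .groups = "drop"
  )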
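A combined selection such as where(is.numeric) & !contains("year") can be previewed with select() before it is passed to across(), since both accept the same selection syntax. The sketch below uses a made-up data frame (toy and its column names are assumptions for illustration).

```r
library(dplyr)

# Hypothetical toy data (not from the tutorial)
toy <- data.frame(
  species = factor(c("a", "b")),
  bill_mm = c(1.5, 2.5),
  year = c(2007L, 2008L)
)

# Keep numeric columns, then exclude any whose name contains "year"
selected <- toy |>
  select(where(is.numeric) & !contains("year"))

names(selected)
```

Checking the selection first helps avoid applying a summary to a column, such as year, that happens to be numeric but is not a measurement.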

Summary

The across() function is an essential tool for efficient data manipulation in R. Key takeaways include:

Use across() to apply functions to multiple columns simultaneously, reducing code duplication
Combine it with selection helpers like where(), starts_with(), ends_with(), and contains() for flexible column selection
Apply multiple functions using named lists and control output names with the .names argument
across() works seamlessly with group_by() for grouped operations
Use anonymous functions with ~ syntax for custom transformations

Master across() to write more concise, readable, and maintainable R code for your data analysis workflows.
