How to use sapply in R

base-r

sapply

Master sapply in R programming with clear examples. Complete tutorial covering syntax, use cases, and best practices.

Published

February 21, 2026

Introduction

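The sapply() function in R applies a function to each element of a list or vector and simplifies the result where it can, usually to a vector or a matrix. When the individual results cannot be combined (for example, because they have different lengths), it returns a list instead. It’s particularly useful when you need to perform the same operation across multiple data elements and want more compact output than the list that lapply() returns.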
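
To make the difference from lapply() concrete, here is a minimal sketch on a toy vector. The vector x and the squaring function are illustrative choices and are not part of the datasets used later in this tutorial.

```r
# A small input vector for illustration
x <- 1:3

# lapply() always returns a list with one element per input
squares_list <- lapply(x, function(v) v^2)

# sapply() runs the same loop, then simplifies the results to a vector
squares_vec <- sapply(x, function(v) v^2)

str(squares_list)
str(squares_vec)
```

Both calls compute the same values; only the container differs.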

Getting Started

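# tidyverse supplies select(), which is used in the examples below
library(tidyverse)

# mtcars ships with base R; penguins comes from the palmerpenguins package,
# which must be installed
data(mtcars)
data(penguins, package = "palmerpenguins")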

Example 1: Basic Usage

The Problem

We need to calculate the same summary statistic, such as the mean, for several numeric columns in the mtcars dataset. Instead of writing a separate call for each column, we want to apply one function across all of them in a single step.

Step 1: Create sample data

First, let’s select a few numeric columns to work with.

# Select key numeric variables
car_data <- mtcars |> 
  select(mpg, hp, wt, qsec)

head(car_data, 3)

This gives us a clean subset with four numeric variables to analyze.

Step 2: Apply a single function

Now we’ll use sapply() to calculate the mean of each column.

# Calculate mean for each column
column_means <- sapply(car_data, mean)
print(column_means)

The sapply() function applied the mean() function to each column and returned a named vector with the results.
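
This works column by column because a data frame is a list of columns. The short sketch below, which uses only the built-in mtcars data and a hypothetical variable name col_means, shows that and how to pull a single value out of the named result.

```r
# A data frame is a list of columns, so sapply() visits one column at a time
is.list(mtcars)

col_means <- sapply(mtcars[c("mpg", "hp")], mean)

# The result is a named numeric vector; elements can be extracted by name
names(col_means)
col_means[["mpg"]]

# Same value as calling mean() on the column directly
all.equal(col_means[["mpg"]], mean(mtcars$mpg))
```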

Step 3: Apply with additional arguments

We can pass additional arguments to our function through sapply().

# Add some NA values for demonstration
car_data_na <- car_data
car_data_na[1, "mpg"] <- NA

# Calculate means ignoring NA values
means_no_na <- sapply(car_data_na, mean, na.rm = TRUE)
print(means_no_na)

The na.rm = TRUE argument was passed to each mean() function call, handling missing values properly.
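
As a sketch of what happens here, the block below compares three calls on a small copy of mtcars (the names df, via_dots and via_lambda are made up for this illustration): one without na.rm, one that forwards it as an extra argument, and one that wraps mean() in an anonymous function.

```r
# Copy of two built-in columns with one missing value, mirroring the step above
df <- mtcars[c("mpg", "hp")]
df[1, "mpg"] <- NA

# Without na.rm, any column that contains an NA yields NA
sapply(df, mean)

# Arguments placed after the function are forwarded to every call
via_dots <- sapply(df, mean, na.rm = TRUE)

# Wrapping the call in an anonymous function does the same job
via_lambda <- sapply(df, function(col) mean(col, na.rm = TRUE))

identical(via_dots, via_lambda)
```

The anonymous-function form is longer, but it is the one to reach for when the extra argument has to go somewhere other than the end of the call.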

Example 2: Practical Application

The Problem

We’re analyzing the penguins dataset and want several summary statistics for each numeric measurement, along with a quick check of which measurements vary the most. The goal is a compact overview of the center and spread of each numeric variable.

Step 1: Prepare the data

Let’s extract the numeric columns and drop any rows with missing values so that the statistics below do not return NA.

# Get numeric columns from penguins
penguin_numeric <- penguins |> 
  select(bill_length_mm, bill_depth_mm, 
         flipper_length_mm, body_mass_g) |> 
  na.omit()

dim(penguin_numeric)

This gives us a clean dataset with four numeric measurements for analysis.
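
When the column names are not known in advance, sapply() itself can find the numeric columns. The following is a minimal sketch on the built-in iris data rather than penguins, so it runs without extra packages; numeric_cols and iris_numeric are illustrative names.

```r
# iris ships with base R: four numeric columns plus the factor Species
numeric_cols <- sapply(iris, is.numeric)
print(numeric_cols)

# Use the logical vector to keep only the numeric columns
iris_numeric <- iris[numeric_cols]
names(iris_numeric)
```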

Step 2: Create a custom function

We’ll build a function that returns several statistics describing the center and spread of a variable.

# Function to calculate summary stats
get_stats <- function(x) {
  c(mean = mean(x),
    median = median(x),
    sd = sd(x),
    iqr = IQR(x))
}

This function returns four key statistics that help us understand each variable’s distribution.
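
Before applying the function to every column, it can help to try it on a single small vector. The sketch below repeats the definition so that it runs on its own; the vector toy is a made-up input.

```r
# Same helper as above, repeated so this block is self-contained
get_stats <- function(x) {
  c(mean = mean(x),
    median = median(x),
    sd = sd(x),
    iqr = IQR(x))
}

# A tiny made-up vector to inspect the shape of the result
toy <- c(1, 2, 3, 4, 5)
result <- get_stats(toy)

print(result)
names(result)
```

Because the function always returns a named vector of length four, sapply() can later stack the per-column results into a matrix.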

Step 3: Apply custom function

Now we’ll use sapply() to apply our custom function across all columns.

# Apply custom function to all columns
penguin_stats <- sapply(penguin_numeric, get_stats)
print(round(penguin_stats, 2))

The result is a matrix where each column represents a variable and each row represents a different statistic.
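
A matrix result can be indexed by row and column name. Here is a short sketch using built-in mtcars columns in place of the penguin measurements, with the helper repeated so that the block is self-contained; the name stats is illustrative.

```r
# Same helper as in Step 2, repeated so this block runs on its own
get_stats <- function(x) {
  c(mean = mean(x), median = median(x), sd = sd(x), iqr = IQR(x))
}

stats <- sapply(mtcars[c("mpg", "hp", "wt")], get_stats)

dim(stats)        # statistics in rows, variables in columns
stats["mean", ]   # one statistic across all variables
stats[, "mpg"]    # all statistics for one variable
t(stats)          # transposed: one row per variable
```

Transposing with t() is often convenient when the summary is meant to be read as a table with one variable per row.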

Step 4: Create logical tests

We can also use sapply() for logical operations across columns.

# Check which variables have high variability (CV > 0.15)
high_variation <- sapply(penguin_numeric, function(x) {
  coefficient_variation <- sd(x) / mean(x)
  coefficient_variation > 0.15
})

print(high_variation)

This returns a logical vector showing which measurements have high relative variability.
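
A named logical vector like this can be used directly to list or select the flagged columns. The sketch below applies the same 0.15 threshold to built-in mtcars columns; cars, cv_flags and flagged are hypothetical names.

```r
cars <- mtcars[c("mpg", "hp", "wt", "qsec")]

# TRUE where the coefficient of variation (sd / mean) exceeds 0.15
cv_flags <- sapply(cars, function(x) sd(x) / mean(x) > 0.15)

# Names of the flagged variables
flagged <- names(cv_flags)[cv_flags]
print(flagged)

# Keep only the flagged columns
head(cars[cv_flags], 3)
```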

Step 5: Count categories by groups

sapply() also works on the list that split() produces, which lets us count the levels of a factor within each group.

# Count species within each island
species_counts <- sapply(split(penguins$species, penguins$island), table)

print(species_counts)

Because every group keeps the same factor levels, each table() call returns a vector of the same length, and sapply() combines them into a matrix with species in rows and islands in columns.
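
The same pattern can be checked against a two-way table() call. This sketch uses built-in mtcars columns (cylinders within gear groups) as a stand-in for species and island; cyl_by_gear and direct are illustrative names.

```r
# Count cylinder levels within each gear group
cyl_by_gear <- sapply(split(factor(mtcars$cyl), mtcars$gear), table)
print(cyl_by_gear)

# table() with two arguments builds the same cross-tabulation directly
direct <- table(mtcars$cyl, mtcars$gear)

# Compare the counts cell by cell
all(cyl_by_gear == direct)
```

For a plain cross-tabulation, table() with two arguments is the shorter route; the split() plus sapply() form becomes more useful when the per-group function is something other than counting.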

Summary

sapply() applies functions across list or vector elements and simplifies results into vectors or matrices
It’s ideal for calculating summary statistics across multiple columns efficiently
Any extra arguments given to sapply() after the function, such as na.rm = TRUE, are forwarded to every call of that function
Custom functions work seamlessly with sapply() for complex operations
The function returns simplified output compared to lapply(), making results easier to read and work with
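
One caveat about simplification is worth a sketch: when the results have different lengths, sapply() returns a list rather than a vector or matrix. The example below uses made-up inputs to show this, and shows vapply(), a base R relative of sapply() that checks each result against a declared template; it is offered as an optional alternative, not as something this tutorial depends on.

```r
# Results of different lengths cannot be combined into a vector or matrix,
# so sapply() returns a list instead
ragged <- sapply(1:3, function(n) seq_len(n))
is.list(ragged)

# vapply() takes a template for each result (here: one numeric value)
# and raises an error if any result does not match it
col_means <- vapply(mtcars[c("mpg", "hp")], mean, FUN.VALUE = numeric(1))
print(col_means)
```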
