How to use sapply in R

Master sapply in R programming with clear examples. Complete tutorial covering syntax, use cases, and best practices.
Published: February 21, 2026

Introduction

The sapply() function in R applies a function to each element of a list or vector and tries to simplify the result: equal-length results become a vector (length 1) or a matrix (length greater than 1), and anything else is returned as a list. It’s particularly useful when you need to perform the same operation across multiple data elements and want a more compact result than the list that lapply() always returns.
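To make the difference from lapply() concrete, here is a minimal sketch. The list `scores` and its values are made up for illustration.

```r
# A small list of numeric vectors (made-up values for illustration)
scores <- list(a = c(1, 2, 3), b = c(10, 20), c = 5)

# lapply() always returns a list with one element per input
as_list <- lapply(scores, mean)

# sapply() simplifies the length-1 results into a named numeric vector
as_vector <- sapply(scores, mean)

str(as_list)
print(as_vector)
```

Both calls compute the same means; only the container differs.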

Getting Started

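library(tidyverse)  # select() comes from dplyr, which tidyverse attaches
data(mtcars)
# The palmerpenguins package must be installed for this line to work
data(penguins, package = "palmerpenguins")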

Example 1: Basic Usage

The Problem

We need to calculate summary statistics for multiple numeric columns in the mtcars dataset. Instead of writing a separate call for each column, we want to apply the same function across all columns at once.

Step 1: Create sample data

First, let’s select a few numeric columns to work with.

# Select key numeric variables
car_data <- mtcars |> 
  select(mpg, hp, wt, qsec)

head(car_data, 3)

This gives us a clean subset with four numeric variables to analyze.

Step 2: Apply a single function

Now we’ll use sapply() to calculate the mean of each column.

# Calculate mean for each column
column_means <- sapply(car_data, mean)
print(column_means)

The sapply() function applied the mean() function to each column and returned a named vector with the results.
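This works because a data frame is a list of columns, so sapply() visits one column at a time. The sketch below rebuilds the subset with base R indexing (an assumption made so it runs without dplyr) and checks the result against colMeans().

```r
# Same four columns as above, selected with base R indexing
car_data <- mtcars[, c("mpg", "hp", "wt", "qsec")]

column_means <- sapply(car_data, mean)

# One named entry per column
names(column_means)

# colMeans() computes the same values for an all-numeric data frame
all.equal(column_means, colMeans(car_data))
```

For plain column means colMeans() is the more direct tool; sapply() earns its place when the function is something other than a mean or sum.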

Step 3: Apply with additional arguments

Any arguments supplied after the function name are forwarded by sapply() to every call of that function.

# Add some NA values for demonstration
car_data_na <- car_data
car_data_na[1, "mpg"] <- NA

# Calculate means ignoring NA values
means_no_na <- sapply(car_data_na, mean, na.rm = TRUE)
print(means_no_na)

The na.rm = TRUE argument was forwarded to every mean() call, so the missing mpg value is dropped before averaging instead of turning the result into NA.
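As a quick check of how the forwarding behaves, the sketch below compares the forwarded-argument form with an explicit anonymous function. It uses base R indexing so that it runs on its own.

```r
# Rebuild the example data with one missing value
car_data_na <- mtcars[, c("mpg", "hp", "wt", "qsec")]
car_data_na[1, "mpg"] <- NA

# Extra arguments after the function are forwarded to each call
via_dots <- sapply(car_data_na, mean, na.rm = TRUE)

# The same computation written as an anonymous function
via_lambda <- sapply(car_data_na, function(col) mean(col, na.rm = TRUE))

identical(via_dots, via_lambda)

# Without na.rm, the column that contains NA yields NA
without_na_rm <- sapply(car_data_na, mean)
print(without_na_rm)
```

The anonymous-function form is longer, but it is handy when the column is not the first argument of the function being called.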

Example 2: Practical Application

The Problem

We’re analyzing the penguins dataset and want an overview of each numeric measurement: several summary statistics per variable, plus a check for variables with unusually high spread.

Step 1: Prepare the data

Let’s extract the numeric columns and drop every row that has a missing value in any of them.

# Get numeric columns from penguins
penguin_numeric <- penguins |> 
  select(bill_length_mm, bill_depth_mm, 
         flipper_length_mm, body_mass_g) |> 
  na.omit()

dim(penguin_numeric)

This gives us a clean dataset with four numeric measurements for analysis.
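Note that na.omit() works row by row: one missing value removes the whole row. A small sketch with a made-up data frame (`raw` and its values are hypothetical) shows the effect.

```r
# Hypothetical measurements with missing values in different rows
raw <- data.frame(
  length = c(1.2, NA, 3.4),
  mass   = c(10, 20, NA)
)

# Any row containing at least one NA is dropped entirely
clean <- na.omit(raw)

nrow(raw)
nrow(clean)
```

If losing whole rows is too aggressive for your data, passing na.rm = TRUE per column, as in Example 1, keeps every available value.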

Step 2: Create a custom function

We’ll build a function that returns several statistics describing the center and spread of a variable.

# Function to calculate summary stats
get_stats <- function(x) {
  c(mean = mean(x),
    median = median(x),
    sd = sd(x),
    iqr = IQR(x))
}

This function returns four key statistics that help us understand each variable’s distribution.
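Before applying the function to every column, it can help to try it on a single vector. The values in `sample_values` are made up and include one extreme point.

```r
get_stats <- function(x) {
  c(mean = mean(x),
    median = median(x),
    sd = sd(x),
    iqr = IQR(x))
}

# Made-up values with one extreme observation
sample_values <- c(2, 4, 4, 6, 100)

single_result <- get_stats(sample_values)
print(single_result)
```

The extreme value pulls the mean and standard deviation up sharply, while the median and IQR barely move. A large gap between those pairs is a hint that a variable deserves a closer look.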

Step 3: Apply custom function

Now we’ll use sapply() to apply our custom function across all columns.

# Apply custom function to all columns
penguin_stats <- sapply(penguin_numeric, get_stats)
print(round(penguin_stats, 2))

The result is a matrix where each column represents a variable and each row represents a different statistic.
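Because the result is an ordinary matrix with row and column names, individual values can be looked up by name. The sketch below uses three mtcars columns instead of the penguins data, an assumption made only so that it runs without extra packages.

```r
get_stats <- function(x) {
  c(mean = mean(x), median = median(x), sd = sd(x), iqr = IQR(x))
}

# Statistics become rows, variables become columns
stats_matrix <- sapply(mtcars[, c("mpg", "hp", "wt")], get_stats)
dim(stats_matrix)

# Look up one statistic for one variable by name
stats_matrix["median", "hp"]

# Pull one statistic for all variables
stats_matrix["sd", ]

# Transpose to get one row per variable, which is often easier to read
t(stats_matrix)
```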

Step 4: Create logical tests

We can also use sapply() for logical operations across columns.

# Check which variables have high variability (CV > 0.15)
high_variation <- sapply(penguin_numeric, function(x) {
  coefficient_variation <- sd(x) / mean(x)
  coefficient_variation > 0.15
})

print(high_variation)

This returns a logical vector showing which measurements have high relative variability.
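A named logical vector like this is convenient for picking out the variables that passed the test. The data frame `measurements` below is hypothetical and exists only to keep the sketch self-contained.

```r
# Hypothetical data: one tightly clustered column, one widely spread column
measurements <- data.frame(
  stable   = c(10, 10.5, 9.5, 10.2),
  variable = c(1, 5, 10, 20)
)

# TRUE where the coefficient of variation exceeds 0.15
high_variation <- sapply(measurements, function(x) sd(x) / mean(x) > 0.15)

# Names of the flagged columns
flagged <- names(high_variation)[high_variation]
print(flagged)

# Number of flagged columns (TRUE counts as 1)
sum(high_variation)
```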

Step 5: Count categories by groups

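sapply() also works on a list of factors, which lets us count occurrences within each group. Because species is a factor, table() reports every level for every island, so the results have equal length and simplify to a matrix.

# Count species occurrences per island
species_counts <- sapply(split(penguins$species, penguins$island), table)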

print(species_counts)

This creates a matrix of species counts with one row per species and one column per island.
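The split-then-count pattern depends on the grouped values being a factor. The sketch below illustrates this with mtcars (cylinders by gear) rather than the penguins data, an assumption made so it runs with base R alone, and compares the result with a direct two-way table().

```r
# Counting a factor: every level is reported in every group
by_split <- sapply(split(factor(mtcars$cyl), mtcars$gear), table)
print(by_split)

# A two-way table() gives the same counts directly
cross_tab <- table(mtcars$cyl, mtcars$gear)
all(by_split == cross_tab)

# Counting character values: absent categories are dropped per group,
# so the tables differ in length and sapply() returns a list instead
as_chars <- sapply(split(as.character(mtcars$cyl), mtcars$gear), table)
is.list(as_chars)
```

For a plain cross-tabulation, table() with two arguments is the shorter route; the sapply() version is useful when the per-group function does more than count.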

Summary

  • sapply() applies a function to each element of a list or vector and simplifies the results into a vector or matrix when it can, falling back to a list otherwise
  • It’s well suited to calculating summary statistics across multiple columns of a data frame
  • Arguments placed after the function name are passed on to every call
  • Custom and anonymous functions work with sapply() just like built-in ones
  • Compared with lapply(), which always returns a list, the simplified output is easier to read and work with
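One caveat worth keeping in mind is that the shape of the result depends on the data. The sketch below shows two cases where sapply() returns a list, followed by base R's vapply(), which is offered here as one possible stricter alternative rather than something the examples above require.

```r
# Results of different lengths cannot be simplified, so a list comes back
ragged <- sapply(list(a = 1:3, b = 1:5), function(x) x[x > 2])
is.list(ragged)

# Zero-length input also returns a list, not an empty numeric vector
empty_result <- sapply(numeric(0), sqrt)
class(empty_result)

# vapply() declares the expected type and length of each result
# and raises an error if a call returns anything else
means <- vapply(mtcars[, c("mpg", "hp")], mean, FUN.VALUE = numeric(1))
print(means)
```

In interactive work the flexible return type is rarely a problem; in functions and scripts, a declared return type makes failures easier to spot.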