How to use apply in R
Introduction
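The apply() function is one of R’s most useful tools for applying a function across the rows or columns of a matrix. Data frames work too, but apply() first converts them to a matrix with as.matrix(). It’s ideal when you need to perform the same operation on many rows or columns without writing an explicit loop, which makes your code shorter and easier to read. It is not inherently faster than a well-written loop, though.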
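To make the "without writing loops" idea concrete, here is a minimal sketch comparing apply() with the explicit loop it replaces. It uses the built-in mtcars data; the names m, loop_max, and apply_max are illustrative only.

```r
# Small numeric matrix built from the built-in mtcars data set
m <- as.matrix(mtcars[, c("mpg", "hp")])

# Explicit loop: compute the maximum of each column by hand
loop_max <- numeric(ncol(m))
names(loop_max) <- colnames(m)
for (j in seq_len(ncol(m))) {
  loop_max[j] <- max(m[, j])
}

# The same computation as one apply() call over columns (MARGIN = 2)
apply_max <- apply(m, 2, max)

# Both approaches give the same named vector
all.equal(loop_max, apply_max)
```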
Getting Started
# apply() is part of base R, and mtcars ships with R, so no extra packages are needed
data(mtcars)
Example 1: Basic Usage
The Problem
We want to calculate summary statistics for multiple columns in the mtcars dataset. Instead of calculating each column individually, we can use apply() to do this efficiently.
Step 1: Create a sample matrix
First, let’s select some numeric columns to work with.
# Select numeric columns for analysis
car_matrix <- as.matrix(mtcars[, c("mpg", "hp", "wt", "qsec")])
head(car_matrix, 3)
This creates a matrix with four key car characteristics that we can analyze.
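Converting to a numeric matrix first matters because apply() coerces a data frame with as.matrix(). The sketch below uses a small hypothetical data frame (df) to show that a single character column turns every value into a string before your function ever sees it.

```r
# Hypothetical data frame mixing a numeric and a character column
df <- data.frame(score = c(10, 20, 30), label = c("a", "b", "c"))

# apply() converts df with as.matrix(), so because of 'label'
# every row arrives as a character vector
row_types <- apply(df, 1, function(x) class(x))
row_types

# Keeping only numeric columns preserves numeric values
numeric_sums <- apply(df[, "score", drop = FALSE], 1, sum)
numeric_sums
```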
Step 2: Apply function across columns
Now we’ll calculate the mean for each column using apply().
# Calculate column means using apply
# Syntax: apply(X, MARGIN, FUN)
# MARGIN = 1 means rows, MARGIN = 2 means columns
column_means <- apply(car_matrix, 2, mean)
column_means
The apply() function calculated the mean of each column in one command, returning a named vector with one result per column.
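For plain column means, base R also provides colMeans(), which is implemented in compiled code and is usually faster than apply(). A quick sketch confirming that the two agree on the same matrix:

```r
car_matrix <- as.matrix(mtcars[, c("mpg", "hp", "wt", "qsec")])

# apply() over columns versus the specialised colMeans()
via_apply <- apply(car_matrix, 2, mean)
via_colmeans <- colMeans(car_matrix)

all.equal(via_apply, via_colmeans)
```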
Step 3: Apply function across rows
We can also calculate statistics for each row (each car).
# Calculate row means (average across variables for each car)
row_means <- apply(car_matrix, 1, mean)
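head(row_means)
This gives us the average of the four values for each car. Because the variables are on very different scales (hp runs into the hundreds while wt is in thousands of pounds), this average is dominated by hp; the next example standardizes the columns first to address this.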
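Any arguments you add after the function are passed on to it through apply()'s ... argument. The sketch below introduces a missing value into a copy of the matrix (with_na is purely illustrative) and passes na.rm = TRUE so that mean() skips it.

```r
car_matrix <- as.matrix(mtcars[, c("mpg", "hp", "wt", "qsec")])

# Illustrative copy with one missing value in the first row
with_na <- car_matrix
with_na[1, "hp"] <- NA

# Without na.rm, the mean of the first row is NA
row_means_na <- apply(with_na, 1, mean)

# Extra arguments after FUN are forwarded to mean()
row_means_clean <- apply(with_na, 1, mean, na.rm = TRUE)

head(row_means_clean, 3)
```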
Example 2: Practical Application
The Problem
We’re analyzing car performance and want to identify which cars score above average in several categories at once. To do this, we standardize the values and count how many categories each car is above average in.
Step 1: Standardize the data
We’ll convert each value to a z-score to make variables comparable.
# Standardize each column (subtract mean, divide by sd)
standardized_cars <- apply(car_matrix, 2, function(x) {
(x - mean(x)) / sd(x)
})
head(standardized_cars, 3)
Each column now has a mean of 0 and a standard deviation of 1, so values measured in different units can be compared directly.
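Because the function returns a whole vector for each column, apply() stacks the results as columns and returns a matrix with the same shape as the input. Base R's scale() performs the same centring and scaling by default; this sketch compares the two while ignoring the extra attributes that scale() attaches.

```r
car_matrix <- as.matrix(mtcars[, c("mpg", "hp", "wt", "qsec")])

standardized_cars <- apply(car_matrix, 2, function(x) {
  (x - mean(x)) / sd(x)
})

# Same dimensions as the input: one row per car, one column per variable
dim(standardized_cars)

# scale() centres by the column mean and divides by the column sd
all.equal(standardized_cars, scale(car_matrix), check.attributes = FALSE)
```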
Step 2: Count above-average performance
Next, we’ll count how many categories each car is above average in.
# Count positive z-scores for each car (row)
above_average_count <- apply(standardized_cars, 1, function(x) {
sum(x > 0)
})
head(above_average_count)
This tells us how many of the four categories each car scores above the overall average in.
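The same count can also be written without apply(): comparing the matrix with 0 gives a logical matrix, and rowSums() adds up the TRUE values in each row. This sketch recomputes the z-scores with scale() so that it runs on its own.

```r
car_matrix <- as.matrix(mtcars[, c("mpg", "hp", "wt", "qsec")])
standardized_cars <- scale(car_matrix)

# apply() version, as in the step above
above_average_count <- apply(standardized_cars, 1, function(x) sum(x > 0))

# Vectorised alternative: logical matrix, then row totals
above_average_vec <- rowSums(standardized_cars > 0)

all(above_average_count == above_average_vec)
```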
Step 3: Find top performers
Let’s identify the cars that excel in the most categories.
# Find cars excelling in 3 or more categories
top_performers <- names(above_average_count[above_average_count >= 3])
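cat("Top performing cars:", paste(top_performers, collapse = ", "), "\n")
We’ve identified the cars that are above average in at least three of our four key metrics. Keep in mind that "above average" here simply means a larger value, which is not always better: a higher qsec is a slower quarter-mile time, and a higher wt is a heavier car.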
Step 4: Apply custom functions
We can use apply() with custom functions for more complex analyses.
# Calculate range (max - min) for each column
value_ranges <- apply(car_matrix, 2, function(x) {
max(x) - min(x)
})
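value_ranges
This shows the spread of values in each variable, measured in that variable's own units. Because the units differ, compute the ranges on standardized_cars instead if you want to compare variability across characteristics.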
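When the function returns more than one value, apply() returns a matrix rather than a vector. For example, range() returns the minimum and maximum, so apply(car_matrix, 2, range) yields a two-row matrix; subtracting the first row from the second reproduces the spread computed above. A short sketch:

```r
car_matrix <- as.matrix(mtcars[, c("mpg", "hp", "wt", "qsec")])

# range() returns c(min, max): 2 rows, one column per variable
min_max <- apply(car_matrix, 2, range)
dim(min_max)

# Row 2 (max) minus row 1 (min) gives the spread of each column
spread <- min_max[2, ] - min_max[1, ]
spread
```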
Summary
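- apply(data, 2, function) applies a function across columns, while apply(data, 1, function) works across rows
- Use built-in functions like mean, sum, and sd, or write custom functions for specific analyses
- apply() returns a vector when the function returns one value per row or column, and a matrix (one column per row or column processed) when it returns several values of equal length; otherwise it returns a list
- apply() replaces explicit loops with a single readable call, but it is not inherently faster than a loop; for sums and means, colSums(), rowSums(), colMeans(), and rowMeans() are faster
- It is well suited to data standardization, summary statistics, and row- or column-wise transformations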
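One consequence of how apply() assembles results is easy to miss: when the function returns several values, each result is stored as a column, so with MARGIN = 1 the output is transposed relative to the input. The sketch below (row_shares is an illustrative name) divides each car's values by their row total and then uses t() to restore one row per car.

```r
car_matrix <- as.matrix(mtcars[, c("mpg", "hp", "wt", "qsec")])

# Each row is mapped to a length-4 vector (values divided by their sum)
row_shares <- apply(car_matrix, 1, function(x) x / sum(x))

# Results are stored as columns: 4 rows (variables) x 32 columns (cars)
dim(row_shares)

# Transpose back to one row per car
row_shares <- t(row_shares)
dim(row_shares)
```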