How to Use Functions in R
Introduction
Functions in R are reusable blocks of code that perform specific tasks and return results. They help you avoid repeating code, make your programs more organized, and allow you to easily share functionality across different parts of your analysis.
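As a minimal sketch of the general pattern, the hypothetical function below shows the parts of an R function. It has a name (assigned with <-), arguments, and a body. R automatically returns the value of the last evaluated expression, so return() is only needed to exit early.

```r
# Hypothetical example: divide two numbers, guarding against division by zero
safe_divide <- function(numerator, denominator) {
  # Exit early with NA when the denominator is zero
  if (denominator == 0) {
    return(NA_real_)
  }
  # The last evaluated expression is the function's return value
  numerator / denominator
}

safe_divide(10, 4)  # 2.5
safe_divide(1, 0)   # NA
```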
Getting Started
# Load the tidyverse (provides the dplyr functions used in Example 2)
library(tidyverse)
Example 1: Basic Usage
The Problem
You need to calculate the mean of several different numeric vectors, but you want to handle missing values consistently. Instead of writing the same code multiple times, you can create a custom function.
Step 1: Create a Simple Function
We’ll start by creating a function that calculates the mean while removing NA values.
# Create a function to calculate the mean without NAs
clean_mean <- function(x) {
  mean(x, na.rm = TRUE)
}
This function takes one argument, x, and returns the mean after removing any missing values.
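If you also want callers to be able to pass other options through to mean(), such as its trim argument, one possible variant forwards them with .... The name clean_mean_dots is hypothetical.

```r
# Hypothetical variant: forward any extra arguments (e.g. trim) to mean()
clean_mean_dots <- function(x, ...) {
  mean(x, na.rm = TRUE, ...)
}

values <- c(1, 2, 3, 4, 100, NA)
clean_mean_dots(values)              # 22: mean of 1, 2, 3, 4, 100
clean_mean_dots(values, trim = 0.2)  # 3: drops the lowest and highest value first
```

Because na.rm is already fixed inside the function, passing na.rm again through ... would give mean() two na.rm arguments and raise an error.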
Step 2: Test the Function
Let’s test our function with some sample data containing missing values.
# Create test data with missing values
test_vector <- c(1, 2, NA, 4, 5, NA, 7)
# Use our custom function
result <- clean_mean(test_vector)
print(result)
The function calculates the mean of the five non-missing values, (1 + 2 + 4 + 5 + 7) / 5 = 3.8, and ignores the NA values.
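The original problem involved several vectors, so a natural next step is to apply clean_mean() to each one. One way to do that in base R is sapply(). This sketch assumes the vectors are stored in a named list; the names and values are made up for illustration.

```r
clean_mean <- function(x) {
  mean(x, na.rm = TRUE)
}

# Hypothetical sample vectors, each with its own missing values
measurements <- list(
  site_a = c(1, 2, NA, 4, 5, NA, 7),
  site_b = c(10, NA, 20),
  site_c = c(NA, 3, 3, 3)
)

# Apply the same function to every vector; returns a named numeric vector
site_means <- sapply(measurements, clean_mean)
site_means  # site_a = 3.8, site_b = 15, site_c = 3
```

With the tidyverse loaded, purrr::map_dbl(measurements, clean_mean) gives the same result and guarantees that the output is a double vector.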
Step 3: Add Default Parameters
We can make our function more flexible by adding default parameters.
# Enhanced function with a default argument value
clean_mean_v2 <- function(x, remove_na = TRUE) {
  mean(x, na.rm = remove_na)
}
# Test with different options
clean_mean_v2(test_vector)                     # uses the default: 3.8
clean_mean_v2(test_vector, remove_na = FALSE)  # keeps NAs, so the result is NA
Now users can choose whether to remove NA values, with TRUE as the default. When remove_na = FALSE, mean() propagates the missing values and returns NA.
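Defaults become more useful as a function gains options. In this sketch, callers can rely on both defaults, override one of them by name, or pass arguments positionally in their declared order. The function summarise_vector and its digits argument are hypothetical.

```r
# Hypothetical function with two default argument values
summarise_vector <- function(x, remove_na = TRUE, digits = 1) {
  round(mean(x, na.rm = remove_na), digits = digits)
}

test_vector <- c(1, 2, NA, 4, 5, NA, 7)

summarise_vector(test_vector)              # both defaults: 3.8
summarise_vector(test_vector, digits = 0)  # override only digits, by name: 4
summarise_vector(test_vector, FALSE)       # positional, remove_na = FALSE: NA
```

Naming the arguments after the first one, as in the second call, keeps calls readable and keeps them correct if the argument order changes later.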
Example 2: Practical Application
The Problem
You’re analyzing the mtcars dataset and need to repeatedly calculate summary statistics for different groups. You want to create a function that standardizes this analysis and makes it easy to apply to different variables.
Step 1: Create a Summary Function
We’ll build a function that calculates multiple statistics for any numeric variable.
# Load the dataset (mtcars ships with R's datasets package)
data(mtcars)
# Create comprehensive summary function
car_summary <- function(data, variable) {
  data |>
    summarise(
      count = n(),
      mean_val = mean({{ variable }}, na.rm = TRUE),
      median_val = median({{ variable }}, na.rm = TRUE),
      sd_val = sd({{ variable }}, na.rm = TRUE)
    )
}
This function uses the curly-curly (embrace) operator {{ }} so that the bare column name the caller passes, such as mpg, is forwarded to summarise() and evaluated as a column of the data rather than as an ordinary R object.
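The curly-curly {{ }} operator expects a bare column name. If the column name arrives as a character string instead, for example from a loop over names, a common alternative is the .data pronoun. The sketch below mirrors car_summary() under that assumption; the name car_summary_chr is hypothetical.

```r
library(dplyr)

# Hypothetical variant of car_summary() that takes the column name as a string
car_summary_chr <- function(data, col_name) {
  data |>
    summarise(
      count = n(),
      mean_val = mean(.data[[col_name]], na.rm = TRUE),
      median_val = median(.data[[col_name]], na.rm = TRUE),
      sd_val = sd(.data[[col_name]], na.rm = TRUE)
    )
}

# The column name is now a string, so it can come from a character vector
car_summary_chr(mtcars, "mpg")
lapply(c("mpg", "hp"), function(col) car_summary_chr(mtcars, col))
```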
Step 2: Apply to Single Variable
Let’s test our function by analyzing miles per gallon across the entire dataset.
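# Get summary statistics for mpg
mpg_stats <- mtcars |>
  car_summary(mpg)
print(mpg_stats)
The function returns a one-row data frame with the count, mean, median, and standard deviation of mpg. Because mtcars is a plain data frame, summarise() returns a data frame here rather than a tibble.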
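A quick way to gain confidence in a new function is to compare its output with the equivalent base R calls. The minimal check below repeats the car_summary() definition so that the snippet runs on its own.

```r
library(dplyr)

car_summary <- function(data, variable) {
  data |>
    summarise(
      count = n(),
      mean_val = mean({{ variable }}, na.rm = TRUE),
      median_val = median({{ variable }}, na.rm = TRUE),
      sd_val = sd({{ variable }}, na.rm = TRUE)
    )
}

mpg_stats <- car_summary(mtcars, mpg)

# Each statistic should match the base R computation on the mpg column
stopifnot(
  mpg_stats$count == nrow(mtcars),
  all.equal(mpg_stats$mean_val, mean(mtcars$mpg)),
  all.equal(mpg_stats$median_val, median(mtcars$mpg)),
  all.equal(mpg_stats$sd_val, sd(mtcars$mpg))
)
```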
Step 3: Group Analysis
Now we’ll extend the function’s usage by combining it with grouping operations.
# Analyze mpg by number of cylinders
cyl_comparison <- mtcars |>
  group_by(cyl) |>
  car_summary(mpg)
print(cyl_comparison)
Because summarise() respects grouping, the same function now returns one row per group: separate statistics for 4-, 6-, and 8-cylinder cars, returned as a tibble.
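Because car_summary() simply passes its input to summarise(), it also works with more than one grouping variable. With two groups, summarise() drops only the last grouping level, so this sketch adds ungroup() to return an ungrouped tibble. Whether you want that depends on how you use the result.

```r
library(dplyr)

car_summary <- function(data, variable) {
  data |>
    summarise(
      count = n(),
      mean_val = mean({{ variable }}, na.rm = TRUE),
      median_val = median({{ variable }}, na.rm = TRUE),
      sd_val = sd({{ variable }}, na.rm = TRUE)
    )
}

# One row per combination of cylinder count and transmission type
cyl_am_comparison <- mtcars |>
  group_by(cyl, am) |>
  car_summary(mpg) |>
  ungroup()

cyl_am_comparison
```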
Step 4: Create Multiple Variable Function
Let’s create a more advanced function that can handle multiple variables at once.
# Function for multiple variable analysis
multi_var_summary <- function(data, ...) {
  data |>
    summarise(
      across(c(...), list(
        mean = ~mean(.x, na.rm = TRUE),
        sd = ~sd(.x, na.rm = TRUE)
      ), .names = "{.col}_{.fn}")
    )
}
This function passes the columns collected by the ... argument to across(), which applies both summary statistics to every listed column. The .names template "{.col}_{.fn}" names each result after its column and statistic, such as mpg_mean and mpg_sd.
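An alternative design takes a single cols argument and embraces it inside across(). Callers then pass one selection, such as c(mpg, hp, wt), and tidyselect helpers like starts_with() also work. The name multi_var_summary2 is hypothetical.

```r
library(dplyr)

# Hypothetical variant: one embraced selection argument instead of ...
multi_var_summary2 <- function(data, cols) {
  data |>
    summarise(
      across({{ cols }}, list(
        mean = ~mean(.x, na.rm = TRUE),
        sd = ~sd(.x, na.rm = TRUE)
      ), .names = "{.col}_{.fn}")
    )
}

multi_var_summary2(mtcars, c(mpg, hp, wt))
multi_var_summary2(mtcars, starts_with("d"))  # disp and drat
```

The difference is mostly ergonomic. The ... version lets callers list columns directly, while a named cols argument makes it explicit which argument holds the selection.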
Step 5: Apply Multi-Variable Function
Finally, let’s use our advanced function to analyze multiple car characteristics.
# Analyze multiple variables by transmission type
transmission_analysis <- mtcars |>
  group_by(am) |>
  multi_var_summary(mpg, hp, wt)
print(transmission_analysis)
The result has one row per transmission type (am = 0 for automatic, am = 1 for manual) and the columns mpg_mean, mpg_sd, hp_mean, hp_sd, wt_mean, and wt_sd, covering miles per gallon, gross horsepower, and weight (in 1000 lbs).
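Since am is coded as 0 (automatic) and 1 (manual), the grouped output is easier to read with descriptive labels. One way to get them is to recode before grouping. This sketch assumes you are happy to add a new column; the name transmission is our choice.

```r
library(dplyr)

multi_var_summary <- function(data, ...) {
  data |>
    summarise(
      across(c(...), list(
        mean = ~mean(.x, na.rm = TRUE),
        sd = ~sd(.x, na.rm = TRUE)
      ), .names = "{.col}_{.fn}")
    )
}

# Replace the 0/1 code with readable labels, then group on the labels
transmission_labeled <- mtcars |>
  mutate(transmission = if_else(am == 1, "manual", "automatic")) |>
  group_by(transmission) |>
  multi_var_summary(mpg, hp, wt)

transmission_labeled
```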
Summary
- Functions eliminate code repetition and make your analysis more maintainable and readable
- Use the function() keyword followed by arguments in parentheses and code in curly braces
- Default parameters make functions more flexible while maintaining ease of use
- The curly-curly {{ }} notation allows functions to work properly with dplyr and column names
- Functions can be combined with grouping operations and other tidyverse functions for powerful data analysis workflows