How to Use Functions in R

base-r
function
Master functions in R programming with clear examples. A complete tutorial covering syntax, use cases, and best practices.
Published: February 22, 2026

Introduction

Functions in R are reusable blocks of code that perform specific tasks and return results. They help you avoid repeating code, make your programs more organized, and allow you to easily share functionality across different parts of your analysis.

Getting Started

# Load the tidyverse; Example 2 uses dplyr's summarise(), across(), and group_by()
library(tidyverse)

Example 1: Basic Usage

The Problem

You need to calculate the mean of several different numeric vectors, but you want to handle missing values consistently. Instead of writing the same code multiple times, you can create a custom function.

Step 1: Create a Simple Function

We’ll start by creating a function that calculates the mean while removing NA values.

# Create a function to calculate mean without NAs
clean_mean <- function(x) {
  mean(x, na.rm = TRUE)
}

This function takes one argument x and returns the mean after removing any missing values.
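As a minimal sketch of how the return value works: R returns the value of the last expression evaluated in the function body, so an explicit return() is optional. The name clean_mean_explicit below is hypothetical and exists only for comparison.

```r
# clean_mean() as defined above: the last expression is the return value
clean_mean <- function(x) {
  mean(x, na.rm = TRUE)
}

# Hypothetical variant with an explicit return(); it behaves identically
clean_mean_explicit <- function(x) {
  result <- mean(x, na.rm = TRUE)
  return(result)
}

values <- c(10, NA, 30)
clean_mean(values)           # 20
clean_mean_explicit(values)  # 20
```

Both forms are common; the implicit form is shorter, while return() can be clearer when a function exits early.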

Step 2: Test the Function

Let’s test our function with some sample data containing missing values.

# Create test data with missing values
test_vector <- c(1, 2, NA, 4, 5, NA, 7)

# Use our custom function
result <- clean_mean(test_vector)
print(result)

The function successfully calculates the mean (3.8) while ignoring the NA values.
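To see where 3.8 comes from, the same calculation can be sketched by hand: drop the missing values, then divide the sum by the number of values that remain. The variable names non_missing and manual_mean are illustrative only.

```r
test_vector <- c(1, 2, NA, 4, 5, NA, 7)

# Keep only the non-missing values
non_missing <- test_vector[!is.na(test_vector)]

sum(non_missing)      # 19
length(non_missing)   # 5

# Mean = sum of remaining values / number of remaining values
manual_mean <- sum(non_missing) / length(non_missing)
manual_mean           # 3.8
```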

Step 3: Add Default Parameters

We can make our function more flexible by adding default parameters.

# Enhanced function with default parameter
clean_mean_v2 <- function(x, remove_na = TRUE) {
  mean(x, na.rm = remove_na)
}

# Test with different options
clean_mean_v2(test_vector)           # Uses default
clean_mean_v2(test_vector, FALSE)    # Keeps NAs, so the result is NA

Now callers can choose whether to remove NA values. Because remove_na defaults to TRUE, calling the function with only x behaves exactly like clean_mean().
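A short sketch of the two behaviors, passing the argument by name rather than by position. Naming the argument is a style choice here, not a requirement; the variables with_na_removed and with_na_kept are illustrative.

```r
clean_mean_v2 <- function(x, remove_na = TRUE) {
  mean(x, na.rm = remove_na)
}

test_vector <- c(1, 2, NA, 4, 5, NA, 7)

# Passing the argument by name makes the call self-documenting
with_na_removed <- clean_mean_v2(test_vector, remove_na = TRUE)
with_na_kept    <- clean_mean_v2(test_vector, remove_na = FALSE)

with_na_removed       # 3.8
is.na(with_na_kept)   # TRUE: any NA in the input makes the mean NA
```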

Example 2: Practical Application

The Problem

You’re analyzing the mtcars dataset and need to repeatedly calculate summary statistics for different groups. You want to create a function that standardizes this analysis and makes it easy to apply to different variables.

Step 1: Create a Summary Function

We’ll build a function that calculates multiple statistics for any numeric variable.

# Load the dataset
data(mtcars)

# Create comprehensive summary function
car_summary <- function(data, variable) {
  data |>
    summarise(
      count = n(),
      mean_val = mean({{variable}}, na.rm = TRUE),
      median_val = median({{variable}}, na.rm = TRUE),
      sd_val = sd({{variable}}, na.rm = TRUE)
    )
}

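This function uses the embrace operator {{ }} (often called curly-curly) so that an unquoted column name passed as variable, such as mpg, is forwarded to summarise() and evaluated as a column of data rather than looked up as an ordinary R object.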
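Curly-curly fits the case where the caller types a bare column name. If the column name arrives as a character string instead (for example, from a configuration value), one possible alternative is dplyr's .data pronoun. The function car_summary_chr below is a hypothetical variant written for illustration, not part of the original example.

```r
library(dplyr)

# Hypothetical variant: `variable` is a character string such as "mpg"
car_summary_chr <- function(data, variable) {
  data |>
    summarise(
      count = n(),
      # .data[[variable]] looks up the column whose name is stored in `variable`
      mean_val = mean(.data[[variable]], na.rm = TRUE)
    )
}

chr_result <- car_summary_chr(mtcars, "mpg")
chr_result$count   # 32
```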

Step 2: Apply to Single Variable

Let’s test our function by analyzing miles per gallon across the entire dataset.

# Get summary statistics for mpg
mpg_stats <- mtcars |>
  car_summary(mpg)

print(mpg_stats)

Because mtcars is a plain data frame, the function returns a one-row data frame with the count, mean, median, and standard deviation of the mpg variable.
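Since the column is an argument, the same function can be reused for any other numeric column without changes. A brief sketch using hp (horsepower); the name hp_stats is illustrative.

```r
library(dplyr)

car_summary <- function(data, variable) {
  data |>
    summarise(
      count = n(),
      mean_val = mean({{ variable }}, na.rm = TRUE),
      median_val = median({{ variable }}, na.rm = TRUE),
      sd_val = sd({{ variable }}, na.rm = TRUE)
    )
}

# Same function, different column
hp_stats <- car_summary(mtcars, hp)

names(hp_stats)   # "count" "mean_val" "median_val" "sd_val"
hp_stats$count    # 32
```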

Step 3: Group Analysis

Now we’ll extend the function’s usage by combining it with grouping operations.

# Analyze mpg by number of cylinders
cyl_comparison <- mtcars |>
  group_by(cyl) |>
  car_summary(mpg)

print(cyl_comparison)

This shows that the function works with grouped data: summarise() respects the grouping set by group_by(), so the result has one row each for 4-, 6-, and 8-cylinder cars.
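One way to sanity-check the grouped result is to recompute the per-group counts and means in base R. This sketch uses tapply() and table() and does not depend on dplyr; the variable names are illustrative.

```r
# Base R cross-check of the grouped statistics
mean_by_cyl  <- tapply(mtcars$mpg, mtcars$cyl, mean)
count_by_cyl <- table(mtcars$cyl)

# Group sizes for 4, 6, and 8 cylinders: 11, 7, 14
count_by_cyl

# Mean mpg per group, rounded: 26.66, 19.74, 15.10
round(mean_by_cyl, 2)
```

The values should match the count and mean_val columns in cyl_comparison.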

Step 4: Create Multiple Variable Function

Let’s create a more advanced function that can handle multiple variables at once.

# Function for multiple variable analysis
multi_var_summary <- function(data, ...) {
  data |>
    summarise(
      across(c(...), list(
        mean = ~mean(.x, na.rm = TRUE),
        sd = ~sd(.x, na.rm = TRUE)
      ), .names = "{.col}_{.fn}")
    )
}

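The ... argument collects any number of column names, and across() applies every function in the list to each selected column. The .names pattern controls the output column names, producing names such as mpg_mean and mpg_sd.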
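Outside of dplyr, the ... argument behaves the same way: it accepts any number of extra arguments, which can be captured with list(...) or forwarded to another function. A minimal base R sketch; count_args and paste_upper are hypothetical helpers made up for this illustration.

```r
# Hypothetical helper: captures every extra argument passed via `...`
count_args <- function(...) {
  args <- list(...)   # collect the extra arguments into a list
  length(args)
}

count_args(1, "a", TRUE)   # 3
count_args()               # 0

# Hypothetical helper: forwards `...` unchanged to paste()
paste_upper <- function(...) {
  toupper(paste(...))
}

paste_upper("miles", "per", "gallon")   # "MILES PER GALLON"
```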

Step 5: Apply Multi-Variable Function

Finally, let’s use our advanced function to analyze multiple car characteristics.

# Analyze multiple variables by transmission type
transmission_analysis <- mtcars |>
  group_by(am) |>
  multi_var_summary(mpg, hp, wt)

print(transmission_analysis)

The function calculates the mean and standard deviation of mpg, horsepower, and weight for each transmission type. In mtcars, am is 0 for automatic and 1 for manual, so the result has one row per transmission type.
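Because the ... values are passed into across(), the function should also accept tidyselect helpers in place of explicit column names. The sketch below assumes that usage and selects every column whose name starts with "d"; the name d_cols is illustrative.

```r
library(dplyr)

multi_var_summary <- function(data, ...) {
  data |>
    summarise(
      across(c(...), list(
        mean = ~mean(.x, na.rm = TRUE),
        sd = ~sd(.x, na.rm = TRUE)
      ), .names = "{.col}_{.fn}")
    )
}

# starts_with("d") matches the disp and drat columns of mtcars
d_cols <- multi_var_summary(mtcars, starts_with("d"))

names(d_cols)   # "disp_mean" "disp_sd" "drat_mean" "drat_sd"
```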

Summary

  • Functions eliminate code repetition and make your analysis more maintainable and readable
  • Define a function with the function keyword, followed by arguments in parentheses and the body in curly braces
  • Default parameters make functions more flexible while maintaining ease of use
  • The curly-curly {{ }} notation lets a function pass unquoted column names through to dplyr verbs
  • Functions can be combined with grouping operations and other tidyverse functions for powerful data analysis workflows