How to use function in R

base-r

function

Master functions in R programming with clear examples. A complete tutorial covering syntax, use cases, and best practices.

Published

February 22, 2026

Introduction

Functions in R are reusable blocks of code that perform specific tasks and return results. They help you avoid repeating code, make your programs more organized, and allow you to easily share functionality across different parts of your analysis.

Getting Started

library(tidyverse)

Example 1: Basic Usage

The Problem

You need to calculate the mean of several different numeric vectors, but you want to handle missing values consistently. Instead of writing the same code multiple times, you can create a custom function.

Step 1: Create a Simple Function

We’ll start by creating a function that calculates the mean while removing NA values.

# Create a function to calculate mean without NAs
clean_mean <- function(x) {
  mean(x, na.rm = TRUE)
}

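This function takes one argument, x, and returns the mean after removing any missing values. No return() call is needed because R returns the value of the last expression evaluated in the function body.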
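The same function can also be written with an explicit return() call. Here is a minimal sketch comparing the two forms; the names clean_mean_implicit and clean_mean_explicit are illustrative and not part of the original tutorial.

```r
# Implicit return: the value of the last expression is the function's result
clean_mean_implicit <- function(x) {
  mean(x, na.rm = TRUE)
}

# Explicit return: same behavior, spelled out with return()
clean_mean_explicit <- function(x) {
  result <- mean(x, na.rm = TRUE)
  return(result)
}

values <- c(10, NA, 20)

clean_mean_implicit(values)
clean_mean_explicit(values)

# Both forms give exactly the same value
identical(clean_mean_implicit(values), clean_mean_explicit(values))
```

Either style works. The implicit form is common for short functions, while return() can help when a function needs to exit early.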

Step 2: Test the Function

Let’s test our function with some sample data containing missing values.

# Create test data with missing values
test_vector <- c(1, 2, NA, 4, 5, NA, 7)

# Use our custom function
result <- clean_mean(test_vector)
print(result)

The function returns 3.8, the mean of the five non-missing values (19 / 5), and ignores the two NA values.
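The 3.8 figure can be checked by hand. This small sketch drops the missing values manually and repeats the arithmetic; the variable name non_missing is an assumption made for illustration.

```r
test_vector <- c(1, 2, NA, 4, 5, NA, 7)

# Drop the missing values by hand
non_missing <- test_vector[!is.na(test_vector)]

# Five values remain and they sum to 19
length(non_missing)
sum(non_missing)

# The mean is the sum divided by the count
manual_mean <- sum(non_missing) / length(non_missing)
manual_mean

# This matches mean() with na.rm = TRUE
all.equal(mean(test_vector, na.rm = TRUE), manual_mean)
```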

Step 3: Add Default Parameters

We can make our function more flexible by adding default parameters.

# Enhanced function with default parameter
clean_mean_v2 <- function(x, remove_na = TRUE) {
  mean(x, na.rm = remove_na)
}

# Test with different options
clean_mean_v2(test_vector)           # Uses the default (remove_na = TRUE); returns 3.8
clean_mean_v2(test_vector, FALSE)    # Keeps NAs, so the result is NA

Callers can now choose whether to remove NA values, with TRUE as the default. When remove_na is FALSE, the result is NA because test_vector contains missing values.
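Arguments can also be matched by name, which makes calls easier to read and lets you supply them in any order. This is a short sketch that repeats the clean_mean_v2 definition so it runs on its own; the result variable names are illustrative.

```r
clean_mean_v2 <- function(x, remove_na = TRUE) {
  mean(x, na.rm = remove_na)
}

test_vector <- c(1, 2, NA, 4, 5, NA, 7)

# Default applies because remove_na is not supplied
default_result <- clean_mean_v2(test_vector)

# Argument matched by name instead of position
named_result <- clean_mean_v2(test_vector, remove_na = FALSE)

# Named arguments can appear in any order
swapped_result <- clean_mean_v2(remove_na = TRUE, x = test_vector)

default_result
is.na(named_result)
identical(default_result, swapped_result)
```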

Example 2: Practical Application

The Problem

You’re analyzing the mtcars dataset and need to repeatedly calculate summary statistics for different groups. You want to create a function that standardizes this analysis and makes it easy to apply to different variables.

Step 1: Create a Summary Function

We’ll build a function that calculates multiple statistics for any numeric variable.

# mtcars ships with R's datasets package; data() just makes the dependency explicit
data(mtcars)

# Create comprehensive summary function
car_summary <- function(data, variable) {
  data |>
    summarise(
      count = n(),
      mean_val = mean({{variable}}, na.rm = TRUE),
      median_val = median({{variable}}, na.rm = TRUE),
      sd_val = sd({{variable}}, na.rm = TRUE)
    )
}

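This function uses the embrace operator {{ }} (often called curly-curly) so that an unquoted column name passed as variable is forwarded to dplyr and evaluated as a column of data rather than as an ordinary R object.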
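The embrace operator applies when the caller passes a bare column name. If the column name arrives as a character string instead, dplyr's .data pronoun is one alternative. The sketch below is a hypothetical variant (car_summary_chr is not part of the original tutorial) and loads only dplyr, which is included in the tidyverse.

```r
library(dplyr)

# Hypothetical variant: `variable` is a string such as "mpg"
car_summary_chr <- function(data, variable) {
  data |>
    summarise(
      count = n(),
      # .data[[variable]] looks the column up by its name as a string
      mean_val = mean(.data[[variable]], na.rm = TRUE)
    )
}

chr_result <- car_summary_chr(mtcars, "mpg")
chr_result
```

This form is convenient when column names come from user input or a loop over a character vector.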

Step 2: Apply to Single Variable

Let’s test our function by analyzing miles per gallon across the entire dataset.

# Get summary statistics for mpg
mpg_stats <- mtcars |>
  car_summary(mpg)

print(mpg_stats)

Because mtcars is a plain data frame, the function returns a one-row data frame with the count, mean, median, and standard deviation of the mpg variable.

Step 3: Group Analysis

Now we’ll extend the function’s usage by combining it with grouping operations.

# Analyze mpg by number of cylinders
cyl_comparison <- mtcars |>
  group_by(cyl) |>
  car_summary(mpg)

print(cyl_comparison)

Because summarise() respects existing groups, the same function returns one row of statistics for each of the 4-, 6-, and 8-cylinder groups.
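Grouping can also be moved inside the function by embracing a second argument. The helper below is a hypothetical sketch (grouped_summary and its group argument are assumptions, not from the original) and loads only dplyr.

```r
library(dplyr)

# Hypothetical helper: the grouping column and the measured column are both arguments
grouped_summary <- function(data, group, variable) {
  data |>
    group_by({{ group }}) |>
    summarise(
      count = n(),
      mean_val = mean({{ variable }}, na.rm = TRUE),
      .groups = "drop"  # return an ungrouped result
    )
}

by_cyl <- grouped_summary(mtcars, cyl, mpg)
by_cyl
```

Keeping group_by() outside the function, as in the original, leaves the caller free to choose any grouping. Moving it inside trades that flexibility for a shorter call.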

Step 4: Create Multiple Variable Function

Let’s create a more advanced function that can handle multiple variables at once.

# Function for multiple variable analysis
multi_var_summary <- function(data, ...) {
  data |>
    summarise(
      across(c(...), list(
        mean = ~mean(.x, na.rm = TRUE),
        sd = ~sd(.x, na.rm = TRUE)
      ), .names = "{.col}_{.fn}")
    )
}

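This function uses across() to apply the same summary statistics to several columns at once. The ... argument collects whichever columns the caller passes in, and .names controls how the output columns are labeled (for example, mpg_mean and mpg_sd).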
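Because the columns captured by ... are handed to across(), the caller should also be able to pass tidyselect helpers instead of listing every column. The sketch below repeats the multi_var_summary definition so it runs on its own and loads only dplyr. The choice of starts_with("d") is just an illustration; in mtcars it matches disp and drat.

```r
library(dplyr)

multi_var_summary <- function(data, ...) {
  data |>
    summarise(
      across(c(...), list(
        mean = ~mean(.x, na.rm = TRUE),
        sd = ~sd(.x, na.rm = TRUE)
      ), .names = "{.col}_{.fn}")
    )
}

# Name the columns directly
named_cols <- multi_var_summary(mtcars, mpg, hp)
names(named_cols)

# Or select columns with a tidyselect helper
d_cols <- multi_var_summary(mtcars, starts_with("d"))
names(d_cols)
```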

Step 5: Apply Multi-Variable Function

Finally, let’s use our advanced function to analyze multiple car characteristics.

# Analyze multiple variables by transmission type
transmission_analysis <- mtcars |>
  group_by(am) |>
  multi_var_summary(mpg, hp, wt)

print(transmission_analysis)

The result has one row per transmission type (in mtcars, am is 0 for automatic and 1 for manual), with the mean and standard deviation of mpg, horsepower, and weight in columns such as mpg_mean and mpg_sd.

Summary

- Functions eliminate code repetition and make your analysis more maintainable and readable
- Define a function with the function keyword, followed by arguments in parentheses and the body in curly braces
- Default parameters make functions more flexible while maintaining ease of use
- The embrace operator {{ }} (curly-curly) lets a function pass unquoted column names through to dplyr
- Functions can be combined with grouping operations and other tidyverse functions for powerful data analysis workflows
