How to Calculate Standard Deviation and Variance in R

Learn how to calculate standard deviation and variance in R. A step-by-step statistical tutorial with examples.
Published: February 20, 2026

Introduction

Standard deviation and variance are fundamental statistical measures that quantify the spread or dispersion of data around the mean. Standard deviation is the square root of variance and is expressed in the same units as your data, making it more interpretable. These measures are essential for understanding data distribution, comparing variability between groups, and identifying outliers.

Getting Started

library(tidyverse)
library(palmerpenguins)

Example 1: Basic Usage

The Problem

We need to calculate the standard deviation and variance of penguin body mass to understand how much individual penguins vary from the average weight. This helps us assess whether penguin weights are tightly clustered or widely spread.

Step 1: Examine the data

Let’s first look at the penguin body mass data to understand what we’re working with.

# Load and examine penguin data
data(penguins)
head(penguins$body_mass_g, 10)
summary(penguins$body_mass_g)

This shows us the first 10 body mass values and basic summary statistics including the mean.

Step 2: Calculate variance

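Variance measures the average squared deviation from the mean. R’s var() returns the sample variance, which divides the sum of squared deviations by n - 1 rather than n.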

# Calculate variance using var() function
body_mass_variance <- var(penguins$body_mass_g, na.rm = TRUE)
print(paste("Variance:", round(body_mass_variance, 2)))

The variance is about 643,131 grams squared, which is difficult to interpret because it’s in squared units.
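The calculation behind var() can be reproduced by hand. The sketch below uses a small made-up vector (not the penguin data) and compares the n - 1 divisor that var() uses with the n divisor of the population variance; all variable names are illustrative.

```r
# Toy data (hypothetical values, not from the penguins dataset)
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

n <- length(x)
squared_deviations <- (x - mean(x))^2

# Sample variance: divide by n - 1 (this is what var() computes)
sample_variance <- sum(squared_deviations) / (n - 1)

# Population variance: divide by n (not what var() computes)
population_variance <- sum(squared_deviations) / n

sample_variance      # 32 / 7, about 4.57
var(x)               # same value as sample_variance
population_variance  # 32 / 8 = 4
```

With several hundred observations, as in the penguin data, the two divisors give nearly the same result; the difference matters mainly for small samples.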

Step 3: Calculate standard deviation

Standard deviation is more interpretable since it’s in the same units as our original data.

# Calculate standard deviation using sd() function
body_mass_sd <- sd(penguins$body_mass_g, na.rm = TRUE)
print(paste("Standard deviation:", round(body_mass_sd, 2), "grams"))

The standard deviation is approximately 802 grams, meaning a typical penguin’s body mass differs from the average by roughly 800 grams.
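The na.rm = TRUE argument in the calls above matters because the body mass column contains missing values. A minimal sketch of the difference, using a made-up vector with one NA rather than the real data:

```r
# Made-up masses in grams, with one missing value (hypothetical data)
masses <- c(3750, 3800, NA, 3450, 3650)

sd_with_na    <- sd(masses)                # NA: a single missing value propagates
sd_without_na <- sd(masses, na.rm = TRUE)  # computed from the four observed values
n_missing     <- sum(is.na(masses))        # 1: how many values were dropped

sd_with_na
sd_without_na
n_missing
```

Checking the count of missing values alongside the statistic makes it clear how many observations the result is actually based on.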

Step 4: Verify the relationship

Let’s confirm that standard deviation is the square root of variance.

# Verify relationship between variance and standard deviation
sqrt(body_mass_variance)
body_mass_sd

Both values match, confirming that standard deviation equals the square root of variance.
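Rather than comparing printed values by eye, the check can be made programmatically. This sketch uses a made-up vector and all.equal(), which compares numbers within a small tolerance and is generally safer than == for floating-point results:

```r
# Made-up values (not from the penguins dataset)
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

# all.equal() returns TRUE when the values agree within tolerance;
# isTRUE() turns any other result into FALSE
values_match <- isTRUE(all.equal(sqrt(var(x)), sd(x)))
values_match  # TRUE
```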

Example 2: Practical Application

The Problem

A marine biologist wants to compare the variability in flipper length between different penguin species to determine which species shows the most consistent flipper size. This analysis will help understand morphological diversity within and between species.

Step 1: Group data by species

We’ll organize our data by species to compare variability across groups.

# Group penguins by species and examine flipper lengths
penguin_summary <- penguins |>
  filter(!is.na(flipper_length_mm)) |>
  group_by(species)

This creates a grouped dataset while removing any missing flipper length values.

Step 2: Calculate statistics by species

Now we’ll compute variance and standard deviation for each species.

# Calculate variance and SD for each species
species_stats <- penguin_summary |>
  summarise(
    mean_flipper = mean(flipper_length_mm),
    variance = var(flipper_length_mm),
    std_dev = sd(flipper_length_mm),
    .groups = "drop"
  )

This gives us the mean, variance, and standard deviation of flipper length for each species, one row per species.
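For readers who want to see the same idea without the tidyverse, a per-group calculation can also be sketched in base R with tapply(). This is an alternative technique, not the approach used in this tutorial, and the data frame below is made up for illustration:

```r
# Hypothetical flipper lengths (mm) for two made-up groups
flippers <- data.frame(
  group = c("A", "A", "A", "B", "B", "B"),
  length_mm = c(180, 190, 200, 210, 212, 214)
)

# tapply() splits length_mm by group and applies the function to each piece
group_var <- tapply(flippers$length_mm, flippers$group, var)
group_sd  <- tapply(flippers$length_mm, flippers$group, sd)

group_var  # A = 100, B = 4
group_sd   # A = 10,  B = 2
```

The result is a named vector rather than a data frame, which is why group_by() and summarise() are usually more convenient when several statistics are needed side by side.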

Step 3: Identify the most variable species

Let’s examine which species has the highest and lowest variability.

# Display results sorted by standard deviation
species_stats |>
  arrange(desc(std_dev)) |>
  mutate(across(where(is.numeric), ~round(.x, 2)))

Chinstrap penguins show the highest flipper length variability (standard deviation of about 7.1 mm), while Gentoo penguins are the most consistent, only marginally ahead of Adelie penguins (both about 6.5 mm).

Step 4: Calculate coefficient of variation

For better comparison across species with different means, we’ll calculate the coefficient of variation.

# Calculate coefficient of variation (CV)
species_stats |>
  mutate(
    cv_percent = round((std_dev / mean_flipper) * 100, 1)
  ) |>
  select(species, mean_flipper, std_dev, cv_percent)

The coefficient of variation shows relative variability as a percentage, making it easier to compare across species with different average flipper lengths.
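To see why relative variability can differ from absolute variability, consider two made-up samples with identical standard deviations but very different means. The cv_percent() helper below is hypothetical (it is not part of base R or dplyr) and simply wraps the same formula used above:

```r
# Hypothetical helper: coefficient of variation as a percentage
cv_percent <- function(x, na.rm = TRUE) {
  sd(x, na.rm = na.rm) / mean(x, na.rm = na.rm) * 100
}

# Two made-up samples with the same spread but different means
small <- c(90, 100, 110)
large <- c(990, 1000, 1010)

sd(small)          # 10
sd(large)          # 10
cv_percent(small)  # 10
cv_percent(large)  # 1
```

The coefficient of variation is only meaningful for measurements with a true zero and a positive mean, which holds for lengths and masses.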

Summary

  • Use var() to calculate variance and sd() to calculate standard deviation in R
  • Include na.rm = TRUE when the data may contain missing values; otherwise var() and sd() return NA
  • Standard deviation is more interpretable than variance because it’s in the same units as your original data
  • Group operations with group_by() and summarise() allow efficient calculation of statistics across categories
  • Coefficient of variation (standard deviation divided by mean) enables comparison of relative variability across groups with different scales