How to sort a dataframe by multiple columns in R

how-to

sorting

Learn how to sort a dataframe by multiple columns in R with clear examples and explanations.

Published

March 26, 2026

Introduction

The dplyr package provides powerful tools for data manipulation and transformation in R. The mutate() function is one of the most essential functions for creating new variables or modifying existing ones in your dataset. You’ll use mutate() whenever you need to add calculated columns, transform values, or derive new insights from your existing data.

Getting Started

Let’s load the necessary packages and data for our examples:

library(tidyverse)
library(palmerpenguins)

We’ll use the Palmer Penguins dataset, which contains measurements of different penguin species. This gives us realistic data to work with for our examples.

Basic Column Creation

The simplest use of mutate() is creating a new column with a calculated value. Let’s create a new column that converts body mass from grams to kilograms:

penguins |>
  mutate(body_mass_kg = body_mass_g / 1000) |>
  select(species, body_mass_g, body_mass_kg)

This creates a new column called body_mass_kg by dividing the existing body_mass_g column by 1000. The select() function helps us view just the relevant columns to see our transformation.
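
One detail worth seeing in isolation: mutate() returns a new data frame and leaves its input untouched, and assigning to an existing column name replaces that column rather than adding one. The sketch below uses a small made-up tibble (toy, with hypothetical values) in place of penguins so it runs without the dataset:

```r
library(dplyr)

# Hypothetical toy data standing in for the penguins columns used above
toy <- tibble(
  species = c("Adelie", "Gentoo", "Chinstrap"),
  body_mass_g = c(3750, 5000, 3800)
)

# Adding a column: the result must be assigned to be kept
toy_kg <- toy |>
  mutate(body_mass_kg = body_mass_g / 1000)

# Reusing an existing name overwrites that column instead of adding one
toy_lower <- toy |>
  mutate(species = tolower(species))

names(toy)      # the original still has only its two columns
names(toy_kg)   # the new data frame has three
```

Assigning the result to a new name, as here, keeps the original data available for comparison.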

Creating Multiple Columns

You can create several new columns in a single mutate() call. Here we’ll add both a mass conversion and a bill ratio calculation:

penguins |>
  mutate(
    body_mass_kg = body_mass_g / 1000,
    bill_ratio = bill_length_mm / bill_depth_mm
  ) |>
  select(species, body_mass_kg, bill_ratio)

Each new column is separated by a comma, and an expression can reference a column created earlier in the same mutate() call. A single call is also more concise than chaining several mutate() calls.
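
The example above does not actually reuse a freshly created column, so here is a minimal sketch that does. The tibble toy and the column bill_mm_per_kg are made up for illustration and are not part of the penguins dataset:

```r
library(dplyr)

# Hypothetical toy data with the same column names as penguins
toy <- tibble(
  body_mass_g = c(3750, 5000, 3800),
  bill_length_mm = c(39.1, 46.1, 49.0)
)

toy_derived <- toy |>
  mutate(
    body_mass_kg = body_mass_g / 1000,
    # body_mass_kg already exists by the time this expression is evaluated
    bill_mm_per_kg = bill_length_mm / body_mass_kg
  )

toy_derived
```

Expressions are evaluated in the order they are written, so swapping the two lines inside mutate() would fail because body_mass_kg would not exist yet.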

Using Conditional Logic

The case_when() function works perfectly with mutate() to create categorical variables based on conditions. Let’s classify penguins by their body mass:

penguins |>
  mutate(
    size_category = case_when(
      body_mass_g < 3500 ~ "Small",
      body_mass_g < 4500 ~ "Medium",
      body_mass_g >= 4500 ~ "Large"
    )
  ) |>
  select(species, body_mass_g, size_category)

The case_when() function evaluates conditions in order and assigns the result of the first one that matches, which is why the "Medium" condition needs no lower bound. Rows that match no condition, including penguins with a missing body_mass_g, receive NA.
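
If an explicit label is preferable to NA for unmatched rows, one option is to test for missing values first and end with a catch-all condition. This is a sketch on a made-up tibble (toy); the "Unknown" label is an illustrative choice, not something the dataset defines:

```r
library(dplyr)

# Hypothetical toy data, including one missing body mass
toy <- tibble(body_mass_g = c(3200, 4000, 5100, NA))

toy_sized <- toy |>
  mutate(
    # Same conditions as above: the NA row matches nothing and stays NA
    size_no_fallback = case_when(
      body_mass_g < 3500 ~ "Small",
      body_mass_g < 4500 ~ "Medium",
      body_mass_g >= 4500 ~ "Large"
    ),
    # Missing values are handled first; TRUE catches everything left over
    size_category = case_when(
      is.na(body_mass_g) ~ "Unknown",
      body_mass_g < 3500 ~ "Small",
      body_mass_g < 4500 ~ "Medium",
      TRUE ~ "Large"
    )
  )

toy_sized
```

The is.na() check has to come before the TRUE fallback; otherwise the missing row would fall through to "Large".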

Working with Character Data

mutate() works great with text transformations using functions from the stringr package. Let’s clean up and modify the species names:

penguins |>
  mutate(
    species_upper = str_to_upper(species),
    species_short = str_extract(species, "^[A-Za-z]+")
  ) |>
  select(species, species_upper, species_short)

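This example converts species names to uppercase and extracts the leading run of letters, which is the first word. The species column is a factor, and the stringr functions treat it as character. Because the species names in this dataset are already single words, species_short matches the original name here; the pattern only makes a difference for multi-word labels.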
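
To see the extraction pattern do some real work, it helps to try it on longer labels. The sketch below uses a small made-up tibble (toy) whose species_label values are illustrative multi-word strings:

```r
library(dplyr)
library(stringr)

# Hypothetical multi-word labels for illustration
toy <- tibble(
  species_label = c(
    "Adelie Penguin (Pygoscelis adeliae)",
    "Gentoo penguin (Pygoscelis papua)",
    "Chinstrap penguin (Pygoscelis antarctica)"
  )
)

toy_clean <- toy |>
  mutate(
    # "^[A-Za-z]+" matches letters from the start up to the first non-letter
    species_short = str_extract(species_label, "^[A-Za-z]+"),
    species_upper = str_to_upper(species_short)
  )

toy_clean
```

The match stops at the first space, so only the common name survives.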

Handling Missing Values

When working with real data, you’ll often need to handle missing values during calculations. Let’s create a column that handles NA values gracefully:

penguins |>
  mutate(
    bill_length_clean = if_else(
      is.na(bill_length_mm), 
      mean(bill_length_mm, na.rm = TRUE), 
      bill_length_mm
    )
  ) |>
  select(bill_length_mm, bill_length_clean)

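The if_else() function replaces missing bill length values with the overall mean, so later calculations on bill_length_clean don’t return NA. Mean imputation does reduce the variable’s spread, which is one reason to write the filled values to a new column and keep the original, as done here.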
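
The same replacement can be written more compactly with coalesce() from dplyr, which returns the first non-missing value at each position. This is a sketch on a made-up tibble (toy), shown as an alternative rather than a correction:

```r
library(dplyr)

# Hypothetical toy data with one missing bill length
toy <- tibble(
  species = c("Adelie", "Adelie", "Gentoo", "Gentoo"),
  bill_length_mm = c(38, NA, 46, 48)
)

toy_filled <- toy |>
  mutate(
    # Where bill_length_mm is NA, fall back to the mean of the observed values
    bill_length_clean = coalesce(bill_length_mm, mean(bill_length_mm, na.rm = TRUE))
  )

toy_filled
```

Both versions fill with the mean of the non-missing values, which is 44 for this toy data.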

Group-wise Calculations

Combining mutate() with group_by() lets you create variables based on group-specific calculations. Here we’ll calculate how each penguin compares to their species average:

penguins |>
  group_by(species) |>
  mutate(
    species_avg_mass = mean(body_mass_g, na.rm = TRUE),
    mass_vs_species_avg = body_mass_g - species_avg_mass
  ) |>
  select(species, body_mass_g, mass_vs_species_avg) |>
  ungroup()

This creates species-specific averages and shows how each individual penguin differs from their species norm. Remember to use ungroup() after group operations to avoid unintended grouping in subsequent operations.
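
The effect of forgetting ungroup() is easiest to see side by side. In this sketch, built on a made-up tibble (toy), the same mean() expression is run once on the still-grouped result and once after ungrouping:

```r
library(dplyr)

# Hypothetical toy data: two penguins per species
toy <- tibble(
  species = c("Adelie", "Adelie", "Gentoo", "Gentoo"),
  body_mass_g = c(3500, 3900, 5000, 5400)
)

still_grouped <- toy |>
  group_by(species) |>
  mutate(mass_vs_species_avg = body_mass_g - mean(body_mass_g))

# The grouping persists, so this mean is still computed per species
per_species <- still_grouped |>
  mutate(avg_mass = mean(body_mass_g))

# After ungroup(), the same expression uses every row
overall <- still_grouped |>
  ungroup() |>
  mutate(avg_mass = mean(body_mass_g))

is_grouped_df(still_grouped)
per_species$avg_mass
overall$avg_mass
```

The grouped version yields one average per species, while the ungrouped version yields a single average across all four rows.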

Advanced Transformations

For more complex transformations, you can use functions from other packages or create custom calculations. Let’s standardize measurements using z-scores:

penguins |>
  mutate(
    bill_length_z = scale(bill_length_mm)[,1],
    body_mass_z = scale(body_mass_g)[,1]
  ) |>
  select(species, bill_length_z, body_mass_z)

The scale() function standardizes variables to have a mean of 0 and standard deviation of 1. This is particularly useful when comparing variables with different units or scales.
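
As a sanity check on what scale() computes, the z-score can also be written out by hand as the value minus the mean, divided by the standard deviation. The sketch below compares the two on a made-up tibble (toy) that includes a missing value:

```r
library(dplyr)

# Hypothetical toy data with one missing bill length
toy <- tibble(
  bill_length_mm = c(39.1, 46.1, 49.0, NA),
  body_mass_g = c(3750, 5000, 3800, 4200)
)

toy_z <- toy |>
  mutate(
    # scale() returns a one-column matrix; [, 1] turns it back into a vector
    bill_length_z = scale(bill_length_mm)[, 1],
    # The same calculation by hand, skipping NA when estimating mean and sd
    bill_length_z_manual =
      (bill_length_mm - mean(bill_length_mm, na.rm = TRUE)) /
        sd(bill_length_mm, na.rm = TRUE)
  )

toy_z
```

scale() omits missing values when it computes the centre and spread, so the two columns agree and the missing row stays NA in both.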

Summary

The mutate() function is your primary tool for creating and transforming columns in R. Whether you’re doing simple calculations, complex conditional logic, or group-specific transformations, mutate() provides a consistent and readable syntax. Key takeaways include: always handle missing values appropriately, leverage conditional functions like case_when() and if_else(), and remember that you can create multiple columns in a single mutate() call to keep your code concise.
