slice_min(): Get Rows with the Minimum Values of a Column

dplyr slice_min()
Learn how to use slice_min() to get the rows with the minimum values of a column in R, with practical examples and code snippets.
Published

November 3, 2023

Introduction

The slice_min() function from the dplyr package extracts the rows that contain the minimum values of a specified column. Unlike min(), which returns only the minimum value itself, slice_min() returns the complete rows where that value occurs, so the rest of each record stays available for exploration and analysis.

This function is particularly useful when you need to identify records with the lowest values - such as finding the cheapest products, earliest dates, smallest measurements, or poorest performance metrics. It’s especially handy when multiple rows share the same minimum value, as slice_min() will return all of them by default, giving you a complete picture of your data’s minimum cases.

Getting Started

First, let’s load the required packages for our examples:

library(tidyverse)
library(palmerpenguins)

Example 1: Basic Usage

Let’s start with a simple example using the built-in mtcars dataset to find the cars with the lowest fuel economy (mpg, miles per gallon):

# Find cars with the minimum miles per gallon
mtcars |>
  slice_min(mpg)
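
As a rough sketch of what this call does, the default slice_min(mpg) can be compared with a hand-written filter(). In mtcars, two cars share the lowest mpg (10.4), so both approaches return two rows. The object names lowest and lowest_manual are illustrative only.

```r
library(dplyr)

# Rows returned by slice_min() with the default n = 1 (ties kept)
lowest <- mtcars |>
  slice_min(mpg)

# Hand-written equivalent: keep rows whose mpg equals the column minimum
lowest_manual <- mtcars |>
  filter(mpg == min(mpg))

# Both approaches select the same mpg values
lowest$mpg
lowest_manual$mpg
```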

You can also specify how many of the lowest rows to return using the n argument. Because ties are kept by default, the result can contain more than n rows:

# Get the 3 cars with lowest mpg
mtcars |>
  slice_min(mpg, n = 3)

If you want to get a certain proportion of rows instead of a fixed number, use the prop parameter:

# Get the bottom 10% of cars by mpg
mtcars |>
  slice_min(mpg, prop = 0.1)
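
With 32 rows in mtcars, 10% is 3.2 rows, which is not a whole number. The sketch below checks how the proportion is resolved by comparing it with an explicit n. The variable names are illustrative, and the comparison assumes no tie at the cut-off, which holds for mpg in this dataset.

```r
library(dplyr)

# 10% of 32 rows is 3.2, which is not a whole number of rows
target <- 0.1 * nrow(mtcars)

bottom_tenth <- mtcars |>
  slice_min(mpg, prop = 0.1)

# Compare against an explicit n of floor(3.2) = 3
bottom_three <- mtcars |>
  slice_min(mpg, n = floor(target))

# Both results have the same number of rows
nrow(bottom_tenth)
nrow(bottom_three)
```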

Example 2: Practical Application

Now let’s explore a more complex real-world scenario using the Palmer Penguins dataset. Suppose we want to analyze the smallest penguins by body mass within each species:

# Find the penguin(s) with minimum body mass in each species (ties are kept)
penguins |>
  filter(!is.na(body_mass_g)) |>
  group_by(species) |>
  slice_min(body_mass_g) |>
  select(species, island, sex, body_mass_g, flipper_length_mm)
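
As an alternative to group_by(), recent versions of dplyr (1.1.0 and later, as far as I am aware) accept a by argument that groups for a single call and returns an ungrouped result. The sketch below uses a small made-up tibble named weights rather than the penguins data, so the column names and values are hypothetical.

```r
library(dplyr)

# Hypothetical toy data, not taken from the penguins dataset
weights <- tibble(
  group = c("a", "a", "b", "b", "b"),
  mass  = c(10, 12, 7, 7, 9)
)

# Grouped form: the result stays grouped by `group`
grouped <- weights |>
  group_by(group) |>
  slice_min(mass)

# Per-operation form (assumes dplyr >= 1.1.0): `by` groups only for this call
ungrouped <- weights |>
  slice_min(mass, by = group)

is_grouped_df(grouped)
is_grouped_df(ungrouped)
```

Group "b" has two rows tied at the minimum, so both forms return three rows in total.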

We can extend this analysis to return more than one row per group and add derived columns for context:

# Find the 2 lightest penguins per species (more if tied), then add a
# mass-to-flipper ratio and a within-species row rank
penguins |>
  filter(!is.na(body_mass_g), !is.na(flipper_length_mm)) |>
  group_by(species) |>
  slice_min(body_mass_g, n = 2) |>
  mutate(
    mass_flipper_ratio = body_mass_g / flipper_length_mm,
    rank = row_number()
  ) |>
  select(species, island, sex, body_mass_g, flipper_length_mm, mass_flipper_ratio, rank) |>
  arrange(species, rank)

Here’s another practical example that groups by two columns and combines slice_min() with other dplyr functions:

# Find penguins with minimum bill length by species and sex combination
penguins |>
  filter(!is.na(bill_length_mm), !is.na(sex)) |>
  group_by(species, sex) |>
  slice_min(bill_length_mm) |>
  ungroup() |>
  mutate(species_sex = paste(species, sex, sep = "_")) |>
  select(species_sex, island, bill_length_mm, bill_depth_mm, body_mass_g) |>
  arrange(bill_length_mm)

Advanced Options

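The slice_min() function also lets you control how ties are handled. By default (with_ties = TRUE), it keeps every row tied at the cut-off value, so the result can contain more than n rows. Set with_ties = FALSE to get exactly n rows: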

# Handle ties by keeping all (default behavior)
mtcars |>
  slice_min(cyl, n = 1)

# Handle ties by keeping only the first occurrence
mtcars |>
  slice_min(cyl, n = 1, with_ties = FALSE)
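
To make the difference concrete, the sketch below counts the rows each call returns. mtcars has 11 four-cylinder cars, so the default call returns all 11. The last call assumes dplyr 1.1.0 or later, where several ordering columns can be wrapped in a tibble to break ties on a second column. The object names are illustrative.

```r
library(dplyr)

# Default: every row tied for the lowest cyl is returned
tied <- mtcars |>
  slice_min(cyl, n = 1)
nrow(tied)

# with_ties = FALSE: exactly n rows are returned
first_only <- mtcars |>
  slice_min(cyl, n = 1, with_ties = FALSE)
nrow(first_only)

# Assumes dplyr >= 1.1.0: order by cyl, then mpg, by wrapping both in a tibble
lowest_cyl_then_mpg <- mtcars |>
  slice_min(tibble(cyl, mpg), n = 1)
nrow(lowest_cyl_then_mpg)
```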

Summary

The slice_min() function is an essential tool for identifying rows with minimum values in your datasets. Key takeaways include:

  • Use slice_min(column) to get all rows with the minimum value of a column
  • Control the number of returned rows with the n argument, or the proportion of rows with prop
  • Combine with group_by() to find minimums within groups
  • Handle ties using the with_ties parameter
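  • Decide how to treat missing values: slice_min() sorts NA to the end, so NA rows appear only when there are too few non-missing rows to fill the request; filter them out with filter(!is.na(column)) if they should never be returned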
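
The missing-value behavior can be checked with a small made-up tibble. This is a sketch that assumes dplyr 1.1.0 or later, where slice_min() has an na_rm argument; the scores data and object names are hypothetical.

```r
library(dplyr)

# Hypothetical data with one missing value
scores <- tibble(id = 1:4, value = c(3, NA, 1, 2))

# The NA row is not picked as the minimum
smallest <- scores |>
  slice_min(value, n = 1)

# Asking for more rows than there are non-missing values pulls in the NA row
all_rows <- scores |>
  slice_min(value, n = 4)

# na_rm = TRUE (assumes dplyr >= 1.1.0) drops NA rows from the result
no_na <- scores |>
  slice_min(value, n = 4, na_rm = TRUE)

smallest$value
nrow(all_rows)
nrow(no_na)
```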

This function streamlines the process of finding bottom performers, smallest values, or earliest records in your data analysis workflow, replacing a manual combination of arrange(), filter(), and head() with a single call.