slice_max: get rows with highest values of a column

Published: October 2, 2023

Introduction

The slice_max() function from dplyr extracts the rows with the highest values in a given column. It is useful when you need to identify top performers, find maximum values by group, or narrow a dataset down to its most extreme observations.

Getting Started

library(tidyverse)
library(palmerpenguins)

Example 1: Basic Usage

The Problem

We want to find the penguin with the longest bill in our dataset. This tells us the maximum bill length observed across all penguins and which bird it belongs to.

Step 1: Examine the data structure

First, let’s look at our penguin dataset to understand what we’re working with.

penguins |>
  select(species, bill_length_mm, bill_depth_mm) |>
  head()

This shows us the first few rows with species and bill measurements.

Step 2: Find the single penguin with longest bill

Now we’ll use slice_max() to get the row with the highest bill length value.

penguins |>
  slice_max(bill_length_mm, n = 1)

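This returns one row containing the penguin with the maximum bill length of 59.6mm. Note that slice_max() keeps ties by default (with_ties = TRUE), so n = 1 can return more than one row when several rows share the maximum.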
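Tie handling is worth checking before relying on n = 1 to mean "exactly one row". The following is a minimal sketch on a hypothetical toy tibble (the names scores, name, and score are illustrative and not part of the penguins data) showing how with_ties changes the result:

```r
library(dplyr)

# Hypothetical toy data: rows "b" and "c" share the highest score
scores <- tibble(
  name  = c("a", "b", "c", "d"),
  score = c(10, 25, 25, 7)
)

# Default (with_ties = TRUE): both tied rows are returned even though n = 1
top_with_ties <- scores |>
  slice_max(score, n = 1)

# with_ties = FALSE: exactly one row is returned
top_single <- scores |>
  slice_max(score, n = 1, with_ties = FALSE)

nrow(top_with_ties)
nrow(top_single)
```

With with_ties = FALSE the result always has at most n rows, at the cost of silently dropping the other tied rows.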

Step 3: Get multiple top records

We can also retrieve the top 3 penguins with the longest bills.

penguins |>
  slice_max(bill_length_mm, n = 3) |>
  select(species, bill_length_mm, bill_depth_mm)

This gives us the three penguins with the longest bills, in descending order: 59.6mm, 58.0mm, and 55.9mm.

Example 2: Practical Application

The Problem

A marine biologist wants to identify the heaviest penguin from each species to study potential differences in maximum body size across species. This analysis will help understand species-specific size variations and potential ecological adaptations.

Step 1: Group data by species

We need to group our data by species before finding the maximum values.

penguins |>
  group_by(species) |>
  slice_max(body_mass_g, n = 1) |>
  select(species, body_mass_g, island, sex)

This shows us the heaviest penguin from each species: Adelie (4775g), Chinstrap (4800g), and Gentoo (6300g).
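If exactly one row per group is required, ties have to be switched off explicitly. The sketch below uses hypothetical toy data (animals, species, and mass are illustrative names) and assumes dplyr 1.1.0 or later for the by argument, which groups for a single call instead of using group_by():

```r
library(dplyr)

# Hypothetical toy data: species "x" has two rows tied for the maximum mass
animals <- tibble(
  species = c("x", "x", "x", "y", "y"),
  mass    = c(5, 9, 9, 3, 4)
)

# group_by() version with default tie handling: "x" contributes two rows
grouped <- animals |>
  group_by(species) |>
  slice_max(mass, n = 1) |>
  ungroup()

# Per-call grouping with `by` (assumes dplyr >= 1.1.0) and ties dropped:
# one row per species, and the result is not grouped
one_each <- animals |>
  slice_max(mass, n = 1, by = species, with_ties = FALSE)

nrow(grouped)
nrow(one_each)
```

The by form avoids a trailing ungroup(); the group_by() form behaves the same way inside each group.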

Step 2: Select a proportion of rows with prop

Sometimes we want a percentage of top values rather than a fixed number.

penguins |>
  group_by(species) |>
  slice_max(body_mass_g, prop = 0.1) |>
  count(species, name = "top_10_percent_count")

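This gives us the heaviest 10% of penguins in each species and counts how many rows that represents. The number of rows per group is rounded down to a whole number, and tied rows at the cutoff are kept by default, so a group can return slightly more rows than that.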
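The rounding behind prop can be seen on a small example. This is a minimal sketch with hypothetical toy tibbles (ten_rows and five_rows are illustrative names), relying on the documented behavior that prop is rounded toward zero to get a whole number of rows:

```r
library(dplyr)

# Hypothetical toy data with distinct values, so ties play no role
ten_rows  <- tibble(x = 1:10)
five_rows <- tibble(x = 1:5)

# 0.25 * 10 = 2.5, rounded toward zero: the 2 largest values are kept
top_quarter <- ten_rows |>
  slice_max(x, prop = 0.25)

# 0.1 * 5 = 0.5, rounded toward zero: no rows are kept
top_tenth <- five_rows |>
  slice_max(x, prop = 0.1)

nrow(top_quarter)
nrow(top_tenth)
```

For small groups this means a low prop can return nothing at all, which is worth checking when groups differ a lot in size.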

Step 3: Find top performers across multiple variables

Let’s identify penguins that are both heavy and have long flippers.

penguins |>
  mutate(size_score = body_mass_g + flipper_length_mm * 10) |>
  slice_max(size_score, n = 5) |>
  select(species, body_mass_g, flipper_length_mm, size_score)

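This creates a combined size score and shows the 5 penguins with the highest scores. The factor of 10 is an arbitrary weight that brings flipper length (in mm) closer to the scale of body mass (in g); a different weight would produce a different ranking.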
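One alternative to a hand-picked weight is to standardize each variable before adding them. The sketch below is an illustration rather than part of the original analysis; it uses hypothetical toy data (birds, mass_g, flipper_mm) with no missing values. On the penguins data, mean() and sd() would need na.rm = TRUE.

```r
library(dplyr)

# Hypothetical toy data without missing values
birds <- tibble(
  id         = 1:4,
  mass_g     = c(3000, 4000, 5000, 6000),
  flipper_mm = c(180, 230, 200, 210)
)

ranked <- birds |>
  mutate(
    # z-scores put both variables on a unitless, comparable scale
    mass_z     = (mass_g - mean(mass_g)) / sd(mass_g),
    flipper_z  = (flipper_mm - mean(flipper_mm)) / sd(flipper_mm),
    size_score = mass_z + flipper_z
  ) |>
  slice_max(size_score, n = 2)

ranked$id
```

Here each variable contributes equally to the score regardless of its original units, which is one possible design choice among several.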

Step 4: Compare with traditional filtering

Here’s how slice_max() differs from traditional filtering approaches.

# Traditional approach - more verbose
max_bill <- max(penguins$bill_length_mm, na.rm = TRUE)
penguins |>
  filter(bill_length_mm == max_bill)

# slice_max approach - cleaner
penguins |>
  slice_max(bill_length_mm, n = 1)

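The slice_max() approach is more concise and needs no separate na.rm = TRUE step: missing values in the ordering column are sorted last, so they only appear in the result when there are too few non-missing values to reach n. Both versions return every row tied for the maximum, because slice_max() keeps ties by default.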
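The treatment of missing values can be checked on a small example. This is a minimal sketch with a hypothetical toy tibble (measurements, id, and value are illustrative names); the na_rm argument is assumed to be available, which requires dplyr 1.1.0 or later:

```r
library(dplyr)

# Hypothetical toy data with one missing value
measurements <- tibble(
  id    = 1:3,
  value = c(3, NA, 8)
)

# n = 2: enough non-missing values exist, so the NA row is not returned
top_two <- measurements |>
  slice_max(value, n = 2)

# n = 3: only two non-missing values exist, so the NA row pads the result
padded <- measurements |>
  slice_max(value, n = 3)

# na_rm = TRUE (assumes dplyr >= 1.1.0): NA rows are never returned
non_missing <- measurements |>
  slice_max(value, n = 3, na_rm = TRUE)

nrow(top_two)
nrow(padded)
nrow(non_missing)
```

Setting na_rm = TRUE is the safer choice when n might exceed the number of non-missing values in a group.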

Summary

  • slice_max() efficiently extracts rows with the highest values from a specified column
  • Use the n argument to request an exact number of rows, or prop to request a proportion of rows
  • Combine with group_by() to find maximum values within each group
  • The function sorts missing values last and keeps ties by default (with_ties = TRUE), and it is more concise than filtering against max()
  • Perfect for identifying top performers, outliers, or maximum values in your analysis