How to get top and bottom rows of each group in R

dplyr

dplyr slice

Learn how to get the top and bottom rows of each group in R with clear examples and explanations.

Published

March 26, 2026

Introduction

The slice_max() function in dplyr selects the n rows with the highest values of a variable. Compared with sorting the whole table and trimming it by hand, slice_max() states the intent in a single call, which makes it a good fit for finding top performers, highest scores, or maximum values. It is especially useful with grouped data, where it returns the top entries for each category. Its counterpart, slice_min(), does the same for the lowest values.

Getting Started

First, let’s load the tidyverse package and create some sample data to work with:

library(tidyverse)

We’ll create a dataset with symbols and values to demonstrate different uses of slice_max():

set.seed(2024)
df <- tibble(
  symbol = sample(letters, 10),
  value = rnorm(10, mean = 5, sd = 10)
)
df

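This gives us a dataset with 10 random symbols and their corresponding numeric values. Because the values are drawn from a normal distribution with mean 5 and standard deviation 10, they will typically include both positive and negative numbers.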

Basic Sorting vs slice_max()

Let’s first see what our data looks like when sorted by value:

df |> arrange(value)

The arrange() function sorts all rows in ascending order, but what if we only want the 3 highest values? This is where slice_max() becomes useful.

Finding Top Values

To get the 3 rows with the highest values, we use slice_max():

df |> slice_max(value, n = 3)

This returns only the 3 rows with the highest values, ordered from largest to smallest. The result matches what you would get by sorting in descending order and keeping the first 3 rows, but the intent is expressed in a single call.
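To make the comparison with sorting concrete, here is a minimal sketch on a small hand-made tibble. The `scores` data and its values are hypothetical and chosen without ties, so the result is easy to check by eye.

```r
library(dplyr)

# Hypothetical data with no tied values
scores <- tibble(
  symbol = c("a", "b", "c", "d", "e"),
  value  = c(3.5, -1.2, 9.0, 0.4, 7.1)
)

# slice_max(): the 3 rows with the largest `value`, largest first
top_slice <- scores |> slice_max(value, n = 3)

# The longer equivalent: sort descending, then keep the first 3 rows
top_arrange <- scores |>
  arrange(desc(value)) |>
  head(3)

top_slice$symbol                  # "c" "e" "a"
all.equal(top_slice, top_arrange) # TRUE
```

Both pipelines return the same rows here. The slice_max() version simply names the operation directly.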

Working with Grouped Data

slice_max() becomes even more powerful when combined with group_by(). Let’s first create a grouping variable:

df_grouped <- df |>
  mutate(direction = ifelse(value > 0, "positive", "negative"))
df_grouped

Now we can find the top values within each group:

df_grouped |>
  group_by(direction) |>
  slice_max(value, n = 2)

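This gives us the 2 highest values within the positive group and, separately, within the negative group. For the negative group, the highest values are the ones closest to zero. A group with fewer than 2 rows returns all of its rows.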
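For the bottom rows of each group, dplyr provides slice_min(), which takes the same arguments as slice_max(). The sketch below uses a hypothetical `sales` tibble (the names and numbers are made up for illustration) to pull both the top and the bottom row per group.

```r
library(dplyr)

# Hypothetical grouped data, chosen without ties
sales <- tibble(
  region = c("north", "north", "north", "south", "south", "south"),
  amount = c(10, 40, 25, 5, 30, 15)
)

# Top row per group: the largest amount in each region
top_rows <- sales |>
  group_by(region) |>
  slice_max(amount, n = 1) |>
  ungroup()

# Bottom row per group: the smallest amount in each region
bottom_rows <- sales |>
  group_by(region) |>
  slice_min(amount, n = 1) |>
  ungroup()

top_rows$amount     # 40 30
bottom_rows$amount  # 10 5
```

Calling ungroup() at the end is a design choice rather than a requirement. It prevents later steps from silently operating per group.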

Handling Ties

When there are tied values, slice_max() includes all tied observations by default. You can control this behavior:

# Include all ties (default)
df |> slice_max(value, n = 3)

# Keep exactly n rows: ties are ignored and the first n rows are returned
df |> slice_max(value, n = 3, with_ties = FALSE)

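The with_ties parameter determines whether to include all rows that tie for the nth position. With the default, with_ties = TRUE, the result can contain more than n rows. With with_ties = FALSE, exactly n rows are returned (or fewer if the data has fewer than n rows). Because the values in df are continuous random draws, ties are very unlikely, so both calls above return the same rows.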
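The difference only shows up when the data actually contains ties at the cutoff. Here is a small sketch with a hypothetical `tied` tibble in which two rows share second place.

```r
library(dplyr)

# Hypothetical data: "b" and "c" tie for 2nd place
tied <- tibble(
  symbol = c("a", "b", "c", "d"),
  value  = c(10, 8, 8, 1)
)

# Default (with_ties = TRUE): the tie at the cutoff is kept, so 3 rows come back
with_ties_rows <- tied |> slice_max(value, n = 2)

# with_ties = FALSE: the tie is ignored and exactly 2 rows come back
without_ties_rows <- tied |> slice_max(value, n = 2, with_ties = FALSE)

nrow(with_ties_rows)     # 3
nrow(without_ties_rows)  # 2
```

If a downstream step assumes a fixed number of rows per group, setting with_ties = FALSE explicitly is the safer choice.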

Practical Example with Real Data

Let’s use the penguins data to see a more realistic example:

library(palmerpenguins)

penguins |>
  filter(!is.na(body_mass_g)) |>
  group_by(species) |>
  slice_max(body_mass_g, n = 2)

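This finds the 2 heaviest penguins of each species. Filtering out missing body masses first ensures that no NA rows can appear in the result. Because ties are kept by default, a species can return more than 2 rows if several penguins share the cutoff body mass.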
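As an alternative to filtering beforehand, recent versions of dplyr let slice_max() drop missing values itself. The sketch below assumes dplyr 1.1.0 or later, where the na_rm argument is available. The `masses` tibble is hypothetical stand-in data, not the real penguins dataset.

```r
library(dplyr)

# Hypothetical measurements with missing values
masses <- tibble(
  species     = c("x", "x", "x", "y", "y"),
  body_mass_g = c(3500, NA, 4100, NA, 5000)
)

# Approach 1: remove missing values first, then slice
by_filter <- masses |>
  filter(!is.na(body_mass_g)) |>
  group_by(species) |>
  slice_max(body_mass_g, n = 2) |>
  ungroup()

# Approach 2 (assumes dplyr >= 1.1.0): let slice_max() drop the NAs
by_na_rm <- masses |>
  group_by(species) |>
  slice_max(body_mass_g, n = 2, na_rm = TRUE) |>
  ungroup()

by_filter$body_mass_g  # 4100 3500 5000
```

Species "y" has only one non-missing value, so it contributes a single row in both approaches.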

Summary

slice_max() is a concise way to extract the top n rows based on a specific variable without writing a separate sort-and-trim step. It works particularly well with grouped data, allowing you to find top values within each category. Remember to handle missing values appropriately and consider the with_ties parameter when exact row counts matter. It is a cleaner alternative to combining arrange() and head(), providing more intuitive code for common data analysis tasks.
