How to get the first row of each group in R

Learn how to get the first row of each group in R with clear examples and explanations.
Published: March 26, 2026

Introduction

Calculating proportions and frequencies is a fundamental task in data analysis that helps us understand the relative distribution of categorical variables. This tutorial demonstrates how to use dplyr functions like count() and mutate() to calculate proportions from grouped data. These techniques are essential when you need to convert raw counts into percentages or when comparing the relative sizes of different groups in your dataset.

Setup

First, let’s load the required packages and prepare our data for analysis.

library(tidyverse)
library(palmerpenguins)

We’ll use the Palmer penguins dataset, but first we drop every row that has a missing value in any column, so that all later counts are based on the same set of complete observations.

penguins_clean <- penguins |>
  drop_na()

This gives us a clean dataset with complete observations for all variables.
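It can help to see exactly what drop_na() removes. The sketch below uses a small made-up tibble (toy is a hypothetical stand-in, not the penguins data) to compare dropping rows with an NA in any column against dropping only rows with an NA in one named column.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data with gaps in two different columns
toy <- tibble(
  species = c("A", "A", "B", "B"),
  sex     = c("female", NA, "male", "female"),
  mass_g  = c(3700, 3800, NA, 5000)
)

# With no column arguments, rows with an NA in any column are removed
all_complete <- drop_na(toy)
nrow(all_complete)  # 2

# Naming a column restricts the check to that column only
sex_complete <- drop_na(toy, sex)
nrow(sex_complete)  # 3
```

If only some columns matter for your proportions, restricting drop_na() to those columns keeps more rows in the analysis.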

Basic Counting

Let’s start by counting the number of penguins by species to understand our data structure.

penguins_clean |>
  count(species)

The count() function returns the frequency of each species in our dataset. This shows us the absolute numbers, but often we want to know the relative proportions.
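To make clear what count() is doing, here is a minimal sketch on a made-up tibble (toy is hypothetical) showing that it matches a group_by() plus summarise() pipeline, along with the sort argument for ordering by frequency.

```r
library(dplyr)

# Hypothetical toy data
toy <- tibble(species = c("A", "B", "A", "C", "A", "B"))

# count() is shorthand for grouping and then counting rows per group
by_hand <- toy |>
  group_by(species) |>
  summarise(n = n())

shortcut <- toy |>
  count(species)

# sort = TRUE orders the result from most to least frequent
sorted <- toy |>
  count(species, sort = TRUE)

sorted
```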

Calculating Simple Proportions

To convert counts into proportions, we can use mutate() to create a new column that divides each count by the total.

penguins_clean |>
  count(species) |>
  mutate(prop = n / sum(n))

This creates a proportion column where all values sum to 1.0, showing us what fraction of the total each species represents.
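As a quick illustration on made-up data (toy is hypothetical), the same pattern can be extended with a rounded percentage column, and the result checked to confirm the proportions add up to 1.

```r
library(dplyr)

# Hypothetical toy data: three rows of "A", one row of "B"
toy <- tibble(species = c("A", "A", "A", "B"))

props <- toy |>
  count(species) |>
  mutate(
    prop = n / sum(n),           # fraction of all rows
    pct  = round(100 * prop, 1)  # the same value as a percentage
  )

props
sum(props$prop)  # 1
```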

Alternative Approach with prop.table()

Base R provides the prop.table() function as an alternative way to calculate proportions from counts. When given a plain numeric vector, it divides each element by the vector’s sum.

penguins_clean |>
  count(species) |>
  mutate(freq = prop.table(n))

Both approaches yield the same values (stored here in a column named freq rather than prop), but prop.table() can be more explicit about your intention to calculate proportions.
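For comparison, prop.table() is most often seen in base R paired with table(). A minimal sketch with a made-up vector (species here is hypothetical data, not the penguins column):

```r
# Hypothetical toy vector
species <- c("A", "A", "A", "B")

# table() tabulates the counts; prop.table() rescales them to sum to 1
counts <- table(species)
props  <- prop.table(counts)

props
```

This returns a table object rather than a data frame, so the dplyr version is usually easier to carry into further pipeline steps.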

Proportions with Multiple Groups

When working with multiple categorical variables, calculating proportions becomes more complex. Let’s count penguins by both species and sex.

penguins_clean |>
  count(species, sex)

This gives us the count for each combination of species and sex.
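One detail worth knowing: by default count() only returns combinations that actually occur in the data. The sketch below uses a made-up tibble (toy is hypothetical) with factor columns to show how the .drop argument keeps empty combinations as zero counts.

```r
library(dplyr)

# Hypothetical toy data: there is no male "B" row
toy <- tibble(
  species = factor(c("A", "A", "B")),
  sex     = factor(c("female", "male", "female"))
)

# Default behaviour: only observed combinations appear
observed <- toy |>
  count(species, sex)

# .drop = FALSE keeps every combination of factor levels, with n = 0 where empty
all_combos <- toy |>
  count(species, sex, .drop = FALSE)

all_combos
```

This only applies to factor columns; character columns carry no information about levels that are absent from the data.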

Overall Proportions Across Groups

To calculate what proportion each species-sex combination represents of the total dataset:

penguins_clean |>
  count(species, sex) |>
  mutate(prop = n / sum(n))

Each proportion represents that group’s share of the entire dataset, and all proportions will sum to 1.0.
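The reason sum(n) acts as the grand total here is that count() on an ungrouped data frame returns an ungrouped result. A small check on made-up data (toy is hypothetical):

```r
library(dplyr)

# Hypothetical toy data with five rows
toy <- tibble(
  species = c("A", "A", "A", "B", "B"),
  sex     = c("female", "female", "male", "female", "male")
)

overall <- toy |>
  count(species, sex) |>
  mutate(prop = n / sum(n))  # sum(n) is 5, the total number of rows

is_grouped_df(overall)  # FALSE
sum(overall$prop)       # 1
```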

Proportions Within Groups

Often you’ll want to calculate proportions within each species rather than across the entire dataset.

penguins_clean |>
  count(species, sex) |>
  mutate(prop_within_species = n / sum(n), .by = species)

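The .by argument (available in dplyr 1.1.0 and later) tells mutate() to calculate proportions separately for each species, so sum(n) becomes the species total rather than the grand total. Now the proportions within each species sum to 1.0, showing the sex distribution within each species.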
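If your installed dplyr predates the .by argument, the same result can be reached with group_by() and ungroup(). A minimal sketch on made-up data (toy is hypothetical):

```r
library(dplyr)

# Hypothetical toy data
toy <- tibble(
  species = c("A", "A", "A", "B", "B"),
  sex     = c("female", "female", "male", "female", "male")
)

within <- toy |>
  count(species, sex) |>
  group_by(species) |>                      # sum(n) is now computed per species
  mutate(prop_within_species = n / sum(n)) |>
  ungroup()                                 # drop grouping so later steps are unaffected

# Check: proportions add up to 1 inside every species
check <- within |>
  summarise(total = sum(prop_within_species), .groups = "drop", .by = NULL) 

within
```

Unlike .by, group_by() leaves the data grouped until you call ungroup(), so forgetting that last step can silently change the behaviour of later summaries.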

Summary

This tutorial covered several approaches for calculating proportions from categorical data using dplyr. The key techniques include using count() to get frequencies, mutate() with division to calculate simple proportions, and the .by argument to calculate proportions within specific groups. These methods are essential for exploratory data analysis and creating meaningful summaries of categorical variables in your datasets.