dplyr count(): count unique values of a variable

dplyr count()
Master dplyr count() to count unique values of a variable. Complete R tutorial with examples using real datasets.
Published: January 26, 2022

Introduction

The count() function from dplyr is a staple of exploratory data analysis in R. It counts how many rows fall into each unique value, or each combination of values, of one or more variables in your dataset. This is particularly valuable when you need to understand the distribution of categorical variables, identify the most common values, or get a quick overview of your data structure.

You’ll find count() especially helpful during initial data exploration, quality checks, or when creating frequency tables for reporting. It’s also commonly used as a preprocessing step before creating visualizations like bar charts or preparing data for statistical analysis.

Getting Started

First, let’s load the required packages. We’ll use the tidyverse for data manipulation and the palmerpenguins dataset for our examples.

library(tidyverse)
library(palmerpenguins)

Example 1: Basic Usage

The simplest use of count() is to count how often each value of a single variable occurs. Let’s count the number of penguins by species in the Palmer penguins dataset:

penguins |> 
  count(species)
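
For intuition, count() is shorthand for grouping and then summarising with n(). The sketch below compares the two forms; it assumes dplyr and palmerpenguins are installed, and the object names via_count and via_summarise are only illustrative:

```r
library(dplyr)
library(palmerpenguins)

# Short form: one row per species with its frequency in column n
via_count <- penguins |>
  count(species)

# Long form: group, then count the rows in each group
via_summarise <- penguins |>
  group_by(species) |>
  summarise(n = n())

# Both tables should hold the same species and counts
all.equal(as.data.frame(via_count), as.data.frame(via_summarise))
```

The short form is usually preferable because it also ungroups the result for you.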

You can also count multiple variables simultaneously. This creates a frequency table with one row for each combination of values that actually occurs in the data:

penguins |> 
  count(species, island)
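
By default, combinations that never occur are left out of the result. When the counted variables are factors, as species and island are in penguins, the .drop argument of count() can keep those empty combinations with a count of zero. A minimal sketch, with observed and all_combos as illustrative names:

```r
library(dplyr)
library(palmerpenguins)

# Default behaviour: only species/island pairs present in the data
observed <- penguins |>
  count(species, island)

# .drop = FALSE keeps every combination of factor levels,
# reporting n = 0 for pairs that never occur
all_combos <- penguins |>
  count(species, island, .drop = FALSE)

nrow(observed)
nrow(all_combos)
```

Keeping the zero rows can be useful when a later plot or table should show the empty cells explicitly.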

To sort the results from most to least frequent, set the sort argument to TRUE:

penguins |> 
  count(species, sort = TRUE)

If you want to customize the name of the count column (which defaults to “n”), use the name argument:

penguins |> 
  count(species, name = "total_penguins")

Example 2: Practical Application

Let’s explore a more complex scenario where we analyze penguin populations across different islands and years, keeping only penguins with a recorded body mass and sex. This demonstrates how count() combines with other dplyr functions:

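penguins |> 
  # keep penguins with a recorded body mass and sex
  filter(!is.na(body_mass_g), !is.na(sex)) |> 
  # sort = TRUE is not needed here because arrange() sets the final order
  count(island, year, species) |> 
  # keep groups with at least 10 penguins
  filter(n >= 10) |> 
  arrange(island, desc(n))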

We can also use count() with conditional logic. Here’s how to count penguins by size categories we create on the fly:

penguins |> 
  filter(!is.na(body_mass_g)) |> 
  mutate(size_category = case_when(
    body_mass_g < 3500 ~ "Small",
    body_mass_g < 4500 ~ "Medium",
    TRUE ~ "Large"
  )) |> 
  count(species, size_category, sort = TRUE) |> 
  pivot_wider(names_from = size_category, values_from = n, values_fill = 0)

For percentage calculations, you can combine count() with mutate():

penguins |> 
  count(species) |> 
  mutate(
    percentage = round(n / sum(n) * 100, 1),
    percentage_label = paste0(percentage, "%")
  )
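
The same pattern extends to percentages within a group. As a sketch, grouping the counted table by species before mutate() makes sum(n) the species total, so each row shows the share of that species living on each island. The names island_share and percentage are illustrative:

```r
library(dplyr)
library(palmerpenguins)

island_share <- penguins |>
  count(species, island) |>
  # sum(n) is now computed per species, not over the whole table
  group_by(species) |>
  mutate(percentage = round(n / sum(n) * 100, 1)) |>
  ungroup()

island_share
```

Because of rounding, the percentages within a species may add up to slightly more or less than 100.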

Summary

The count() function is an essential tool for data exploration and summarization in R. Key takeaways include:

  • Use count(variable) for basic frequency counts of single variables
  • Count multiple variables with count(var1, var2) to see every combination that occurs in the data
  • Add sort = TRUE to order results from most to least frequent
  • Customize the count column name with the name argument
  • Combine with other dplyr functions like filter() and mutate() for more complex analyses
  • Use with pivot_wider() to create cross-tabulation tables

Remember that count() does not remove rows with NA values in the counted variables: missing values are reported as their own group. Filter them out or handle them explicitly when your analysis should exclude them.
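
To check how count() treats missing values in practice, here is a minimal sketch using the sex column of penguins, which contains some NA entries. The names sex_counts and sex_counts_known are illustrative:

```r
library(dplyr)
library(palmerpenguins)

# NA appears as its own row in the frequency table
sex_counts <- penguins |>
  count(sex)

# Excluding missing values is a separate, explicit step
sex_counts_known <- penguins |>
  filter(!is.na(sex)) |>
  count(sex)

sex_counts
sex_counts_known
```

Comparing the two tables shows exactly how many rows the filter step removes.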