dplyr count(): count unique values of a variable
Introduction
The count() function from dplyr is one of the most useful tools for exploratory data analysis in R. It provides a quick and efficient way to count the number of occurrences of unique values within one or more variables in your dataset. This function is particularly valuable when you need to understand the distribution of categorical variables, identify the most common values, or get a quick overview of your data structure.
You’ll find count() especially helpful during initial data exploration, quality checks, or when creating frequency tables for reporting. It’s also commonly used as a preprocessing step before creating visualizations like bar charts or preparing data for statistical analysis.
Getting Started
First, let’s load the required packages. We’ll use the tidyverse for data manipulation and the palmerpenguins dataset for our examples.
library(tidyverse)
library(palmerpenguins)
Example 1: Basic Usage
The simplest use of count() is to count occurrences of a single variable. Let’s count the number of penguins by species in the Palmer penguins dataset:
penguins |>
  count(species)
You can also count several variables at once. This produces a frequency table with one row for each combination of values that occurs in the data:
penguins |>
  count(species, island)
To sort the results by frequency, from most to least common, set the sort argument to TRUE:
penguins |>
  count(species, sort = TRUE)
If you want to customize the name of the count column (which defaults to “n”), use the name argument:
penguins |>
  count(species, name = "total_penguins")
Example 2: Practical Application
Let’s explore a more complex scenario where we analyze penguin populations across different islands and years, keeping only the rows where both body mass and sex were recorded. This demonstrates how count() works alongside other dplyr functions:
penguins |>
  filter(!is.na(body_mass_g), !is.na(sex)) |>
  count(island, year, species, sort = TRUE) |>
  filter(n >= 10) |>
  arrange(island, desc(n))
We can also use count() with conditional logic. Here’s how to count penguins by size categories we create on the fly:
penguins |>
  filter(!is.na(body_mass_g)) |>
  mutate(size_category = case_when(
    body_mass_g < 3500 ~ "Small",
    body_mass_g < 4500 ~ "Medium",
    TRUE ~ "Large"
  )) |>
  count(species, size_category, sort = TRUE) |>
  pivot_wider(names_from = size_category, values_from = n, values_fill = 0)
For percentage calculations, you can combine count() with mutate():
penguins |>
  count(species) |>
  mutate(
    percentage = round(n / sum(n) * 100, 1),
    percentage_label = paste0(percentage, "%")
  )
Summary
The count() function is an essential tool for data exploration and summarization in R. Key takeaways include:
- Use count(variable) for basic frequency counts of single variables
- Count multiple variables with count(var1, var2) to see all combinations
- Add sort = TRUE to automatically order results by frequency
- Customize the count column name with the name parameter
- Combine with other dplyr functions like filter() and mutate() for more complex analyses
- Use with pivot_wider() to create cross-tabulation tables
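To see what count() is doing, it can help to write the same result out by hand. The sketch below assumes a small made-up data frame called toy (hypothetical, not part of palmerpenguins) and compares count() with sort and name set against the longer group_by(), summarise() and arrange() pipeline it roughly stands in for.

```r
library(dplyr)

# Hypothetical toy data, invented for illustration only
toy <- tibble(
  species = c("Adelie", "Adelie", "Gentoo", "Chinstrap", "Gentoo", "Adelie"),
  island  = c("Dream", "Biscoe", "Biscoe", "Dream", "Biscoe", "Dream")
)

# Shorthand: count rows per species, sorted by frequency,
# with a custom name for the count column
by_count <- toy |>
  count(species, sort = TRUE, name = "total")

# Longhand: group, count rows per group with n(), then sort descending
by_summarise <- toy |>
  group_by(species) |>
  summarise(total = n(), .groups = "drop") |>
  arrange(desc(total))

print(by_count)
print(by_summarise)
```

Both pipelines should print the same three-row table. The shorthand is usually easier to read, while the longhand form is the one to reach for when other summaries, such as a mean, are needed alongside the count.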