How to sum a column by group in R

dplyr group_by()
dplyr summarize()
Published

April 17, 2022

In this tutorial, we will learn how to compute the sum of a column by the groups defined in another column of a dataframe. The basic idea is to group the dataframe by the grouping variable/column and then find the sum for each group. Let us get started by loading the tidyverse suite of R packages. We will use dplyr’s group_by() and summarize() functions to find the sum/total of a variable by group.

library(tidyverse)

We will create a simple dataframe with two columns, where one is a grouping variable and the other is a numerical variable. We use tibble() to create the dataframe from scratch, mainly using the sample() function to create the two variables.

set.seed(41)
df

## # A tibble: 8 × 2
##   grp   counts
##   <chr>  <int>
## 1 g2         6
## 2 g1        18
## 3 g1         2
## 4 g2        13
## 5 g2        17
## 6 g2        19
## 7 g2         5
## 8 g1        20
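
The sample() call that generated df is not shown in full here, so its exact arguments are not guessed. As a minimal sketch, the same eight rows can be entered by hand instead. The values are copied from the printed output, and storing counts as integers is an assumption.

```r
library(tibble)

# Hand-typed stand-in for the sample()-based construction:
# values are copied from the printed tibble in this tutorial.
df <- tibble(
  grp    = c("g2", "g1", "g1", "g2", "g2", "g2", "g2", "g1"),
  counts = c(6L, 18L, 2L, 13L, 17L, 19L, 5L, 20L)  # integer type is assumed
)

df
```

Typing the values in directly makes the later results reproducible regardless of the R version or random number generator settings.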

Computing sum of column in a dataframe based on a grouping column in R

dplyr’s group_by() function allows us to split the dataframe into smaller dataframes based on a variable of interest. The result of group_by() has all the rows and columns of the original dataframe, plus the grouping information.

df %>%
  group_by(grp)

## # A tibble: 8 × 2
## # Groups:   grp [2]
##   grp   counts
##   <chr>  <int>
## 1 g2         6
## 2 g1        18
## 3 g1         2
## 4 g2        13
## 5 g2        17
## 6 g2        19
## 7 g2         5
## 8 g1        20
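
To see what the grouping information consists of, a few dplyr helpers can be used to inspect a grouped tibble. The sketch below rebuilds df by hand from the values printed in this tutorial (an assumption, since the original was built with sample()) and stores the grouped result in a variable named grouped, which is a name chosen here for illustration.

```r
library(dplyr)

# Hand-typed copy of the example data (assumed equivalent to the original df).
df <- tibble::tibble(
  grp    = c("g2", "g1", "g1", "g2", "g2", "g2", "g2", "g1"),
  counts = c(6L, 18L, 2L, 13L, 17L, 19L, 5L, 20L)
)

grouped <- df %>%
  group_by(grp)

group_vars(grouped)  # name of the grouping column: "grp"
n_groups(grouped)    # number of groups: 2
group_size(grouped)  # rows per group, in key order (g1, g2): 3 5
group_keys(grouped)  # one row per distinct value of grp

# ungroup() drops the grouping information and returns a plain tibble.
plain <- ungroup(grouped)
is_grouped_df(plain)  # FALSE
```

The rows themselves are unchanged by group_by(); only the metadata differs, which is why the printed tibble looks the same apart from the `Groups:` line.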

Then, we can use the summarize() function to compute the sum of counts within each group.

df %>%
  group_by(grp) %>%
  summarize(total = sum(counts))

## # A tibble: 2 × 2
##   grp   total
##   <chr> <int>
## 1 g1       40
## 2 g2       60
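
As a sanity check, the same totals can be reproduced without dplyr. The sketch below uses base R's tapply() and aggregate() on a hand-typed copy of the data (the values are taken from the output printed in this tutorial; the original construction with sample() is not reproduced). By hand, g1 is 18 + 2 + 20 = 40 and g2 is 6 + 13 + 17 + 19 + 5 = 60.

```r
# Hand-typed copy of the example data (assumed equivalent to the original df).
df <- data.frame(
  grp    = c("g2", "g1", "g1", "g2", "g2", "g2", "g2", "g1"),
  counts = c(6L, 18L, 2L, 13L, 17L, 19L, 5L, 20L)
)

# tapply() applies sum() to counts within each level of grp
# and returns a named vector of totals.
totals <- tapply(df$counts, df$grp, sum)
totals  # g1 = 40, g2 = 60

# aggregate() gives the same totals as a data frame with columns grp and counts.
by_group <- aggregate(counts ~ grp, data = df, FUN = sum)
by_group
```

Both base R results agree with the summarize() output, which is a quick way to confirm that the grouping column and the summed column were specified as intended.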