How to sum a column by group in R
In this tutorial, we will learn how to compute the sum of one column for each group defined by another column in a dataframe. The basic idea is to group the dataframe by the grouping variable/column and then compute the sum for each group.
Let us get started by loading the tidyverse suite of R packages. We will be using dplyr’s group_by() and summarize() functions to find the sum/total of a variable by group.
library(tidyverse)
We will create a simple dataframe with two columns, where one is a grouping variable and the other is a numerical variable. We use tibble() to create the dataframe from scratch, using the sample() function to create the two variables.
set.seed(41)
df
## 1 g2 6
## 2 g1 18
## 3 g1 2
## 4 g2 13
## 5 g2 17
## 6 g2 19
## 7 g2 5
## 8 g1 20
Computing sum of column in a dataframe based on a grouping column in R
dplyr’s group_by() function allows us to split the dataframe into smaller dataframes based on a variable of interest. The result of group_by() has all the rows and columns of the original dataframe, plus the grouping information.
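One way to check that grouping only attaches metadata is to inspect the grouped tibble directly. The sketch below types in the eight rows printed above by hand (an assumption made so the example does not depend on sample()), and loads only dplyr rather than the full tidyverse. The name grouped is introduced here purely for illustration.

```r
library(dplyr)

# The eight rows printed above, entered by hand (assumption: this stands in
# for the sampled data so the example is reproducible).
df <- tibble(
  grp    = c("g2", "g1", "g1", "g2", "g2", "g2", "g2", "g1"),
  counts = c(6, 18, 2, 13, 17, 19, 5, 20)
)

# group_by() keeps every row and column and only records the grouping.
grouped <- df %>%
  group_by(grp)

group_vars(grouped)   # name of the grouping column: "grp"
n_groups(grouped)     # number of distinct groups: 2
nrow(grouped)         # still 8 rows, same as df
```

Calling ungroup(grouped) removes the grouping information again when it is no longer needed.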
df %>%
  group_by(grp)
## # A tibble: 8 × 2
## # Groups: grp [2]
## grp counts
##
## 1 g2 6
## 2 g1 18
## 3 g1 2
## 4 g2 13
## 5 g2 17
## 6 g2 19
## 7 g2 5
## 8 g1 20
Then, we can use the summarize() function to compute the sum of the counts column for each group.
df %>%
  group_by(grp) %>%
  summarize(total = sum(counts))
## # A tibble: 2 × 2
## grp total
##
## 1 g1 40
## 2 g2 60
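The whole workflow can be put together in one self-contained sketch. It again assumes the eight rows shown earlier are typed in by hand instead of being generated with sample(), and it stores the result in a hypothetical variable named totals so the group sums can be checked against a manual calculation.

```r
library(dplyr)

# Same eight rows as in the printed dataframe (entered by hand as an assumption).
df <- tibble(
  grp    = c("g2", "g1", "g1", "g2", "g2", "g2", "g2", "g1"),
  counts = c(6, 18, 2, 13, 17, 19, 5, 20)
)

# Group by grp, then collapse each group to a single row holding its sum.
totals <- df %>%
  group_by(grp) %>%
  summarize(total = sum(counts))

totals

# Sanity check: sum one group by hand with base R subsetting.
sum(df$counts[df$grp == "g1"])   # 18 + 2 + 20 = 40
sum(df$counts[df$grp == "g2"])   # 6 + 13 + 17 + 19 + 5 = 60
```

If the numeric column may contain missing values, sum(counts, na.rm = TRUE) skips them instead of returning NA for the affected group.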