duplicated() function in R: Find duplicated elements in a vector or dataframe

Published: June 30, 2021

In this tutorial, we will learn about the base R function duplicated() and how to use it to find out whether an element of a vector or a row of a dataframe is duplicated. duplicated() takes a vector, a matrix, or a dataframe as input and returns a logical vector that tells us, for each element or row, whether it repeats one that appeared earlier.

Find Duplicate elements in a vector with duplicated()

Let us create a vector that contains duplicates. Here we use the sample() function to draw values with replacement, as in bootstrapping, so some values are likely to appear more than once.

set.seed(123)
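As a minimal sketch of what duplicated() returns for a vector, assume a short hand-written vector in place of the sample() draw. The name x_demo and its values are illustrative and are not the tutorial's data.

```r
# Hypothetical example vector (not the tutorial's sampled data)
x_demo <- c(3, 1, 3, 2, 1, 3)

# TRUE marks every repeat after the first occurrence of a value
is_dup <- duplicated(x_demo)
is_dup
## [1] FALSE FALSE  TRUE FALSE  TRUE  TRUE

# The values that are repeats
x_demo[is_dup]
## [1] 3 1 3

# Keep only the first occurrence of each value (same result as unique(x_demo))
x_demo[!is_dup]
## [1] 3 1 2
```

Note that the first occurrence of a value is never flagged; only the later copies are TRUE.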
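Find Duplicated rows in a dataframe with duplicated()

The dataframe df used in this section has ten rows and two columns.

df
##  1     3     2
##  2     3     2
##  3     3     1
##  4     2     2
##  5     3     3
##  6     2     1
##  7     2     3
##  8     2     3
##  9     3     1
## 10     1     1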

Calling duplicated() on the dataframe returns a logical vector that identifies, for each row, whether it is a duplicate of an earlier row.

duplicated(df)
##  [1] FALSE  TRUE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE

In this example, the second, eighth, and ninth rows are TRUE in the logical vector: they repeat rows 1, 7, and 3, respectively. The first occurrence of each row is FALSE, because duplicated() only flags the later copies.
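The logical vector is most useful as a row index. The sketch below rebuilds the ten rows from the printed output above; the column names a and b and the object name df_demo are assumptions, since the original column names are not shown.

```r
# Values copied from the printed dataframe; column names a and b are assumed
df_demo <- data.frame(
  a = c(3, 3, 3, 2, 3, 2, 2, 2, 3, 1),
  b = c(2, 2, 1, 2, 3, 1, 3, 3, 1, 1)
)

dup_rows <- duplicated(df_demo)

# Row numbers flagged as duplicates
which(dup_rows)
## [1] 2 8 9

# Drop the repeated rows, keeping the first occurrence of each
df_unique <- df_demo[!dup_rows, ]
nrow(df_unique)
## [1] 7
```

Negating the vector with ! keeps one copy of every distinct row, which is one common way to deduplicate a dataframe in base R.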

Find Duplicated rows in a matrix with duplicated()

We can also use duplicated() on a matrix to find the rows that are duplicated. Let us convert the dataframe we created above into a matrix using the as.matrix() function.

mat <- as.matrix(df)

Now that our data is a matrix, calling duplicated() on it identifies the duplicated rows in the same way.

duplicated(mat)
##  [1] FALSE  TRUE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE