duplicated() function in R: Find duplicated elements in a vector or dataframe
In this tutorial, we will learn about the base R function duplicated() and how to use it to find out whether an element of a vector or a row of a dataframe is duplicated. duplicated() takes a vector, a matrix, or a dataframe as input and returns a logical (boolean) vector telling us, for each element or row, whether it repeats an earlier one. The first occurrence is marked FALSE; only the later repeats are marked TRUE.
Find Duplicate elements in a vector with duplicated()
Let us create a data vector with duplicates. Here we use the sample() function to draw values with replacement, as in bootstrap sampling.
set.seed(123)
x
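As a minimal sketch of the vector case, the snippet below uses a small hand-written vector rather than the output of sample(); the name v and its values are assumptions chosen only for illustration.

```r
# Hand-written vector with repeats (values assumed for illustration)
v <- c(3, 1, 3, 2, 1, 3)

# TRUE marks an element that already appeared earlier in the vector
dup <- duplicated(v)
dup
## [1] FALSE FALSE  TRUE FALSE  TRUE  TRUE

# The repeated occurrences themselves
v[dup]
## [1] 3 1 3

# Keep only the first occurrence of each value (same result as unique(v))
v[!dup]
## [1] 3 1 2
```

Negating the result with ! is a common way to drop the repeats while keeping the original order.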
Find Duplicated rows in a dataframe with duplicated()
## 1 3 2
## 2 3 2
## 3 3 1
## 4 2 2
## 5 3 3
## 6 2 1
## 7 2 3
## 8 2 3
## 9 3 1
## 10 1 1
By using the duplicated() function on the dataframe, we get a logical vector that identifies whether each row is duplicated.
duplicated(df)
## [1] FALSE TRUE FALSE FALSE FALSE FALSE FALSE TRUE TRUE FALSE
In this example, the second, eighth, and ninth rows are duplicated, as they are TRUE in the logical vector: row 2 repeats row 1, row 8 repeats row 7, and row 9 repeats row 3.
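As a rough sketch of how this logical vector can be put to use, the snippet below rebuilds the ten rows printed above by hand and then extracts or drops the duplicated rows. The name df_demo and the column names a and b are assumptions, since the original column names are not shown.

```r
# Rebuild the ten printed rows by hand (column names a and b are assumed)
df_demo <- data.frame(
  a = c(3, 3, 3, 2, 3, 2, 2, 2, 3, 1),
  b = c(2, 2, 1, 2, 3, 1, 3, 3, 1, 1)
)

# One logical value per row; TRUE means the row repeats an earlier row
dup_rows <- duplicated(df_demo)

# Positions of the duplicated rows
which(dup_rows)
## [1] 2 8 9

# Show only the duplicated rows
df_demo[dup_rows, ]

# Keep only the first occurrence of each row
df_unique <- df_demo[!dup_rows, ]
nrow(df_unique)
## [1] 7
```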
Find Duplicated rows in a matrix with duplicated()
We can use the duplicated() function on a matrix to find the rows that are duplicated. Let us convert the dataframe we created above into a matrix using the as.matrix() function.
mat <- as.matrix(df)
Now that we have our data as a matrix, using the duplicated() function on it identifies the rows that are duplicated.
duplicated(mat)
## [1] FALSE TRUE FALSE FALSE FALSE FALSE FALSE TRUE TRUE FALSE
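The matrix case can be sketched the same way. The snippet below is an illustration under assumptions: df_demo, mat_demo, and the column names a and b are hypothetical stand-ins that reproduce the ten rows printed earlier. It removes the duplicated rows and checks that the matrix and the dataframe give the same answer.

```r
# Hand-built stand-in for the data above (names are assumed)
df_demo <- data.frame(
  a = c(3, 3, 3, 2, 3, 2, 2, 2, 3, 1),
  b = c(2, 2, 1, 2, 3, 1, 3, 3, 1, 1)
)
mat_demo <- as.matrix(df_demo)

# For a matrix, duplicated() compares whole rows by default
dup_mat <- duplicated(mat_demo)

# Drop duplicated rows; drop = FALSE keeps the result a matrix
mat_unique <- mat_demo[!dup_mat, , drop = FALSE]
dim(mat_unique)
## [1] 7 2

# The matrix and the dataframe flag the same rows
all(dup_mat == duplicated(df_demo))
## [1] TRUE
```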