Remove rows with missing values using na.omit() in R
Dealing with missing values is a common task in data cleaning and analysis. In R, missing values are typically represented as NA. Often you will want to remove rows containing missing values from a dataframe or a matrix. In this tutorial, we will learn how to remove rows containing missing values using the na.omit() function, which is available in the stats package that ships with base R.
### Remove Rows with missing values in R
First we will learn how to remove rows with missing values in a dataframe, and then we will learn how to use the na.omit() function to remove rows with NA in a matrix.
### Create Data with missing values
Let us create a sample dataframe with some missing values. We will use the data.frame() function available in base R to create a simple dataframe from scratch.
df <- data.frame(col1 = letters[1:5],
                 col2 = c(1, 2, NA, 4, 5),
                 col3 = c(1:4, NA),
                 col4 = 1:5)
In this example we have created a data frame in which two rows contain a missing value, NA.
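Before removing anything, it can help to confirm where the missing values are. The following is a minimal sketch using base R's is.na(), colSums(), and complete.cases(); the variable names na_per_column and rows_with_na are illustrative, and the data frame is recreated so the snippet runs on its own.

```r
# Recreate the example data frame so this snippet is self-contained
df <- data.frame(col1 = letters[1:5],
                 col2 = c(1, 2, NA, 4, 5),
                 col3 = c(1:4, NA),
                 col4 = 1:5)

# Count the missing values in each column
na_per_column <- colSums(is.na(df))

# Positions of the rows that contain at least one NA
rows_with_na <- which(!complete.cases(df))

na_per_column
rows_with_na
```

For this data, col2 and col3 each hold one NA, and rows 3 and 5 are the ones flagged as incomplete.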
df
##   col1 col2 col3 col4
## 1    a    1    1    1
## 2    b    2    2    2
## 3    c   NA    3    3
## 4    d    4    4    4
## 5    e    5   NA    5
### Removing rows with missing values in a data frame
We can remove rows containing one or more missing values using the na.omit() function in R. Applying na.omit() to the data frame returns a new dataframe with three rows, because the two rows with missing values have been removed. The remaining rows keep their original row names.
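One way to see which rows na.omit() keeps is to compare it with an explicit filter on complete.cases(), another function in the stats package. This is a sketch for illustration, not a description of how na.omit() is implemented; the names df_complete and df_omitted are made up for the example, and the data frame is recreated so the snippet runs on its own.

```r
# Recreate the example data frame so this snippet is self-contained
df <- data.frame(col1 = letters[1:5],
                 col2 = c(1, 2, NA, 4, 5),
                 col3 = c(1:4, NA),
                 col4 = 1:5)

# Keep only the rows that have no missing values
df_complete <- df[complete.cases(df), ]

# na.omit() keeps the same rows and also records the dropped
# rows in an "na.action" attribute on the result
df_omitted <- na.omit(df)

nrow(df)          # number of rows before
nrow(df_omitted)  # number of rows after

# Both approaches retain the same original row names
identical(rownames(df_complete), rownames(df_omitted))
```

The explicit complete.cases() filter is handy when you want to reuse the logical index, for example to inspect the incomplete rows with df[!complete.cases(df), ].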
na.omit(df)
##   col1 col2 col3 col4
## 1    a    1    1    1
## 2    b    2    2    2
## 4    d    4    4    4
### Removing rows with missing values in a matrix
The na.omit() function can also be used to remove rows containing missing values from a matrix object. Here we create a matrix from the numeric columns of the dataframe above.
data_matrix <- as.matrix(df[,2:4])
data_matrix
##      col2 col3 col4
## [1,]    1    1    1
## [2,]    2    2    2
## [3,]   NA    3    3
## [4,]    4    4    4
## [5,]    5   NA    5
Our matrix has three columns and five rows, and two of the rows contain a missing value. Applying na.omit() to the matrix returns a new matrix with no missing values: the function removes the two rows that contain NA.
na.omit(data_matrix)
##      col2 col3 col4
## [1,]    1    1    1
## [2,]    2    2    2
## [3,]    4    4    4
## attr(,"na.action")
## [1] 3 5
## attr(,"class")
## [1] "omit"
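The extra lines at the end of the matrix output are an attribute that na.omit() attaches to its result: na.action stores the positions of the removed rows, with class "omit". Below is a short sketch of reading that attribute with attr(), and of dropping it when a plain matrix is wanted. The names clean_matrix and removed_rows are illustrative, and the data is recreated so the snippet is self-contained.

```r
# Recreate the example data and matrix so this snippet is self-contained
df <- data.frame(col1 = letters[1:5],
                 col2 = c(1, 2, NA, 4, 5),
                 col3 = c(1:4, NA),
                 col4 = 1:5)
data_matrix <- as.matrix(df[, 2:4])

clean_matrix <- na.omit(data_matrix)

# Positions of the rows that na.omit() removed
removed_rows <- as.integer(attr(clean_matrix, "na.action"))
removed_rows

# Drop the bookkeeping attribute to get a plain matrix back
attr(clean_matrix, "na.action") <- NULL
dim(clean_matrix)
```

Keeping the attribute is useful when you later need to know which observations were excluded; removing it only tidies the printed output and does not change the matrix values.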