Remove rows with missing values using na.omit() in R

Dealing with missing values is a common task in data cleaning and analysis. In R, missing values are typically represented as NA. Often you might wan…
Published

September 9, 2021

First we will learn how to remove rows with missing values from a data frame, and then how to use the na.omit() function to remove rows with NA from a matrix.

### Create Data with missing values

Let us create a sample data frame with some missing values. We will use the data.frame() function available in base R to create a simple data frame from scratch.

df <- data.frame(col1 = letters[1:5], 
                 col2 = c(1,2,NA,4,5), 
                 col3 = c(1:4,NA), 
                 col4 = 1:5)

In this example we have created a data frame with five rows, two of which (rows 3 and 5) contain a missing value, NA.

df
##   col1 col2 col3 col4
## 1    a    1    1    1
## 2    b    2    2    2
## 3    c   NA    3    3
## 4    d    4    4    4
## 5    e    5   NA    5
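Before removing anything, it can help to confirm where the missing values are. The following is a minimal sketch using base R's is.na() and complete.cases(); the variable names na_per_column and rows_with_na are only illustrative and do not appear in the original example.

```r
# Recreate the sample data frame from the example above
df <- data.frame(col1 = letters[1:5],
                 col2 = c(1, 2, NA, 4, 5),
                 col3 = c(1:4, NA),
                 col4 = 1:5)

# Count the missing values in each column
# (na_per_column is an illustrative name)
na_per_column <- colSums(is.na(df))
na_per_column

# Find the row numbers that contain at least one NA
# (rows_with_na is an illustrative name)
rows_with_na <- which(!complete.cases(df))
rows_with_na
```

For this data frame, the counts should show one NA each in col2 and col3, and the incomplete rows should be rows 3 and 5.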

Removing rows with missing values in a data frame

We can remove rows containing one or more missing values (NA) using the na.omit() function in R. Applying na.omit() to the data frame returns a new data frame with three rows; the two rows with missing values are dropped. Note that the remaining rows keep their original row names (1, 2 and 4).

na.omit(df)

##   col1 col2 col3 col4
## 1    a    1    1    1
## 2    b    2    2    2
## 4    d    4    4    4
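Since the result keeps the original row names, you may want to renumber the rows afterwards. The sketch below shows one way to do that, and also shows that subsetting with complete.cases() selects the same rows; clean_df, original_rows and subset_df are illustrative names, not part of the original example.

```r
# Recreate the sample data frame from the example above
df <- data.frame(col1 = letters[1:5],
                 col2 = c(1, 2, NA, 4, 5),
                 col3 = c(1:4, NA),
                 col4 = 1:5)

# Drop every row that contains at least one NA
clean_df <- na.omit(df)

# The surviving rows keep their original row names
original_rows <- rownames(clean_df)
original_rows

# Optionally renumber the rows starting from 1
rownames(clean_df) <- NULL
clean_df

# Alternative: subset with complete.cases(), which keeps the same rows
subset_df <- df[complete.cases(df), ]
nrow(subset_df) == nrow(clean_df)
```

The complete.cases() form can be handy when you only want to check a subset of columns for missing values, since it accepts any selection of columns as input.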

Removing rows with missing values in a matrix

na.omit() can also be used to remove rows containing missing values (NA) from a matrix object. Here we create a matrix from the numeric columns of the data frame above.

data_matrix <- as.matrix(df[,2:4])
data_matrix
##      col2 col3 col4
## [1,]    1    1    1
## [2,]    2    2    2
## [3,]   NA    3    3
## [4,]    4    4    4
## [5,]    5   NA    5

Our matrix has three columns and five rows, but two of the rows have missing values (NA). Applying na.omit() to the matrix returns a new matrix with those two rows removed, so no row contains a missing value. The result also carries an "na.action" attribute that records the indices of the dropped rows (3 and 5).

na.omit(data_matrix)
##      col2 col3 col4
## [1,]    1    1    1
## [2,]    2    2    2
## [3,]    4    4    4
## attr(,"na.action")
## [1] 3 5
## attr(,"class")
## [1] "omit"
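The extra attributes printed with the matrix can be read or removed with attr(). Here is a small sketch, assuming the same data as above; clean_matrix, removed_rows and plain_matrix are illustrative names.

```r
# Recreate the matrix from the numeric columns of the sample data frame
df <- data.frame(col1 = letters[1:5],
                 col2 = c(1, 2, NA, 4, 5),
                 col3 = c(1:4, NA),
                 col4 = 1:5)
data_matrix <- as.matrix(df[, 2:4])

# Drop the rows that contain at least one NA
clean_matrix <- na.omit(data_matrix)

# Read the indices of the dropped rows from the "na.action" attribute
removed_rows <- as.integer(attr(clean_matrix, "na.action"))
removed_rows

# Remove the attribute to get a plain matrix that prints without it
plain_matrix <- clean_matrix
attr(plain_matrix, "na.action") <- NULL
plain_matrix
```

Keeping the attribute is harmless for most computations; stripping it is mainly useful when you want cleaner printed output.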