How to Replace NA values in a dataframe with Zeros?

replace NA

Published

January 11, 2022

In this tutorial, we will learn how to replace all NA values in a dataframe with a specific value, such as zero, in R.

Create a dataframe with NA values

Let us get started by creating a dataframe with missing values, i.e. NAs, in its columns. We first create a vector with NAs using the sample() function, sampling with replacement from a vector that contains the integers 1 to 5 and NA.

set.seed(2020)
data <- sample(c(1:5,NA), 50, replace = TRUE)

Our data looks like this.

data

##  [1]  4  4 NA  1  1  4  2 NA  1  5  2  2 NA  5  2  3  2  5  4  2 NA NA  4 NA  4
## [26]  2  4  5  4  4  3 NA  2  2 NA  3  5  4  5  5  2  5  1 NA  3  5  1  5  3  1
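Before reshaping the vector, it can help to confirm how many missing values were drawn. The following is a minimal sketch in base R; the variable name n_missing is introduced here only for illustration. For the vector printed above, the count works out to 9.

```r
set.seed(2020)
data <- sample(c(1:5, NA), 50, replace = TRUE)

# is.na() returns TRUE for each missing element; sum() counts the TRUEs
n_missing <- sum(is.na(data))
n_missing

# table() with useNA = "ifany" tabulates NA alongside the other values
table(data, useNA = "ifany")
```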

Let us convert our data vector into a matrix using the matrix() function. Here we specify a matrix with 5 columns.

data_mat <- matrix(data, ncol=5)

Our matrix with missing values looks like this.

head(data_mat)
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    4    2   NA    3    2
## [2,]    4    2   NA   NA    5
## [3,]   NA   NA    4    2    1
## [4,]    1    5   NA    2   NA
## [5,]    1    2    4   NA    3
## [6,]    4    3    2    3    5
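As a quick check on the shape of the result, the sketch below confirms that 50 values in 5 columns give 10 rows, and that matrix() fills the matrix column by column by default.

```r
set.seed(2020)
data <- sample(c(1:5, NA), 50, replace = TRUE)
data_mat <- matrix(data, ncol = 5)

# 50 values spread over 5 columns give 10 rows
dim(data_mat)

# matrix() fills column by column by default, so the first column
# holds the first 10 elements of the vector
identical(data_mat[, 1], data[1:10])
```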

We then convert the matrix into a dataframe using the as.data.frame() function.

data_df<- as.data.frame(data_mat)
head(data_df)
##   V1 V2 V3 V4 V5
## 1  4  2 NA  3  2
## 2  4  2 NA NA  5
## 3 NA NA  4  2  1
## 4  1  5 NA  2 NA
## 5  1  2  4 NA  3
## 6  4  3  2  3  5

Find the locations of NA values in R using the is.na() function

To replace NAs with zeros, we first need to find the positions that hold NAs. We will use the is.na() function to test whether each element in the dataframe is NA or not.

is.na(data_df)

##          V1    V2    V3    V4    V5
##  [1,] FALSE FALSE  TRUE FALSE FALSE
##  [2,] FALSE FALSE  TRUE  TRUE FALSE
##  [3,]  TRUE  TRUE FALSE FALSE FALSE
##  [4,] FALSE FALSE  TRUE FALSE  TRUE
##  [5,] FALSE FALSE FALSE  TRUE FALSE
##  [6,] FALSE FALSE FALSE FALSE FALSE
##  [7,] FALSE FALSE FALSE FALSE FALSE
##  [8,]  TRUE FALSE FALSE FALSE FALSE
##  [9,] FALSE FALSE FALSE FALSE FALSE
## [10,] FALSE FALSE FALSE FALSE FALSE
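The logical matrix returned by is.na() can also be summarised instead of read cell by cell. The sketch below rebuilds the dataframe and then counts NAs per column and lists their row and column positions; the names na_per_column and na_positions are illustrative.

```r
set.seed(2020)
data <- sample(c(1:5, NA), 50, replace = TRUE)
data_df <- as.data.frame(matrix(data, ncol = 5))

# TRUE counts as 1, so colSums() gives the number of NAs in each column
na_per_column <- colSums(is.na(data_df))
na_per_column

# arr.ind = TRUE returns the (row, col) position of every TRUE cell
na_positions <- which(is.na(data_df), arr.ind = TRUE)
head(na_positions)
```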

Replace all NA values with zeros in R

When applied to a dataframe, the is.na() function returns a logical matrix of the same shape, and we can use that matrix as an index to replace the NAs with zeros.

data_df[is.na(data_df)] <- 0

Now our dataframe no longer has any NAs; we have replaced them all with zeros.

data_df
##    V1 V2 V3 V4 V5
## 1   4  2  0  3  2
## 2   4  2  0  0  5
## 3   0  0  4  2  1
## 4   1  5  0  2  0
## 5   1  2  4  0  3
## 6   4  3  2  3  5
## 7   2  2  4  5  1
## 8   0  5  5  4  5
## 9   1  4  4  5  3
## 10  5  2  4  5  1
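The same indexing idea works on a single column when only part of the dataframe should be changed. The sketch below replaces NAs in V1 only and then checks what is left. Note that assigning 0 (a double) to an integer column converts that column to double; using 0L instead would keep it as integer.

```r
set.seed(2020)
data <- sample(c(1:5, NA), 50, replace = TRUE)
data_df <- as.data.frame(matrix(data, ncol = 5))

# Replace NAs in column V1 only; the other columns are left untouched
data_df$V1[is.na(data_df$V1)] <- 0

# No NAs remain in V1
anyNA(data_df$V1)

# Number of NAs still present in the rest of the dataframe
sum(is.na(data_df))
```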

Replace all NA values with a specific numerical value

The same approach works for any replacement value. In this example, we first recreate the dataframe from the matrix so that it contains NAs again, and then replace all NAs with 1000.

data_df<- as.data.frame(data_mat)
data_df[is.na(data_df)] <- 1000
data_df
##      V1   V2   V3   V4   V5
## 1     4    2 1000    3    2
## 2     4    2 1000 1000    5
## 3  1000 1000    4    2    1
## 4     1    5 1000    2 1000
## 5     1    2    4 1000    3
## 6     4    3    2    3    5
## 7     2    2    4    5    1
## 8  1000    5    5    4    5
## 9     1    4    4    5    3
## 10    5    2    4    5    1
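If a dataframe mixes numeric and non-numeric columns, replacing every NA with a number may not be what you want. The sketch below wraps the replacement in a small helper that touches numeric columns only. The function replace_na_numeric() and the example_df data are hypothetical and are not part of the original post.

```r
# Hypothetical helper (not from the original post): replace NAs with `value`
# in numeric columns only, leaving other column types untouched
replace_na_numeric <- function(df, value = 0) {
  numeric_cols <- vapply(df, is.numeric, logical(1))
  df[numeric_cols] <- lapply(df[numeric_cols], function(col) {
    col[is.na(col)] <- value
    col
  })
  df
}

# Made-up example data with one numeric and one character column
example_df <- data.frame(
  score = c(4, NA, 2),
  label = c("a", NA, "c")
)

result_df <- replace_na_numeric(example_df, value = 1000)
result_df
```

Here the NA in score becomes 1000, while the NA in label is left as it is.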

P.S. NAs often carry useful information: they mark values that are genuinely missing. Before replacing all NAs with zeros or any other value, make sure that doing so is appropriate for your analysis. Imputing missing values is an active area of research in statistics.
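To see why this matters, here is a small sketch with a made-up vector x that compares the mean when NAs are ignored with the mean when they are replaced by zeros.

```r
# Made-up example vector with two missing values
x <- c(4, NA, 2, NA, 6)

# Ignoring the missing values: (4 + 2 + 6) / 3
mean_ignoring_na <- mean(x, na.rm = TRUE)
mean_ignoring_na   # 4

# Treating the missing values as zeros: (4 + 0 + 2 + 0 + 6) / 5
x_zero <- x
x_zero[is.na(x_zero)] <- 0
mean_with_zeros <- mean(x_zero)
mean_with_zeros    # 2.4
```

Replacing NAs with zeros pulls the mean down from 4 to 2.4, so the choice of replacement value can change the results of later calculations.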
