How to Replace NA values in a dataframe with Zeros?
In this tutorial, we will learn how to replace all NA values in a dataframe with a specific value, such as zero, in R.
How to replace all NAs with Zeros
### Create a dataframe with NA values
Let us get started by creating a dataframe with missing values, i.e. NAs, in its columns. We first create a vector with NAs using the sample() function, sampling with replacement from a vector that contains the values 1 to 5 and NA.
set.seed(2020)
data <- sample(c(1:5, NA), 50, replace = TRUE)
Our data looks like this.
data
##  [1]  4  4 NA  1  1  4  2 NA  1  5  2  2 NA  5  2  3  2  5  4  2 NA NA  4 NA  4
## [26]  2  4  5  4  4  3 NA  2  2 NA  3  5  4  5  5  2  5  1 NA  3  5  1  5  3  1
Let us convert our data vector into a matrix using the matrix() function. Here we specify a matrix with 5 columns; matrix() fills it column by column, so the 50 values give 10 rows.
data_mat <- matrix(data, ncol = 5)
Our matrix with missing values looks like this.
head(data_mat)
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    4    2   NA    3    2
## [2,]    4    2   NA   NA    5
## [3,]   NA   NA    4    2    1
## [4,]    1    5   NA    2   NA
## [5,]    1    2    4   NA    3
## [6,]    4    3    2    3    5
And then we convert the matrix into a dataframe using the as.data.frame() function.
data_df <- as.data.frame(data_mat)
head(data_df)
##   V1 V2 V3 V4 V5
## 1  4  2 NA  3  2
## 2  4  2 NA NA  5
## 3 NA NA  4  2  1
## 4  1  5 NA  2 NA
## 5  1  2  4 NA  3
## 6  4  3  2  3  5
### Find the locations of NA values in R using the is.na() function
To replace NAs with zeros, we first need to find which positions hold NAs. We will use the is.na() function to check whether each element in the dataframe is NA.
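To see what is.na() returns before applying it to the whole dataframe, here is a minimal sketch on a small vector. The vector x and the names na_flags and na_positions are hypothetical and only serve as illustration; they are not part of the tutorial's data.

```r
# Hypothetical toy vector, not part of the tutorial's data
x <- c(3, NA, 7, NA)

# is.na() returns one logical value per element: TRUE where the value is NA
na_flags <- is.na(x)
na_flags
## [1] FALSE  TRUE FALSE  TRUE

# which() turns the logical flags into the positions of the NAs
na_positions <- which(na_flags)
na_positions
## [1] 2 4
```

Applied to a dataframe, the same call gives one TRUE or FALSE per cell: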
is.na(data_df)
##          V1    V2    V3    V4    V5
##  [1,] FALSE FALSE  TRUE FALSE FALSE
##  [2,] FALSE FALSE  TRUE  TRUE FALSE
##  [3,]  TRUE  TRUE FALSE FALSE FALSE
##  [4,] FALSE FALSE  TRUE FALSE  TRUE
##  [5,] FALSE FALSE FALSE  TRUE FALSE
##  [6,] FALSE FALSE FALSE FALSE FALSE
##  [7,] FALSE FALSE FALSE FALSE FALSE
##  [8,]  TRUE FALSE FALSE FALSE FALSE
##  [9,] FALSE FALSE FALSE FALSE FALSE
## [10,] FALSE FALSE FALSE FALSE FALSE
### Replace all NA values with zeros in R
The is.na() function gives us a logical matrix with the same shape as the dataframe, and we can use it as an index to replace the NAs with zeros.
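Because the result of is.na() is logical, it can also be summed to check how many NAs are about to be replaced. The sketch below assumes a small made-up dataframe; toy_df, na_mat and na_per_column are hypothetical names used only for illustration.

```r
# Hypothetical toy dataframe, not the tutorial's data_df
toy_df <- data.frame(a = c(1, NA, 3), b = c(NA, NA, 6))

# is.na() on a dataframe returns a logical matrix of the same shape
na_mat <- is.na(toy_df)

# TRUE counts as 1, so column sums give the number of NAs per column
na_per_column <- colSums(na_mat)
na_per_column
## a b
## 1 2

# Total number of NAs in the whole dataframe
na_total <- sum(na_mat)
na_total
## [1] 3
```

With the NA positions known, the replacement on data_df is a single assignment: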
data_df[is.na(data_df)] <- 0
Now our dataframe does not have any NAs; we have replaced them with zeros.
data_df
##    V1 V2 V3 V4 V5
## 1   4  2  0  3  2
## 2   4  2  0  0  5
## 3   0  0  4  2  1
## 4   1  5  0  2  0
## 5   1  2  4  0  3
## 6   4  3  2  3  5
## 7   2  2  4  5  1
## 8   0  5  5  4  5
## 9   1  4  4  5  3
## 10  5  2  4  5  1
### Replace all NA values with a specific numerical value
In the same way, we can replace all NAs with any other specific value. In this example, we rebuild the dataframe from the matrix and replace all NAs with 1000.
data_df <- as.data.frame(data_mat)
data_df[is.na(data_df)] <- 1000
data_df
##      V1   V2   V3   V4   V5
## 1     4    2 1000    3    2
## 2     4    2 1000 1000    5
## 3  1000 1000    4    2    1
## 4     1    5 1000    2 1000
## 5     1    2    4 1000    3
## 6     4    3    2    3    5
## 7     2    2    4    5    1
## 8  1000    5    5    4    5
## 9     1    4    4    5    3
## 10    5    2    4    5    1
P.S. NAs often carry useful information: they mark values that are genuinely missing. Before you replace all NAs with zeros or anything else, make sure that it is the right thing to do for your data. Imputing missing values is an active area of research in statistics.
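When a zero is only meaningful for some columns, the same is.na() indexing can be limited to a single column instead of the whole dataframe. This is a minimal sketch under that assumption; toy_df and its columns count and score are hypothetical and not part of the tutorial's data.

```r
# Hypothetical toy dataframe, not the tutorial's data_df
toy_df <- data.frame(count = c(2, NA, 5), score = c(NA, 0.4, 0.9))

# Replace NAs only in 'count', where a zero is assumed to be a sensible value
toy_df$count[is.na(toy_df$count)] <- 0

# 'score' is left untouched, so its NA still marks a missing value
toy_df
##   count score
## 1     2    NA
## 2     0   0.4
## 3     5   0.9
```

Leaving the NA in score keeps the missing value visible to later steps, rather than silently treating it as a real zero.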