How to Replace Missing Values (NA) in R: na.omit & na.rm

brahmajit-mohapatra-f8fe5582 · 13 May 2021 20:04

In R, missing values are represented by NA. Handling them is an important part of data analysis and preprocessing. Two common tools for this are the na.omit() function and the na.rm argument.
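Before removing or ignoring missing values, it can help to see where they are. The following is a minimal sketch using base R's is.na() and anyNA(); the vector is made up for illustration.

```r
# Hypothetical vector containing two missing values
values <- c(10, NA, 30, NA)

# is.na() flags each element that is missing
is.na(values)        # FALSE  TRUE FALSE  TRUE

# Summing the logical vector counts the missing values
na_count <- sum(is.na(values))
na_count             # 2

# anyNA() answers the yes/no question directly
has_na <- anyNA(values)
has_na               # TRUE
```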

na.omit():
The na.omit() function removes every row that contains at least one missing value. It returns a new data frame; the original data frame is left unchanged.

Example: Using na.omit() to remove rows with missing values

data <- data.frame(ID = c(1, 2, 3, 4),
                   Value = c(10, NA, 30, NA))

cleaned_data <- na.omit(data)
print(cleaned_data)
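To check how much data na.omit() removed, you can compare row counts and inspect the "na.action" attribute that na.omit() attaches to its result. This is a small sketch that rebuilds the same example data so it runs on its own.

```r
# Same illustrative data frame as in the example above
data <- data.frame(ID = c(1, 2, 3, 4),
                   Value = c(10, NA, 30, NA))

cleaned_data <- na.omit(data)

# Row counts before and after dropping incomplete rows
nrow(data)           # 4
nrow(cleaned_data)   # 2

# na.omit() records the positions of the dropped rows
dropped_rows <- as.integer(attr(cleaned_data, "na.action"))
dropped_rows         # 2 4
```

Here rows 2 and 4 are removed because their Value is NA, leaving the rows with ID 1 and 3.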

na.rm:
na.rm is an argument accepted by many summary functions, such as mean(), sum(), min(), max(), and sd(). It defaults to FALSE, in which case a single NA in the input makes the result NA. When na.rm is set to TRUE, the function drops the missing values before performing the calculation.

Example: Using na.rm in mean() to calculate the mean without missing values

values <- c(10, 20, NA, 30, 40)
mean_value <- mean(values, na.rm = TRUE)
print(mean_value)
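To see why na.rm matters, it helps to compare the result with and without it. A short sketch using the same vector:

```r
values <- c(10, 20, NA, 30, 40)

# Without na.rm, one NA is enough to make the result NA
mean_with_na <- mean(values)
mean_with_na         # NA

# With na.rm = TRUE, the NA is skipped: (10 + 20 + 30 + 40) / 4
mean_no_na <- mean(values, na.rm = TRUE)
mean_no_na           # 25

# The same argument works in other summary functions
total <- sum(values, na.rm = TRUE)
total                # 100
largest <- max(values, na.rm = TRUE)
largest              # 40
```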

Remember, both na.omit() and na.rm deal with missing values, but they serve different purposes:

Use na.omit() to remove entire rows with missing values from a data frame.
Use na.rm = TRUE as an argument in functions like mean() to perform calculations while ignoring missing values.

Choose the method that fits your analysis. na.omit() discards every row that has an NA in any column, so the valid values in the other columns of those rows are lost too. na.rm = TRUE leaves the data untouched and only skips the missing values within the calculation at hand.
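The difference shows up most clearly when NAs sit in different columns. The sketch below uses a made-up data frame (the column names Height and Weight are only for illustration) to compare the two approaches.

```r
# Hypothetical data frame with NAs in different rows and columns
df <- data.frame(ID     = c(1, 2, 3, 4),
                 Height = c(150, NA, 170, 180),
                 Weight = c(50, 60, NA, 80))

# na.omit() drops rows 2 and 3 entirely, so the valid Weight in row 2
# and the valid Height in row 3 are discarded as well
complete_rows <- na.omit(df)
nrow(complete_rows)                        # 2
height_omit <- mean(complete_rows$Height)
height_omit                                # 165

# na.rm = TRUE skips only the NA within the column being summarised
height_narm <- mean(df$Height, na.rm = TRUE)   # (150 + 170 + 180) / 3
weight_narm <- mean(df$Weight, na.rm = TRUE)   # (50 + 60 + 80) / 3
```

The two Height means differ because na.omit() also threw away row 3, whose Height was valid. Which result is appropriate depends on whether the analysis needs complete rows or per-column summaries.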