Dummy coding is used in regression analysis to represent categorical variables as numbers. In R, a dummy (or indicator) variable records whether an observation has a particular characteristic or belongs to a particular category. It takes only the values 1 and 0. Usually 1 means the characteristic is present (True) and 0 means it is absent (False), but which category is coded as 1 is up to the user.
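As a minimal base R illustration (no packages needed), a 0/1 indicator for one category can be built by comparing a vector against that category. The vector name gender and the level "f" are illustrative choices that match the example used later in this article.

```r
# Hand-coded dummy variable for a two-level category, using base R only.
gender <- c("m", "f", "m")

# 1 marks the presence of the category "f", 0 marks its absence.
gender_f <- as.integer(gender == "f")

# The same indicator stored as a logical (TRUE/FALSE) vector instead.
is_female <- gender == "f"

print(gender_f)   # [1] 0 1 0
print(is_female)  # [1] FALSE  TRUE FALSE
```

Whether you keep the integer or the logical form is a matter of taste; as.integer() converts TRUE/FALSE to 1/0.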
The goal of fastDummies is to quickly create dummy variables (columns) and dummy rows. Creating dummy variables is possible through base R or other packages, but this package is much faster than those methods.
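For comparison, one base R route is model.matrix() from the stats package. The sketch below assumes a small data frame like the one used later in this article. Dropping the intercept with "- 1" gives every level its own 0/1 column. Note that the result is a matrix with column names such as cityDelhi rather than a data frame, and this snippet makes no claim about relative speed.

```r
df <- data.frame(gender = c("m", "f", "m"),
                 city = c("Delhi", "Mumbai", "Delhi"))

# "- 1" removes the intercept, so each level of city gets its own column.
# model.matrix() converts the character column to a factor internally.
city_dummies <- model.matrix(~ city - 1, data = df)
print(city_dummies)
```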
Installation:
To install the package from CRAN, use:
install.packages("fastDummies")
# The development version is available on GitHub.
# install.packages("devtools")
devtools::install_github("jacobkap/fastDummies")
Using dummy_cols() function
The dummy_cols() function from the fastDummies package creates dummy variables according to the arguments passed to it. If no columns are selected in the call, dummy variables are created for every character and factor column in the data frame.
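To see which columns that default rule would pick, the character and factor columns can be listed with base R. This is only an illustration of the rule described above, not code taken from fastDummies; the data frame is the same one used in the example below.

```r
df <- data.frame(gender = c("m", "f", "m"),
                 age = c(19, 20, 20),
                 city = c("Delhi", "Mumbai", "Delhi"))

# TRUE for every column that is a character or a factor vector.
is_categorical <- vapply(df, function(col) is.character(col) || is.factor(col),
                         logical(1))

default_columns <- names(df)[is_categorical]
print(default_columns)  # [1] "gender" "city"
```

The numeric age column is skipped, which is why it gets no dummy columns in the output shown later.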
Syntax:
dummy_cols(.data, select_columns = NULL)
Parameters:
.data: the data set (for example, a data frame) from which dummy columns are created
select_columns: a vector of column names from which dummy variables are created. If NULL (the default), all character and factor columns are used
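To make the two parameters concrete, here is a minimal base R sketch of the behavior they describe. make_dummy_cols() is a hypothetical helper written for this article and is not the fastDummies implementation. It assumes one new column per distinct non-missing value, named column_value, with values sorted alphabetically. Missing values in a selected column simply produce NA in the dummy columns.

```r
# make_dummy_cols() is a HYPOTHETICAL helper, not fastDummies' own code.
make_dummy_cols <- function(.data, select_columns = NULL) {
  if (is.null(select_columns)) {
    # Default: use every character or factor column.
    keep <- vapply(.data, function(col) is.character(col) || is.factor(col),
                   logical(1))
    select_columns <- names(.data)[keep]
  }
  for (col in select_columns) {
    values <- as.character(.data[[col]])
    # Distinct non-missing values, sorted alphabetically.
    levels_found <- sort(unique(values[!is.na(values)]))
    for (lev in levels_found) {
      # One 0/1 column per value, named "<column>_<value>".
      .data[[paste0(col, "_", lev)]] <- as.integer(values == lev)
    }
  }
  .data
}

df <- data.frame(gender = c("m", "f", "m"),
                 age = c(19, 20, 20),
                 city = c("Delhi", "Mumbai", "Delhi"))

# Only city is dummy-coded when select_columns is given explicitly.
only_city <- make_dummy_cols(df, select_columns = "city")
print(names(only_city))

# With the default, both gender and city are dummy-coded.
all_categorical <- make_dummy_cols(df)
print(names(all_categorical))
```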
Example:
# Load the package
library(fastDummies)
# Create a data frame
df <- data.frame(gender = c("m", "f", "m"),
                 age = c(19, 20, 20),
                 city = c("Delhi", "Mumbai", "Delhi"))
# Create dummy variables. With select_columns = NULL (the default),
# every character and factor column is used.
df <- dummy_cols(df)
# Print
print(df)
Output:
  gender age   city gender_f gender_m city_Delhi city_Mumbai
1      m  19  Delhi        0        1          1           0
2      f  20 Mumbai        1        0          0           1
3      m  20  Delhi        0        1          1           0
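In the output, gender_f and gender_m always add up to 1, and so do city_Delhi and city_Mumbai. In a regression with an intercept this makes the full set of dummies redundant (the "dummy variable trap"), so one level per variable is normally dropped. The base R sketch below checks this with the gender columns copied from the output above; fastDummies documents a remove_first_dummy argument for dropping a level, which is not used here.

```r
# Dummy columns copied from the printed output above.
gender_f <- c(0, 1, 0)
gender_m <- c(1, 0, 1)

# Every row has exactly one gender dummy equal to 1.
print(gender_f + gender_m)  # [1] 1 1 1

# intercept = gender_f + gender_m, so these 3 columns have rank 2 only.
X_full <- cbind(intercept = 1, gender_f, gender_m)
print(qr(X_full)$rank)      # [1] 2

# Dropping gender_f (making "f" the reference level) gives full column rank.
X_reduced <- cbind(intercept = 1, gender_m)
print(qr(X_reduced)$rank)   # [1] 2
```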