Can you create an R decision tree?

A decision tree is a graph that represents choices and their results in the form of a tree. The nodes in the graph represent an event or choice, and the edges represent the decision rules or conditions. Decision trees are widely used in machine learning and data mining applications, including those written in R.

Examples of decision tree use include predicting whether an email is spam or not spam, whether a tumor is cancerous, or whether a loan is a good or bad credit risk, based on the relevant factors in each case. Generally, a model is created from observed data, also called training data. A set of validation data is then used to verify and improve the model. For a new set of predictor variables, we use this model to arrive at a decision on the category (yes/no, spam/not spam) of the data. R has packages for creating and visualizing decision trees.

The R package “party” is used to create decision trees.
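As a rough sketch of the train, validate, predict workflow described above, the snippet below splits a data set, fits a tree on one part, and checks it against the other. It assumes party is already installed (see the install step in this article). The built-in iris data, the 70/30 split, and the variable names are stand-ins chosen for illustration, not part of the examples in this article.

```r
library(party)  # assumes install.packages("party") has already been run

set.seed(42)  # arbitrary seed so the random split is reproducible

# iris ships with base R; here it stands in for the "observed data"
n <- nrow(iris)
trainIdx <- sample(n, size = round(0.7 * n))  # 70/30 split is an assumption
trainData <- iris[trainIdx, ]
validData <- iris[-trainIdx, ]

# Fit the tree on the training data only
fit <- ctree(Species ~ ., data = trainData)

# Predict the category for rows the model has not seen
predicted <- predict(fit, newdata = validData)

# Confusion table: rows are predictions, columns are the true categories
confusion <- table(predicted = predicted, actual = validData$Species)
print(confusion)

# Share of validation rows classified correctly
accuracy <- sum(diag(confusion)) / sum(confusion)
print(accuracy)
```

The confusion table shows where the model's categories agree with the validation data, which is one simple way to "verify" a tree before using it on new predictor values.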

A decision tree is a familiar graph for data scientists. It represents choices and results through the graphical form of a tree. To keep things simple, let’s just go over the basics.

Install the party package to get started with making the tree.

install.packages("party")

The package provides a handy function, ctree(), and at its most basic that is all we need to create a tree. First, load the package so that the function and some example data are available.

library(party)

Now we have access to some new data sets. The strucchange package, which is loaded together with party, includes a data set on youth homicides in Boston called BostonHomicide. Let’s use that one. You can print the data to the screen if you like. If R reports that the object cannot be found, load it first with data("BostonHomicide").

print(BostonHomicide)
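Printing the whole data frame can be a lot to read. As a small sketch, str() and head() give a quicker overview of the columns. The explicit data() call is only a precaution in case the data set is not picked up automatically.

```r
library(party)  # also attaches strucchange, where the data set lives

# Load the data set explicitly; harmless if it is already available
data("BostonHomicide", package = "strucchange")

# Column names and types at a glance
str(BostonHomicide)

# First few rows only
head(BostonHomicide)

# Check that the columns used for the tree in this article are present
neededCols <- c("year", "population", "homicides", "unemploy")
print(neededCols %in% names(BostonHomicide))
```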

Now we’ll create the tree. The usage of ctree() goes something like this:

ctree(formula, data = dataset)
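The formula argument is where you say what to predict and from what. As a minimal sketch in base R (no packages needed), the response goes on the left of the ~ and the predictors go on the right, separated by +. The column names here are the ones from the BostonHomicide data, and treeFormula is just an illustrative variable name.

```r
# Response on the left of ~, predictors on the right
treeFormula <- year ~ population + homicides + unemploy

# A formula is an ordinary R object that can be stored and passed around
print(class(treeFormula))   # "formula"

# The variable names the formula refers to, response first
print(all.vars(treeFormula))
# "year" "population" "homicides" "unemploy"
```

A stored formula like this can be handed to ctree() as its first argument in place of writing the formula inline.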

We’ve got our data set. I’ll assign it to a variable for simplicity.

inputData <- BostonHomicide

Now we can determine our formula and create the tree.

treeAnalysis <- ctree(year ~ population + homicides + unemploy, data = inputData)

Let’s plot it!

plot(treeAnalysis)

Here’s the result I got.

[Figure: R decision tree]
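If you would rather read the tree as text than as a picture, the sketch below rebuilds the same tree and inspects it. It assumes party is installed. The names nodeIds and fittedYear are illustrative, and no particular output is shown because the splits depend on the data and the package version.

```r
library(party)

# Load the data explicitly in case it is not found automatically
data("BostonHomicide", package = "strucchange")
inputData <- BostonHomicide

# Same model as in the article
treeAnalysis <- ctree(year ~ population + homicides + unemploy, data = inputData)

# Text summary of the splits and terminal nodes
print(treeAnalysis)

# Terminal node that each row of the data falls into
nodeIds <- where(treeAnalysis)
print(table(node = nodeIds))

# Predicted year for each row, compared with the actual year
fittedYear <- predict(treeAnalysis)
print(table(predicted = fittedYear, actual = inputData$year))
```

Comparing predictions against the same data the tree was fitted on is optimistic, so treat the last table as a sanity check rather than a measure of how well the tree generalizes.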

Conclusion

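Note that this conclusion does not describe the BostonHomicide tree plotted above. It appears to refer to a tree fitted to the readingSkills data set that also ships with party: from that tree we can conclude that anyone whose reading score is less than 38.3 and whose age is more than 6 is not a native speaker.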