Explain Decision Tree algorithm in detail

A decision tree is a flowchart-like structure in which each internal node represents a “test” on an attribute (e.g. whether a coin flip comes up heads or tails), each branch represents the outcome of the test, and each leaf node represents a class label (decision taken after computing all attributes).
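The flowchart structure described above can be sketched in code. This is a minimal illustration (not any particular library's representation): the tree is stored as nested dicts where each internal node tests one attribute, each branch is an outcome of that test, and each leaf holds a class label. The attribute names and labels are made up for the example.

```python
# Hypothetical tree for a "should I do laundry?" decision.
# Internal nodes test an attribute; leaves carry the class label.
tree = {
    "attribute": "Rainy",
    "branches": {
        "Yes": {"label": "No Laundry"},            # leaf
        "No": {
            "attribute": "Sunny",
            "branches": {
                "Yes": {"label": "Laundry"},       # leaf
                "No": {"label": "No Laundry"},     # leaf
            },
        },
    },
}

def classify(node, example):
    """Follow test outcomes from the root down to a leaf's class label."""
    while "label" not in node:
        node = node["branches"][example[node["attribute"]]]
    return node["label"]

print(classify(tree, {"Rainy": "No", "Sunny": "Yes"}))  # Laundry
```

Classifying an example is just a walk from the root to a leaf, taking the branch that matches each test's outcome.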

Advantages: Decision trees are easy to interpret and nonparametric; because splits depend only on the ordering of feature values, not their magnitude, they are fairly robust to outliers, and there are relatively few hyperparameters to tune.

Disadvantages: Decision trees are prone to overfitting. However, this can be mitigated by ensemble methods such as random forests or boosted trees.

Decision trees are a very popular machine learning algorithm. They can be used to solve various categories of problems, but are most commonly used for classification.

Classification problem: a type of problem where the target is categorical. For example, if you have to predict whether a person has a disease, the answer will be either yes or no.

The decision tree algorithm is often preferred over others because it mirrors human decision-making: it is like a sequence of if-else statements, where each test depends on the outcome of the previous one.

To put it in layman's terms, a decision tree produces a set of IF-ELSE rules as its result.

For example:

IF (Rainy == Yes) → (Laundry == No)

ELSE IF (Sunny == No) → (Laundry == No)

ELSE IF (Sunny == Yes) → (Laundry == Yes)

ELSE → (Laundry == No)
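The rules above translate directly into code. A minimal sketch, using a hypothetical helper that mirrors the four IF-ELSE branches with boolean inputs:

```python
def do_laundry(rainy: bool, sunny: bool) -> bool:
    """Return True when the rules say laundry should be done."""
    if rainy:           # IF (Rainy == Yes) -> (Laundry == No)
        return False
    elif not sunny:     # ELSE IF (Sunny == No) -> (Laundry == No)
        return False
    elif sunny:         # ELSE IF (Sunny == Yes) -> (Laundry == Yes)
        return True
    else:               # ELSE -> (Laundry == No); unreachable for boolean
        return False    # inputs, kept to match the rules above

print(do_laundry(rainy=False, sunny=True))  # True
```

Note that with a strictly yes/no "Sunny" attribute the final ELSE can never fire; a learned tree would simply not contain that branch.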

The decision tree algorithm uses entropy and information gain (without going too deep into the technical details here) to identify the features that contribute the most information about the target.
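For the curious, both quantities are short formulas. Entropy measures how mixed a set of class labels is, and information gain is the drop in entropy after splitting that set into groups. A small self-contained sketch, using the laundry labels from the example above as toy data:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, groups):
    """Parent entropy minus the size-weighted entropy of the child groups."""
    n = len(labels)
    return entropy(labels) - sum(len(g) / n * entropy(g) for g in groups)

# Toy data: splitting on one attribute perfectly separates the labels.
labels = ["No", "No", "Yes", "Yes"]
perfect_split = [["No", "No"], ["Yes", "Yes"]]
print(information_gain(labels, perfect_split))  # 1.0 bit (maximum possible)
```

The tree-building algorithm greedily picks, at each node, the split with the highest information gain, then recurses on the resulting groups.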