- Take the entire data set as input
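- Calculate the entropy of the target variable, as well as the entropy of the target after splitting on each predictor attribute
- Calculate the information gain of every attribute, i.e., how much splitting on it reduces the entropy of the target (how well it separates the classes from each other)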
- Choose the attribute with the highest information gain as the root node
- Repeat the same procedure on every branch, using the remaining attributes, until each branch ends in a leaf (decision) node
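The steps above follow the classic ID3 procedure. Here is a minimal Python sketch of it, under a few assumptions: all attributes are categorical, each example is a dict mapping attribute name to value, and ties in information gain go to the first attribute listed. Entropy is $H(S) = -\sum_i p_i \log_2 p_i$, and information gain is $IG(S, A) = H(S) - \sum_v \frac{|S_v|}{|S|} H(S_v)$, where $S_v$ is the subset of examples with $A = v$. The function names are illustrative, not taken from any library.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy H(S) = -sum(p_i * log2(p_i)) over the class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(rows, labels, attribute):
    """IG(S, A) = H(S) - sum over values v of |S_v|/|S| * H(S_v)."""
    total = len(labels)
    # Group the target labels by the value this attribute takes in each row.
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    weighted = sum(len(part) / total * entropy(part) for part in partitions.values())
    return entropy(labels) - weighted


def build_tree(rows, labels, attributes):
    """Recursively build an ID3-style tree.

    rows: list of dicts mapping attribute name -> categorical value
    labels: list of target values, aligned with rows
    attributes: attribute names still available for splitting
    Returns a class label (leaf) or a dict {attribute: {value: subtree}}.
    """
    # Leaf: every example in this branch has the same class.
    if len(set(labels)) == 1:
        return labels[0]
    # Leaf: no attributes left, so fall back to the majority class.
    if not attributes:
        return Counter(labels).most_common(1)[0][0]

    # Choose the attribute with the highest information gain for this node.
    best = max(attributes, key=lambda a: information_gain(rows, labels, a))
    remaining = [a for a in attributes if a != best]

    # Repeat the procedure on every branch (one branch per observed value).
    branches = {}
    for value in set(row[best] for row in rows):
        sub_rows = [r for r in rows if r[best] == value]
        sub_labels = [lab for r, lab in zip(rows, labels) if r[best] == value]
        branches[value] = build_tree(sub_rows, sub_labels, remaining)
    return {best: branches}
```

For instance, with a hypothetical attribute `x` that perfectly separates the classes and another attribute `y` that carries no information, `build_tree` puts `x` at the root and both of its branches become leaves immediately.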
For example, let’s say you want to build a decision tree to decide whether you should accept or decline a job offer. The decision tree for this case is as shown:
It is clear from the decision tree that an offer is accepted only if all of the following hold:
- The salary is greater than $50,000
- The commute is less than an hour
- Incentives are offered
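The figure is not reproduced here, so the sketch below assumes the tree checks the three conditions in the order listed and that failing any one of them leads to declining the offer. Each `if` corresponds to one decision node, and each early `return` is a leaf. The function `accept_offer` and its parameters are hypothetical names chosen for illustration.

```python
def accept_offer(salary, commute_hours, offers_incentives):
    """Walk the job-offer tree: decline as soon as one condition fails."""
    # Root node (assumed): is the salary greater than $50,000?
    if salary <= 50_000:
        return "decline"
    # Next node (assumed): is the commute less than an hour?
    if commute_hours >= 1:
        return "decline"
    # Final node (assumed): are incentives offered?
    if not offers_incentives:
        return "decline"
    # Only a path that passes every check reaches the "accept" leaf.
    return "accept"
```

For example, `accept_offer(60_000, 0.5, True)` reaches the "accept" leaf, while an offer of exactly $50,000 is declined at the root because the condition is strictly "greater than".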