Explain the steps of calculating Variable Importance in Random Forest

vishrut-singhal · 5 June 2021 16:49

The steps for calculating variable importance in the Random Forest algorithm are as follows:

1. For each tree grown in a random forest, take its out-of-bag (OOB) data, i.e. the rows left out of that tree's bootstrap sample, and count the number of votes for the correct class.

2. Now randomly permute the values of one predictor (let’s say variable-k) in the OOB data, and count the number of votes for the correct class again. “Random permutation of a predictor’s values” means shuffling the order of that column's values across the OOB rows while leaving every other column unchanged.

3. At this step, we subtract the number of votes for the correct class in the variable-k-permuted data from the number of votes for the correct class in the original OOB data.

4. The raw importance score for variable k is the average of this difference over all trees in the forest. We then normalize the score by dividing it by the standard deviation of the differences across trees.

5. Variables with large values for this score are ranked as more important: if predictions get noticeably worse once a variable's original values are replaced by shuffled ones, the model was relying on that variable.
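
The steps above can be sketched in plain Python. This is only an illustration under some assumptions, not how any particular library implements it: `oob_variable_importance` is a hypothetical helper, each "tree" is just a callable that maps one row to a class label, and the OOB row indices per tree are passed in by hand. The sample standard deviation is used for the normalization, and returning 0.0 when the trees show no spread is a convention chosen for this sketch.

```python
import random
import statistics


def oob_variable_importance(trees, oob_rows, X, y, k, seed=0):
    """Return (raw, scaled) permutation importance of feature k.

    trees    : list of callables, each mapping one feature row to a class label
               (hypothetical stand-ins for fitted trees)
    oob_rows : oob_rows[t] holds the row indices that are out-of-bag for tree t
    """
    rng = random.Random(seed)
    diffs = []
    for tree, rows in zip(trees, oob_rows):
        # Step 1: correct votes on the untouched OOB rows.
        correct = sum(tree(X[i]) == y[i] for i in rows)

        # Step 2: shuffle feature k among this tree's OOB rows only.
        shuffled = [X[i][k] for i in rows]
        rng.shuffle(shuffled)
        correct_perm = 0
        for i, value in zip(rows, shuffled):
            row = list(X[i])
            row[k] = value
            correct_perm += tree(row) == y[i]

        # Step 3: drop in correct votes caused by the shuffle.
        diffs.append(correct - correct_perm)

    # Step 4: average over trees, then divide by the standard deviation.
    raw = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # needs at least two trees
    scaled = raw / sd if sd > 0 else 0.0  # no spread: convention of this sketch
    return raw, scaled


# Toy data: feature 0 decides the class, feature 1 is irrelevant.
X = [[i / 40, (i * 7 % 40) / 40] for i in range(40)]
y = [int(row[0] > 0.5) for row in X]

# Three identical stand-in "trees" that split on feature 0 only.
trees = [lambda row: int(row[0] > 0.5)] * 3
oob_rows = [list(range(t, 40, 3)) for t in range(3)]

raw0, z0 = oob_variable_importance(trees, oob_rows, X, y, k=0)
raw1, z1 = oob_variable_importance(trees, oob_rows, X, y, k=1)
print("feature 0:", raw0, z0)
print("feature 1:", raw1, z1)
```

Because the stand-in trees never look at feature 1, shuffling it changes no votes and its score is zero, while shuffling feature 0 costs correct votes and gives it a positive score (step 5). With a real forest you would normally rely on the library's own importance output rather than a hand-rolled loop like this.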