Random forest is a versatile machine learning method capable of performing both regression and classification tasks.
Like bagging and boosting, random forest is an ensemble method: it works by combining the predictions of many tree models. Each tree is built from a random sample of the training data, and each split considers only a random sample of the columns.
Here are the steps a random forest follows to create each tree:
- Take a random sample of the training data (typically a bootstrap sample, drawn with replacement).
- Begin with a single node.
- Run the following algorithm, from the start node:
- If the number of observations is less than the minimum node size, stop.
- Select a random subset of the variables.
- Find the variable that does the “best” job splitting the observations.
- Split the observations into two nodes.
- Repeat these steps, starting from the node-size check, on each of the two new nodes.
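
The steps above can be sketched in plain Python. This is a minimal illustration for classification, not a reference implementation: the function names (`gini`, `best_split`, `build_tree`, `build_forest`, `predict_forest`) and parameters (`node_size`, `n_features`, `n_trees`) are made up for this example, the "best" split is assumed to mean lowest weighted Gini impurity, and an extra stop for pure nodes is added so the recursion ends early when a node holds a single class.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    """Most common label in a node."""
    return Counter(labels).most_common(1)[0][0]


def best_split(rows, labels, features):
    """Return the (feature, threshold) with the lowest weighted Gini, or None."""
    best, best_score = None, float("inf")
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue  # this threshold does not separate anything
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(rows, labels, node_size, n_features, rng):
    # Stop if the node is too small (or already pure: an added assumption).
    if len(rows) < node_size or len(set(labels)) == 1:
        return majority(labels)
    # Select a random subset of the variables.
    features = rng.sample(range(len(rows[0])), n_features)
    # Find the variable and threshold that split the observations best.
    split = best_split(rows, labels, features)
    if split is None:
        return majority(labels)
    f, t = split
    # Split the observations into two nodes and recurse on each.
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (
        f,
        t,
        build_tree([r for r, _ in left], [y for _, y in left], node_size, n_features, rng),
        build_tree([r for r, _ in right], [y for _, y in right], node_size, n_features, rng),
    )


def predict_tree(node, row):
    # Internal nodes are tuples; leaves are labels (assumed not to be tuples).
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def build_forest(rows, labels, n_trees=25, node_size=2, n_features=1, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bootstrap sample: draw row indices with replacement.
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(
            build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                       node_size, n_features, rng)
        )
    return forest


def predict_forest(forest, row):
    # Combine the trees by majority vote.
    return majority([predict_tree(tree, row) for tree in forest])
```

To try it, pass a list of numeric feature rows and a matching list of labels to `build_forest`, then call `predict_forest(forest, new_row)`. For regression, the same structure would apply with a variance-based split score and averaged leaf values instead of a vote.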