What is Random Forest? How does it work?

Let me try to answer this in an oversimplified way.
Say you want to make a prediction, whether binary, multi-class, or a numeric value. You would build a logical rule from the data. Suppose we want to predict the price of an apartment in Mumbai: you would say that if the area is 1000 sq. ft. and the building is 7 years old, then the price will be X. Similarly, for a binary prediction, you would say that an apartment with those attributes is or is not likely to sell soon. A different set of attributes gives a different answer. Now imagine these rules arranged in the form of a tree: if the area is below 1000 sq. ft. and the age is above 5 years, you arrive at a particular answer. These are called decision tree models in data science. Random forest is a more advanced machine learning version of decision trees: to get better accuracy, it builds multiple trees on different samples drawn from the same data and aggregates their outputs at the end, which usually makes it more accurate than a single decision tree.
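To make the "many trees on different samples, then aggregate" idea concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the toy apartment data is invented, each "tree" is only a one-split stump, and each stump is given a single randomly chosen feature (a real random forest grows deeper trees and considers a random subset of features at every split).

```python
import random
from collections import Counter

# Invented toy data: (area_sqft, age_years) -> 1 if the apartment sells soon, else 0
DATA = [
    ((600, 2), 1), ((750, 4), 1), ((800, 3), 1), ((900, 1), 1), ((950, 5), 1),
    ((1100, 8), 0), ((1200, 15), 0), ((1300, 10), 0), ((1500, 9), 0), ((1800, 20), 0),
]

def fit_stump(rows, feature):
    """Pick the threshold on one feature that misclassifies the fewest rows."""
    best = None
    for threshold in sorted({x[feature] for x, _ in rows}):
        left = [y for x, y in rows if x[feature] <= threshold]
        right = [y for x, y in rows if x[feature] > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return feature, threshold, left_label, right_label

def predict_stump(stump, x):
    feature, threshold, left_label, right_label = stump
    return left_label if x[feature] <= threshold else right_label

def fit_forest(rows, n_trees=25, seed=0):
    """Train each stump on its own bootstrap sample and random feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]   # sample rows with replacement
        feature = rng.randrange(len(rows[0][0]))    # random feature for this stump
        forest.append(fit_stump(sample, feature))
    return forest

def predict_forest(forest, x):
    """Aggregate the stumps by majority vote."""
    votes = Counter(predict_stump(stump, x) for stump in forest)
    return votes.most_common(1)[0][0]

forest = fit_forest(DATA)
small_new = predict_forest(forest, (600, 1))    # small, new apartment
large_old = predict_forest(forest, (1800, 20))  # large, old apartment
```

Each stump alone depends heavily on which rows landed in its sample; the vote across all of them is what smooths that out.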

A random forest is a supervised machine learning algorithm. It is a collection of decision trees whose predictions are combined, rather than used individually. The algorithm can be used for both classification and regression problems. A sense of how it works in practice comes from the amount of data it can handle comfortably. For instance, on a data set with 50,000 cases and 100 variables, it can produce 100 trees in about 10 minutes on an 800 MHz machine.
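The difference between the classification and regression cases lies mostly in how the individual trees' outputs are combined. A minimal sketch, assuming the per-tree predictions have already been computed (the function names and sample values here are hypothetical): classification takes a majority vote, regression takes an average.

```python
from collections import Counter
from statistics import mean

def aggregate_classification(tree_predictions):
    """Return the label predicted by the most trees."""
    return Counter(tree_predictions).most_common(1)[0][0]

def aggregate_regression(tree_predictions):
    """Return the average of the trees' numeric predictions."""
    return mean(tree_predictions)

# Hypothetical outputs from three trees
label = aggregate_classification(["sells soon", "sells soon", "does not sell"])
price = aggregate_regression([100.0, 110.0, 120.0])
```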

Random Forest has the following features:

  • Offers better accuracy than a single decision tree.
  • Runs efficiently even on large data sets.
  • Provides methods for detecting interactions between variables.
  • Handles a large number of input variables without requiring any to be deleted.
  • Indicates which variables are important for the classification.
  • Provides an unbiased estimate of the generalisation error.
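The error estimate in the last point is usually obtained from "out-of-bag" rows: because each tree is trained on a sample drawn with replacement, some rows are never picked for that tree and can be used to test it. The sketch below shows only that sampling step, in plain Python; the function name and the row count of 1000 are assumptions for illustration, and it does not train any trees.

```python
import random

def bootstrap_with_oob(n_rows, rng):
    """Draw a bootstrap sample of row indices and list the rows left out."""
    in_bag = [rng.randrange(n_rows) for _ in range(n_rows)]  # with replacement
    seen = set(in_bag)
    out_of_bag = [i for i in range(n_rows) if i not in seen]
    return in_bag, out_of_bag

rng = random.Random(42)
in_bag, out_of_bag = bootstrap_with_oob(1000, rng)

# Fraction of rows this tree never saw; for large data sets this tends
# towards 1/e, i.e. roughly 37% of the rows.
oob_fraction = len(out_of_bag) / 1000
```

Scoring each row only with the trees that did not see it during training gives an error estimate without setting aside a separate test set.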