The bias vs variance issue comes up when deciding which algorithm to choose for your ML model. As the complexity of a Machine Learning algorithm increases, its bias decreases.
By bias, I mean the assumptions the model makes about the data. Simpler algorithms like Logistic Regression assume a linear relationship between the features and the target (the log-odds, in its case), which results in a lot of bias: the model will always fit a linear decision boundary, no matter what the data actually looks like.
More complex algorithms like Random Forests don't make such assumptions and therefore have lower bias.
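Here is a minimal sketch of that difference (assuming scikit-learn is installed, and using its make_moons toy dataset purely for illustration): on data that isn't linearly separable, the linear model's assumptions cap how well it can fit, while the forest is free to follow the curve.

```python
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

# Two interleaving half-moons: no straight line separates them well.
X, y = make_moons(n_samples=1000, noise=0.2, random_state=0)

linear_model = LogisticRegression().fit(X, y)
forest = RandomForestClassifier(random_state=0).fit(X, y)

# The linear model underfits (high bias); the forest can follow the curved boundary.
print("Logistic Regression training accuracy:", linear_model.score(X, y))
print("Random Forest training accuracy:      ", forest.score(X, y))
```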
Variance comes into the picture when we train and test the model on different samples of the data and the results swing widely each time. The same algorithm might score as high as 98% on one split and as low as 85% on another, which suggests over-fitting.
Such a model is accurate on average but not precise, because the variation in its predictions is large. This is usually the case with complex algorithms.
With simpler algorithms, on the other hand, the variance is lower. They might consistently give a lower score of, say, 88%, but that score won't vary much, giving predictions that are precise, though not as accurate.
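One simple way to see this spread for yourself (a sketch, assuming scikit-learn and NumPy, with make_moons again as a stand-in dataset) is to retrain and rescore each model on many random splits and compare how much the accuracy wobbles; whichever model's scores swing more from split to split is the higher-variance one on that data.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

X, y = make_moons(n_samples=300, noise=0.3, random_state=0)

def score_spread(make_model, n_runs=20):
    """Train and score on many random splits; return mean and spread of accuracy."""
    scores = []
    for seed in range(n_runs):
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=seed)
        scores.append(make_model().fit(X_tr, y_tr).score(X_te, y_te))
    return np.mean(scores), np.std(scores)

for name, make_model in [("Logistic Regression", LogisticRegression),
                         ("Random Forest", lambda: RandomForestClassifier(random_state=0))]:
    mean, std = score_spread(make_model)
    # A larger std means the score jumps around more from run to run (higher variance).
    print(f"{name}: mean accuracy {mean:.3f}, std {std:.3f}")
```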
So there is a trade-off, and we need to find a middle ground between the two extremes.
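As a sketch of how that middle ground might be found in practice (again assuming scikit-learn, on a toy dataset), you can dial a complexity knob such as tree depth and compare cross-validated scores: very shallow trees underfit (high bias), very deep ones tend to overfit (high variance), and somewhere in between usually scores best and most consistently.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier

X, y = make_moons(n_samples=500, noise=0.3, random_state=0)

# Sweep tree depth as a simple complexity knob and look at the mean and spread of CV accuracy.
for depth in [1, 2, 4, 8, None]:
    model = RandomForestClassifier(max_depth=depth, random_state=0)
    scores = cross_val_score(model, X, y, cv=5)
    print(f"max_depth={depth}: mean CV accuracy {scores.mean():.3f}, std {scores.std():.3f}")
```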
#machinelearning #datascience