When we use a model to make predictions, one important question is how accurate those predictions are, so we need to understand where the model's error comes from. Model error can usually be decomposed into three parts: bias, variance, and noise.
- Prediction error
  - Reducible error
    - Bias
    - Variance
  - Irreducible error
    - Noise
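Written as a formula (the standard decomposition, where f̂ is our fitted model, f the true function, and σ² the variance of the noise), the expected squared prediction error at a point x splits into exactly these pieces:

$$
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
= \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{Bias}^2}
+ \underbrace{\mathbb{E}\Big[\big(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big)^2\Big]}_{\text{Variance}}
+ \underbrace{\sigma^2}_{\text{Irreducible noise}}
$$

The first two terms can be reduced by choosing a better model; the last one cannot, no matter how good the model is.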
What is bias?
Bias is the difference between the average prediction of our model and the correct value we are trying to predict. A model with high bias pays very little attention to the training data and oversimplifies the problem, which leads to underfitting. Such a model has high error on both training and test data.
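As a minimal sketch of high bias (my own toy illustration, not from the referenced article; it assumes NumPy and scikit-learn are installed), a straight line fitted to clearly non-linear data underfits, so both the training and test errors stay high:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Non-linear ground truth (a sine wave) plus a little noise
X = rng.uniform(0, 2 * np.pi, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# A plain straight line is too simple for sine-shaped data -> high bias / underfitting
model = LinearRegression().fit(X_train, y_train)

print("train MSE:", mean_squared_error(y_train, model.predict(X_train)))
print("test  MSE:", mean_squared_error(y_test, model.predict(X_test)))
# Both errors stay high because the model cannot capture the curvature.
```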
What is variance?
Variance is the variability of the model's predictions for a given data point: it tells us how much the prediction would change if the model were trained on a different sample of the data. A model with high variance pays too much attention to the training data and does not generalize to data it hasn't seen before. As a result, such models perform very well on training data but have high error rates on test data.
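And here is the mirror-image sketch for high variance (again my own toy example, assuming scikit-learn): a high-degree polynomial fitted to only a dozen noisy points chases the noise, so the training error is tiny while the test error is much larger.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(1)

def make_data(n):
    # Same noisy sine-shaped ground truth for train and test
    X = rng.uniform(0, 1, size=(n, 1))
    y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.3, size=n)
    return X, y

X_train, y_train = make_data(12)   # very few training points
X_test, y_test = make_data(500)    # fresh data from the same distribution

# A degree-10 polynomial is flexible enough to chase the noise in 12 points
model = make_pipeline(PolynomialFeatures(degree=10), LinearRegression())
model.fit(X_train, y_train)

print("train MSE:", mean_squared_error(y_train, model.predict(X_train)))
print("test  MSE:", mean_squared_error(y_test, model.predict(X_test)))
# Training error is far smaller than test error: the model has fit the noise.
```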
Bias-Variance Tradeoff
Bias and variance pull in opposite directions: making a model more flexible lowers its bias but raises its variance, so the total error is minimized at some intermediate level of complexity.
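To make the tradeoff concrete, the sketch below (my own illustration, assuming scikit-learn; polynomial degree stands in for model complexity) sweeps the degree and prints train and test MSE. The train error keeps shrinking as the model gets more flexible, while the test error first falls (less bias) and then typically rises again (more variance).

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(2)

def make_data(n):
    X = rng.uniform(0, 1, size=(n, 1))
    y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.3, size=n)
    return X, y

X_train, y_train = make_data(30)
X_test, y_test = make_data(1000)

# Sweep model complexity (polynomial degree) and compare train vs. test MSE
for degree in [1, 2, 3, 5, 8, 12]:
    model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"degree={degree:2d}  train MSE={train_mse:.3f}  test MSE={test_mse:.3f}")
# Expected pattern: train MSE keeps dropping with degree, while test MSE typically
# falls and then rises again once the model starts fitting noise (the tradeoff).
```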
Some tips
- Train MSE is almost always lower than test MSE (because the model has already seen the training data!).
- Overfitting is especially likely when you only have a very limited dataset relative to the flexibility of your model.
- A minimal test MSE does not guarantee a minimal train MSE, and vice versa.
Reference: Understanding the Bias-Variance Tradeoff