Bias vs Variance

Bias

Bias refers to errors introduced by overly simple assumptions: the model is too simple to capture the real-world relationship we are trying to learn. Think of it as an archer consistently aiming at a target but always missing in the same direction because they are not accounting for the wind.
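
To make this concrete, here is a minimal sketch (using NumPy and made-up quadratic data) of a high-bias model: a straight line fit to a curved relationship. No matter how much data we collect, the line cannot bend, so it keeps missing in the same direction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: the true relationship is quadratic.
x = np.linspace(-3, 3, 50)
y = x**2 + rng.normal(scale=0.1, size=x.size)

# High-bias model: a degree-1 polynomial (a straight line).
line = np.polyfit(x, y, deg=1)
line_pred = np.polyval(line, x)

# The line is too simple to follow the curve, so even its
# training error stays large -- the signature of underfitting.
mse = np.mean((y - line_pred) ** 2)
print(f"training MSE of the linear fit: {mse:.2f}")
```

Note that the error here comes from the model class itself, not from noise: the noise scale is only 0.1, yet the line's error is orders of magnitude larger.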

Variance

Variance refers to errors introduced because the model is too sensitive to small fluctuations in the training data. Imagine the same archer hitting different spots all over the target on every shot because they are overreacting to every gust of wind.
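
A high-variance model can be sketched the same way (again with made-up data): give a polynomial as many coefficients as there are training points, and it will thread through every observation, noise included, then fail on fresh data.

```python
import numpy as np

rng = np.random.default_rng(1)

# Ten noisy observations of a smooth underlying trend.
x_train = np.linspace(-1, 1, 10)
y_train = np.sin(2 * x_train) + rng.normal(scale=0.3, size=x_train.size)

# High-variance model: with 10 coefficients for 10 points, a degree-9
# polynomial passes through every training point, noise and all.
wiggly = np.polyfit(x_train, y_train, deg=9)
train_mse = np.mean((y_train - np.polyval(wiggly, x_train)) ** 2)

# Fresh data from the same process exposes the overreaction to noise.
x_test = np.linspace(-0.95, 0.95, 50)
y_test = np.sin(2 * x_test) + rng.normal(scale=0.3, size=x_test.size)
test_mse = np.mean((y_test - np.polyval(wiggly, x_test)) ** 2)

print(f"train MSE: {train_mse:.6f}  test MSE: {test_mse:.3f}")
```

The near-zero training error is deceptive: resample the training noise and the fitted curve changes drastically, which is exactly what "high variance" means.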

Balance Between Bias and Variance

The goal in machine learning is to find the right balance between bias and variance: a model flexible enough to capture the real pattern, yet simple enough not to chase noise. Increasing model complexity typically lowers bias but raises variance, and vice versa. Balancing the two is key to building models that perform well on unseen data.

Here's a visual summary of the four combinations:

```mermaid
graph TD;
    A[Model Performance];
    A --> B(High Bias, Low Variance);
    A --> C(Low Bias, High Variance);
    A --> D(Low Bias, Low Variance);
    A --> E(High Bias, High Variance);

    B -->|Underfitting| F{{Example: Simple Model}};
    C -->|Overfitting| G{{Example: Complex Model}};
    D -->|Good Generalization| H{{Example: Balanced Model}};
    E -->|Worst Case| I{{Example: Poor Model}};

    style B fill:#f96,stroke:#333,stroke-width:2px;
    style C fill:#f66,stroke:#333,stroke-width:2px;
    style D fill:#6f6,stroke:#333,stroke-width:2px;
    style E fill:#f66,stroke:#333,stroke-width:2px;
    style F fill:#f96,stroke:#333,stroke-width:1px;
    style G fill:#f66,stroke:#333,stroke-width:1px;
    style H fill:#6f6,stroke:#333,stroke-width:1px;
    style I fill:#f66,stroke:#333,stroke-width:1px;
```
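
The trade-off can also be seen empirically. Here is a minimal sketch (with made-up sine data) that sweeps polynomial degree and scores each model on held-out validation data: the too-simple model underfits, the too-flexible model overfits, and an intermediate degree generalizes best.

```python
import numpy as np

rng = np.random.default_rng(2)

def noisy_sin(x):
    # Hypothetical data-generating process: a sine trend plus noise.
    return np.sin(2 * x) + rng.normal(scale=0.3, size=x.size)

x_train = np.linspace(-1, 1, 20)
y_train = noisy_sin(x_train)
x_val = np.linspace(-0.99, 0.99, 200)
y_val = noisy_sin(x_val)

# Sweep complexity: degree 0 underfits (high bias), degree 15 overfits
# (high variance), degree 3 sits near the sweet spot.
val_mse = {}
for deg in (0, 3, 15):
    coeffs = np.polyfit(x_train, y_train, deg)
    val_mse[deg] = np.mean((y_val - np.polyval(coeffs, x_val)) ** 2)

best = min(val_mse, key=val_mse.get)
print({d: round(m, 3) for d, m in val_mse.items()}, "best degree:", best)
```

Plotting validation error against degree gives the classic U-shaped curve: error falls as bias shrinks, then rises again as variance takes over. Choosing the model at the bottom of the U is the balance this section describes.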

Resources

Why Do We Need the Bias Term in ML Algorithms?

Bias Vs Variance

Splitting Data into Training and Testing Sets