Ensemble Learning
Ensemble learning is like teamwork for algorithms. Instead of relying on just one model to make predictions, ensemble learning combines multiple models to improve accuracy and produce more reliable predictions.
Bagging is a type of ensemble learning in which multiple copies of the same algorithm are trained on different random subsets of the data, drawn with replacement (bootstrapping). Each copy learns something slightly different, and their predictions are combined to make the final decision. It's like asking several experts for their opinions and then taking the average or the most common answer.
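As a minimal sketch of this idea (assuming scikit-learn and NumPy are available; the dataset, tree count, and other settings are purely illustrative), the snippet below trains several decision trees on bootstrap samples and combines them by majority vote:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Small synthetic binary classification dataset (illustrative only)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
n_estimators = 25
trees = []

# Bagging: each tree sees a different bootstrap sample (rows drawn with replacement)
for _ in range(n_estimators):
    idx = rng.integers(0, len(X_train), size=len(X_train))
    trees.append(DecisionTreeClassifier(random_state=0).fit(X_train[idx], y_train[idx]))

# Combine the "experts": majority vote across all trees
votes = np.stack([tree.predict(X_test) for tree in trees])
y_pred = (votes.mean(axis=0) >= 0.5).astype(int)
print("Bagged accuracy:", (y_pred == y_test).mean())
```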
Boosting, on the other hand, is a bit like learning from your mistakes. It starts with a weak learner and looks at where it goes wrong. It then trains more copies of the algorithm in sequence, each one paying extra attention to the examples the previous ones misclassified. This iterative process continues until the combined predictions become more accurate.
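A similarly hedged sketch of this idea uses scikit-learn's AdaBoostClassifier, whose default weak learner is a depth-1 decision tree; the dataset and number of estimators below are just illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each new weak learner ("stump") is fit to a reweighted dataset that
# emphasises the samples the previous learners misclassified.
boosted = AdaBoostClassifier(n_estimators=100, random_state=0)
boosted.fit(X_train, y_train)
print("Boosted accuracy:", boosted.score(X_test, y_test))
```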
Resources
Bagging Vs Boosting
- Very nice introduction to bagging and boosting, giving a good overview.
- Helpful for forming an intuition when getting started with ensemble models.
- Does not give a comprehensive understanding of how bias and variance are affected by each.
- Very clearly explains how boosting and bagging work, with very simple examples!
Bagging and Boosting Algorithms
Examples of Bagging Algorithms
- Random Forest
  - Description: Combines multiple decision trees trained on different subsets of the data. Each tree votes, and the majority vote is taken as the final prediction.
  - Use Case: Effective for both classification and regression tasks.
- Bagged Decision Trees
  - Description: Multiple decision trees are trained on different bootstrap samples of the data, and their predictions are averaged (regression) or voted on (classification).
  - Use Case: Reduces the variance of individual decision trees. A brief usage sketch of both algorithms follows this list.
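A rough usage sketch of the two algorithms above, assuming scikit-learn is installed (the dataset and hyperparameters are illustrative only):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# Random Forest: bagging plus a random subset of features at each split
forest = RandomForestClassifier(n_estimators=200, random_state=0)

# Bagged decision trees: plain bootstrap aggregation
# (BaggingClassifier's default base estimator is a decision tree)
bagged = BaggingClassifier(n_estimators=200, random_state=0)

for name, model in [("Random Forest", forest), ("Bagged Decision Trees", bagged)]:
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean CV accuracy = {scores.mean():.3f}")
```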
Examples of Boosting Algorithms
- AdaBoost (Adaptive Boosting)
  - Description: Sequentially adds weak learners, each focusing more on the samples misclassified by the previous learners. The final prediction is a weighted sum of the weak learners' predictions.
  - Use Case: Often used for binary classification problems.
- Gradient Boosting
  - Description: Sequentially adds models that predict the residual errors of the prior models, improving accuracy by focusing on what the ensemble still gets wrong.
  - Use Case: Effective for both classification and regression tasks.
- XGBoost (Extreme Gradient Boosting)
  - Description: An optimized version of gradient boosting that adds regularization to prevent overfitting and is known for its speed and performance.
  - Use Case: Widely used in machine learning competitions for its high performance.
- LightGBM (Light Gradient Boosting Machine)
  - Description: A gradient boosting framework that uses tree-based learning algorithms, designed for efficiency and scalability.
  - Use Case: Works well with large datasets and is used for both classification and regression tasks. A short usage sketch follows this list.
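A hedged usage sketch of boosting in practice: scikit-learn's GradientBoostingClassifier is shown below, and the optional XGBoost and LightGBM packages expose a similar scikit-learn-style interface (XGBClassifier, LGBMClassifier). The dataset and hyperparameters are illustrative only.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each new tree is fit to the residual errors of the current ensemble
gbm = GradientBoostingClassifier(
    n_estimators=200,
    learning_rate=0.05,  # smaller steps need more trees but tend to overfit less
    max_depth=3,
    random_state=0,
)
gbm.fit(X_train, y_train)
print("Gradient boosting accuracy:", gbm.score(X_test, y_test))

# If the third-party packages are installed, they follow the same pattern:
#   from xgboost import XGBClassifier;   XGBClassifier().fit(X_train, y_train)
#   from lightgbm import LGBMClassifier; LGBMClassifier().fit(X_train, y_train)
```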
Summary
- Bagging Algorithms: Random Forest, Bagged Decision Trees
- Boosting Algorithms: AdaBoost, Gradient Boosting, XGBoost, LightGBM