# Gradient Boosting

Gradient Boosting is an ensemble learning method that builds a series of weak learners (typically decision trees) sequentially, with each new model correcting the errors of the previous ones. In our ML workflow, we support both Gradient Boosting Regression and Gradient Boosting Classification.

***

## <mark style="color:blue;">How it Works</mark>

Gradient Boosting works by iteratively improving the model's predictions. Here's a step-by-step explanation of the process:

<figure><img src="/files/OA4yDqDbJCiAAy2WZEoP" alt=""><figcaption></figcaption></figure>

1. **Initialization**:
   * Start with a simple model, often a constant such as the mean of the target variable (regression) or the class prior (classification).
   * Set the number of iterations (trees) to use.
2. **Iterative Process**: For each iteration:
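   * Calculate the residuals (errors) between the current model's predictions and the actual target values. For losses other than squared error, these are pseudo-residuals: the negative gradient of the loss with respect to the current predictions.
   * Fit a new weak learner (usually a decision tree) to predict these residuals.
   * Optionally compute a step size for the new learner (for example, by a line search on the loss). This is separate from the learning rate, which is a fixed hyperparameter.
   * Update the model by adding the new weak learner's predictions, scaled by the learning rate.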
3. **Loss Function**:
   * The process aims to minimize a loss function (e.g., mean squared error for regression, log loss for classification).
   * The gradient of this loss function with respect to the model's predictions guides the boosting process.
4. **Regularization**:
   * Various techniques like limiting tree depth, subsampling, and shrinkage (learning rate) help prevent overfitting.
5. **Final Model**:
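   * The final model is the initial prediction plus the sum of all weak learners, each scaled by the learning rate.
6. **Prediction**:
   * For a new input, each weak learner makes a prediction; these are scaled by the learning rate and added to the initial prediction to get the final output. For classification, the summed score is then converted into class probabilities.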

This process allows Gradient Boosting to create a strong predictive model by focusing on and correcting the errors of previous iterations.
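
The steps above can be sketched from scratch for the squared-error case. This is a minimal illustration, not the code used in our workflow: the function names `fit_gradient_boosting` and `predict_gradient_boosting` and the synthetic data are hypothetical, and the weak learner is scikit-learn's `DecisionTreeRegressor`.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def fit_gradient_boosting(X, y, n_estimators=50, learning_rate=0.1, max_depth=2):
    """Toy gradient boosting for squared-error loss (illustrative only)."""
    init = float(np.mean(y))              # initialization: constant prediction
    pred = np.full(len(y), init)
    trees = []
    for _ in range(n_estimators):
        residuals = y - pred              # negative gradient of 0.5 * (y - pred)**2
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
        tree.fit(X, residuals)            # weak learner fitted to the residuals
        pred += learning_rate * tree.predict(X)  # shrunken additive update
        trees.append(tree)
    return init, trees


def predict_gradient_boosting(X, init, trees, learning_rate=0.1):
    """Initial prediction plus the scaled sum of all weak learners."""
    pred = np.full(X.shape[0], init)
    for tree in trees:
        pred += learning_rate * tree.predict(X)
    return pred


# Synthetic data, assumed purely for demonstration.
rng = np.random.default_rng(0)
X_demo = rng.uniform(-3, 3, size=(200, 1))
y_demo = np.sin(X_demo[:, 0]) + rng.normal(scale=0.1, size=200)

init, trees = fit_gradient_boosting(X_demo, y_demo)
baseline_mse = float(np.mean((y_demo - init) ** 2))
boosted_mse = float(np.mean((y_demo - predict_gradient_boosting(X_demo, init, trees)) ** 2))
print(f"baseline MSE: {baseline_mse:.4f}, boosted MSE: {boosted_mse:.4f}")
```

The same `learning_rate` must be used at fit and predict time, which is why library implementations store it on the model rather than passing it around as done here.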

***

## <mark style="color:blue;">Initialization</mark>

The Gradient Boosting model is initialized in the `initialize_regressor` method, which selects the estimator class for the task and defines the hyperparameter search space:

```python
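# Excerpt from `initialize_regressor`; assumes this module-level import:
# from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor
if self.regressor == 'GradientBoosting':
    # Pick the estimator class that matches the task type.
    base_estimator_class = GradientBoostingClassifier if is_classification else GradientBoostingRegressor
    # Candidate values sampled by the randomized search in auto mode.
    param_dist = {
        'n_estimators': [50, 100, 200, 300, 500],
        'learning_rate': [0.01, 0.1, 0.5, 1.0],
        'max_depth': [3, 5, 10, None],  # None: no depth limit
        'min_samples_split': [2, 5, 10],
        'min_samples_leaf': [1, 2, 4],
        'subsample': [0.8, 0.9, 1.0],
        'max_features': ['sqrt', 'log2', None],
    }
    # The available loss functions depend on the task.
    if is_classification:
        # Note: scikit-learn accepts 'exponential' only for binary classification.
        param_dist['loss'] = ['log_loss', 'exponential']
    else:
        param_dist['loss'] = ['squared_error', 'absolute_error', 'huber', 'quantile']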
```

***

## <mark style="color:blue;">Key Components</mark>

1. **Model Selection**:
   * For continuous targets, we use `GradientBoostingRegressor` from scikit-learn.
   * For categorical targets, we use `GradientBoostingClassifier` from scikit-learn.
2. **Multi-output Support**:
   * For multiple target variables, we use `MultiOutputRegressor` or `MultiOutputClassifier`.
3. **Hyperparameter Tuning**:
   * When `auto_mode` is enabled, we use `RandomizedSearchCV` for automated hyperparameter tuning.

***

## <mark style="color:blue;">Hyperparameters</mark>

The main hyperparameters for Gradient Boosting include:

* `n_estimators`: The number of boosting stages to perform.
* `learning_rate`: Shrinks the contribution of each tree, helping to prevent overfitting.
* `max_depth`: The maximum depth of each individual tree. `None` lets a tree grow until its leaves are pure or contain fewer than `min_samples_split` samples.
* `min_samples_split`: The minimum number of samples required to split an internal node.
* `min_samples_leaf`: The minimum number of samples required to be at a leaf node.
* `subsample`: The fraction of samples to be used for fitting the individual base learners.
* `max_features`: The number of features to consider when looking for the best split.
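* `loss`: The loss function to be optimized (`log_loss` or `exponential` for classification; `squared_error`, `absolute_error`, `huber`, or `quantile` for regression).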

***

## <mark style="color:blue;">Training Process</mark>

The training process is handled in the `fit_regressor` method:

1. The method checks if we're dealing with a multi-output scenario.
2. It reshapes the target variable `y` if necessary for consistency.
3. The Gradient Boosting model is fitted using the `fit` method.

After training, the model is serialized and stored.
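
A minimal sketch of these steps for the single-output case is shown below. The helper name `fit_and_serialize` is hypothetical, and the use of `pickle` is an assumption: this page does not specify how the workflow serializes or stores the model.

```python
import pickle

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor


def fit_and_serialize(model, X, y):
    """Hypothetical helper: reshape y if needed, fit, and return the model as bytes."""
    y = np.asarray(y)
    if y.ndim == 2 and y.shape[1] == 1:
        y = y.ravel()              # (n_samples, 1) -> (n_samples,)
    model.fit(X, y)
    return pickle.dumps(model)     # assumption: pickle as the serialization format


# Synthetic data with a column-vector target, assumed for demonstration.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3))
y_demo = (2.0 * X_demo[:, 0] + rng.normal(scale=0.1, size=100)).reshape(-1, 1)

model = GradientBoostingRegressor(n_estimators=20, random_state=0)
blob = fit_and_serialize(model, X_demo, y_demo)
restored = pickle.loads(blob)      # the restored model predicts like the original
```

Pickled models should only be loaded from trusted sources and with a compatible scikit-learn version.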

***

## <mark style="color:blue;">Auto Mode</mark>

When `auto_mode` is enabled:

1. A `RandomizedSearchCV` object is created with the base estimator (GradientBoostingRegressor or GradientBoostingClassifier).
2. It performs a randomized search over the specified parameter distributions.
3. The best parameters found are saved and used for the final model.
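
The search can be sketched roughly as follows. The reduced parameter grid, `n_iter=4`, and `cv=3` are assumptions chosen to keep the example fast; the settings used by our workflow are not stated on this page.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import RandomizedSearchCV

# Synthetic data, assumed purely for demonstration.
X, y = make_regression(n_samples=150, n_features=6, noise=5.0, random_state=0)

# A reduced version of the search space, to keep the example quick.
param_dist = {
    "n_estimators": [50, 100],
    "learning_rate": [0.01, 0.1],
    "max_depth": [3, 5],
    "subsample": [0.8, 1.0],
}

search = RandomizedSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_distributions=param_dist,
    n_iter=4,          # number of sampled parameter combinations
    cv=3,              # 3-fold cross-validation per combination
    random_state=0,
)
search.fit(X, y)

best_params = search.best_params_       # best combination found
final_model = search.best_estimator_    # refitted on all data with best_params
print(best_params)
```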

***

## <mark style="color:blue;">Multi-output Scenario</mark>

For multiple target variables:

1. In regression tasks, `MultiOutputRegressor` is used to wrap the `GradientBoostingRegressor`.
2. In classification tasks, `MultiOutputClassifier` is used to wrap the `GradientBoostingClassifier`.
3. Each wrapper fits one independent Gradient Boosting model per target, so a single `predict` call returns all target variables at once.
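
A small sketch of the regression case, on synthetic data with two targets (the data and the value of `n_estimators` are assumptions for illustration):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.multioutput import MultiOutputRegressor

# Synthetic data with two target columns, assumed for demonstration.
X, Y = make_regression(n_samples=200, n_features=5, n_targets=2, noise=5.0, random_state=0)

# The wrapper clones the base estimator and fits one copy per target column.
multi = MultiOutputRegressor(GradientBoostingRegressor(n_estimators=50, random_state=0))
multi.fit(X, Y)

Y_pred = multi.predict(X[:5])   # one column of predictions per target
print(Y_pred.shape)
```

Because the per-target models are independent, correlations between targets are not exploited by this approach.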

***

## <mark style="color:blue;">Advantages and Limitations</mark>

Advantages:

* Often achieves higher accuracy than random forests on tabular data when carefully tuned
* Handles non-linear relationships well
* Can capture complex patterns in the data
* Provides feature importance rankings

Limitations:

* Can be prone to overfitting, especially with high learning rates
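* Generally slower to train than random forests, because trees are built sequentially rather than in parallel
* Less interpretable than single decision trees
* Sensitive to outliers and noisy data, especially with squared-error loss (the `huber` and `absolute_error` losses are more robust)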
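
The feature importance rankings mentioned among the advantages can be read from a fitted scikit-learn model through its `feature_importances_` attribute. The sketch below uses synthetic data in which only two of six features are informative; the data and settings are assumptions for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic data: 6 features, of which only 2 drive the target.
X, y = make_regression(
    n_samples=300, n_features=6, n_informative=2, noise=1.0, random_state=0
)

model = GradientBoostingRegressor(n_estimators=100, random_state=0).fit(X, y)

importances = model.feature_importances_   # impurity-based, normalized to sum to 1
ranking = importances.argsort()[::-1]      # feature indices, most important first
for idx in ranking:
    print(f"feature {idx}: {importances[idx]:.3f}")
```

Impurity-based importances can be misleading for correlated or high-cardinality features, so they are best treated as a first indication rather than a definitive ranking.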

***

## <mark style="color:blue;">Usage Tips</mark>

1. Start with a small learning rate (e.g., 0.01 or 0.1) and a moderate number of estimators.
2. Use early stopping or cross-validation to determine the optimal number of estimators.
3. Balance the learning rate and number of estimators: lower learning rates typically require more estimators.
4. Experiment with different subsample rates to introduce randomness and prevent overfitting.
5. For high-dimensional data, consider setting `max_features` to `'sqrt'` or `'log2'`.
6. Monitor training and validation errors to detect and prevent overfitting.
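
One way to apply the early-stopping tip with scikit-learn is through the `n_iter_no_change` and `validation_fraction` parameters, sketched below. The data and the specific values are assumptions for illustration, and whether our workflow enables early stopping is not stated on this page.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic noisy data, assumed purely for demonstration.
X, y = make_regression(n_samples=400, n_features=8, noise=15.0, random_state=0)

model = GradientBoostingRegressor(
    n_estimators=500,          # upper bound on the number of trees
    learning_rate=0.1,
    subsample=0.8,             # row subsampling adds randomness
    validation_fraction=0.2,   # held out internally to monitor the loss
    n_iter_no_change=10,       # stop after 10 rounds without improvement
    tol=1e-4,
    random_state=0,
)
model.fit(X, y)

n_used = model.n_estimators_   # trees actually fitted before stopping
print(f"trees fitted: {n_used} of {model.n_estimators}")
```

If training stops well before the upper bound, a lower `learning_rate` with the same bound is often worth trying.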

