Ada Boosting

AdaBoost (Adaptive Boosting) is an ensemble learning method that combines multiple weak learners into a strong classifier or regressor. It works by iteratively training weak learners and increasing the weights of misclassified instances so that later learners focus on them. Our ML workflow supports AdaBoost for both regression and classification tasks.


How it Works

AdaBoost works by iteratively building a strong learner from multiple weak learners. Here's a step-by-step explanation of the process:

  1. Initialization:

    • Assign equal weights to all training samples.

    • Set the number of weak learners (estimators) to use.

  2. Iterative Process: For each iteration:

    • Train a weak learner (e.g., a decision stump) on the weighted training data.

    • Calculate the error rate of the weak learner.

    • Compute the importance (weight) of the weak learner based on its error rate.

    • Update the weights of the training samples:

      • Increase weights for misclassified samples.

      • Decrease weights for correctly classified samples.

    • Normalize the weights so they sum to 1.

  3. Final Model:

    • Combine all weak learners into a strong learner.

    • Each weak learner's prediction is weighted by its importance.

  4. Prediction:

    • For classification: The final prediction is the class with the highest weighted sum of weak learner predictions.

    • For regression: The final prediction is the weighted sum of weak learner predictions.

This process allows AdaBoost to focus on the hard-to-classify examples, improving the model's performance iteratively.
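
To make the loop concrete, here is a minimal, illustrative sketch of the discrete (SAMME-style) algorithm for binary classification, assuming labels encoded as -1/+1 and scikit-learn decision stumps as weak learners. It is a teaching sketch, not the production implementation described in the sections below:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, n_estimators=50):
    # Step 1: uniform sample weights; y must be encoded as -1 / +1
    n = len(y)
    w = np.full(n, 1.0 / n)
    learners, alphas = [], []
    for _ in range(n_estimators):
        # Step 2a: train a decision stump on the weighted data
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        # Step 2b: weighted error rate (w already sums to 1)
        err = np.clip(np.sum(w * (pred != y)), 1e-10, 1 - 1e-10)
        # Step 2c: learner importance grows as its error shrinks
        alpha = 0.5 * np.log((1 - err) / err)
        # Step 2d: up-weight misclassified samples, down-weight correct ones
        w *= np.exp(-alpha * y * pred)
        # Step 2e: renormalize so the weights sum to 1
        w /= w.sum()
        learners.append(stump)
        alphas.append(alpha)
    return learners, alphas

def adaboost_predict(X, learners, alphas):
    # Steps 3-4: sign of the importance-weighted vote over all weak learners
    scores = sum(a * s.predict(X) for s, a in zip(learners, alphas))
    return np.sign(scores)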


Initialization

The AdaBoost model is initialized in the initialize_regressor method:

from sklearn.ensemble import AdaBoostClassifier, AdaBoostRegressor

if self.regressor == 'AdaBoost':
    # Pick the classifier or regressor variant based on the target type
    base_estimator_class = AdaBoostClassifier if is_classification else AdaBoostRegressor
    # Candidate hyperparameters searched when auto_mode is enabled
    param_dist = {
        'n_estimators': [50, 100, 200, 300, 500],
        'learning_rate': [0.01, 0.1, 0.5, 1.0],
    }
    if is_classification:
        param_dist['algorithm'] = ['SAMME', 'SAMME.R']

Key Components

  1. Model Selection:

    • For continuous targets, we use AdaBoostRegressor from scikit-learn.

    • For categorical targets, we use AdaBoostClassifier from scikit-learn.

  2. Multi-output Support:

    • For multiple target variables, we use MultiOutputRegressor or MultiOutputClassifier.

  3. Hyperparameter Tuning:

    • When auto_mode is enabled, we use RandomizedSearchCV for automated hyperparameter tuning.
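
The selection logic above can be summarized in a short sketch. Note that build_adaboost is a hypothetical helper written for illustration, not a method from our codebase:

from sklearn.ensemble import AdaBoostClassifier, AdaBoostRegressor
from sklearn.multioutput import MultiOutputClassifier, MultiOutputRegressor

def build_adaboost(is_classification, multi_output=False, **params):
    # Pick the scikit-learn estimator that matches the target type
    model = AdaBoostClassifier(**params) if is_classification else AdaBoostRegressor(**params)
    if multi_output:
        # Wrap so one booster is fitted per target column
        wrapper = MultiOutputClassifier if is_classification else MultiOutputRegressor
        model = wrapper(model)
    return model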


Hyperparameters

The main hyperparameters for AdaBoost include:

  • n_estimators: The maximum number of estimators at which boosting is terminated.

  • learning_rate: Weight applied to each classifier at each boosting iteration.

  • algorithm (for classification only): The boosting algorithm to use. Can be 'SAMME' or 'SAMME.R'.
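
For illustration, this is how those hyperparameters map onto the scikit-learn constructors; the values are arbitrary and should be tuned for your data:

from sklearn.ensemble import AdaBoostClassifier, AdaBoostRegressor

# Illustrative values only
reg = AdaBoostRegressor(n_estimators=200, learning_rate=0.5)
clf = AdaBoostClassifier(
    n_estimators=200,
    learning_rate=0.5,
    algorithm='SAMME',  # classification only; deprecated in newer scikit-learn releases
)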


Training Process

The training process is handled in the fit_regressor method:

  1. The method checks if we're dealing with a multi-output scenario.

  2. It reshapes the target variable y if necessary for consistency.

  3. The AdaBoost model is fitted using the fit method.

After training, the model is serialized and stored.
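
A minimal sketch of that flow, assuming a single continuous target; the synthetic data and variable names are illustrative stand-ins for real query results:

import pickle
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor

# Synthetic stand-in data
X_train, y_train = make_regression(n_samples=200, n_features=5, random_state=0)
y_train = np.asarray(y_train)
if y_train.ndim > 1 and y_train.shape[1] == 1:
    y_train = y_train.ravel()  # flatten a single-column target for consistency
model = AdaBoostRegressor(n_estimators=100, learning_rate=0.1, random_state=0)
model.fit(X_train, y_train)
blob = pickle.dumps(model)  # serialize the trained model for storage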


Auto Mode

When auto_mode is enabled:

  1. A RandomizedSearchCV object is created with the base estimator (AdaBoostRegressor or AdaBoostClassifier).

  2. It performs a randomized search over the specified parameter distributions.

  3. The best parameters found are saved and used for the final model.
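
A simplified sketch of the auto-mode search, reusing the param_dist shown in the initialization snippet; the cv and n_iter values are illustrative:

from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor
from sklearn.model_selection import RandomizedSearchCV

X, y = make_regression(n_samples=200, n_features=5, random_state=0)
param_dist = {
    'n_estimators': [50, 100, 200, 300, 500],
    'learning_rate': [0.01, 0.1, 0.5, 1.0],
}
search = RandomizedSearchCV(
    AdaBoostRegressor(random_state=0),
    param_distributions=param_dist,
    n_iter=10,  # number of random parameter combinations to try
    cv=3,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_)      # the best parameters, saved for the final model
model = search.best_estimator_  # refit with the best parameters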


Multi-output Scenario

For multiple target variables:

  1. In regression tasks, MultiOutputRegressor is used to wrap the AdaBoostRegressor.

  2. In classification tasks, MultiOutputClassifier is used to wrap the AdaBoostClassifier.

  3. This allows the model to predict multiple target variables simultaneously.
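
A short regression sketch with synthetic two-column targets; the wrapper fits one independent booster per target:

from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor
from sklearn.multioutput import MultiOutputRegressor

# Two target columns -> one AdaBoostRegressor per column
X, Y = make_regression(n_samples=200, n_features=5, n_targets=2, random_state=0)
model = MultiOutputRegressor(AdaBoostRegressor(n_estimators=100, random_state=0))
model.fit(X, Y)
preds = model.predict(X)  # shape (n_samples, 2), one column per target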


Advantages and Limitations

Advantages:

  • Less prone to overfitting compared to other boosting methods

  • Can achieve high accuracy

  • Performs implicit feature selection when shallow trees are used as weak learners

  • Works well with weak learners

Limitations:

  • Sensitive to noisy data and outliers

  • Can be computationally expensive

  • Prone to overfitting if the number of estimators is too large


Usage Tips

  1. Start with a moderate number of estimators (e.g., 50 or 100) and adjust based on performance.

  2. Use cross-validation to find the optimal learning_rate.

  3. For classification tasks, experiment with both 'SAMME' and 'SAMME.R' algorithms to see which performs better.

  4. Monitor the error on held-out data as the number of estimators increases to detect potential overfitting (a sketch follows this list).

  5. Consider using AdaBoost in combination with decision trees as weak learners for interpretable results.
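
As a sketch of tip 4, scikit-learn's staged_predict yields the ensemble's prediction after each boosting round, which makes it easy to track held-out error; the data here is synthetic and for illustration only:

import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = AdaBoostRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
# Held-out MSE after each boosting round
errors = [mean_squared_error(y_te, p) for p in model.staged_predict(X_te)]
best_round = int(np.argmin(errors)) + 1
print(f"Lowest held-out MSE after {best_round} estimators")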

By understanding these components, you can effectively use and customize the AdaBoost implementation in our ML workflow to suit your specific needs.