Regressors

Our machine learning workflow supports a variety of regressors, each with its own strengths and use cases. This document provides an overview of the regressors we use, with links to more detailed information.

Supported Regressors

  1. Linear Regression

    • Simple and interpretable

    • Suitable for linear relationships between features and targets

    • Includes Logistic Regression for classification tasks

  2. Random Forest

    • Ensemble method using multiple decision trees

    • Handles non-linear relationships well

    • Provides feature importance rankings

  3. Gradient Boosting

    • Builds an ensemble of weak learners sequentially

    • Often provides high accuracy

    • Includes variants like XGBoost and LightGBM

  4. AdaBoost

    • Adaptive Boosting algorithm

    • Focuses on hard-to-classify instances

    • Works well with weak learners

  5. Neural Networks

    • Flexible architecture for complex patterns

    • Supports both regression and classification

    • Includes options for deep learning
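As a rough illustration, the sketch below instantiates a common open-source counterpart for each regressor above. This assumes scikit-learn; the class names and parameters shown are assumptions, and the workflow's own wrappers (or variants like XGBoost and LightGBM) may expose different options.

```python
# Minimal sketch, assuming scikit-learn; the workflow's actual wrappers may differ.
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import (
    RandomForestRegressor,
    GradientBoostingRegressor,
    AdaBoostRegressor,
)
from sklearn.neural_network import MLPRegressor

# One representative estimator per supported regressor type
regressors = {
    "linear": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=100),
    "gradient_boosting": GradientBoostingRegressor(),
    "adaboost": AdaBoostRegressor(),
    "neural_network": MLPRegressor(hidden_layer_sizes=(64, 32)),
}
```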

Multi-output Support

All our regressors support multi-output scenarios, allowing prediction of multiple targets simultaneously.
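How multi-output prediction is wired up depends on the underlying estimator: some (such as random forests in scikit-learn) handle multiple targets natively, while others need a wrapper. A minimal sketch, assuming scikit-learn:

```python
# Sketch: wrapping a single-output estimator for multi-target prediction.
# MultiOutputRegressor fits one copy of the base estimator per target.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.multioutput import MultiOutputRegressor

X = np.random.rand(100, 4)  # 100 samples, 4 features
y = np.random.rand(100, 3)  # 3 targets per sample

model = MultiOutputRegressor(GradientBoostingRegressor())
model.fit(X, y)
print(model.predict(X[:2]).shape)  # (2, 3): one prediction per target
```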

Auto Mode

Each regressor includes an "auto mode" that performs automated hyperparameter tuning to optimize model performance.
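The workflow's actual search space and strategy are documented per regressor; as a sketch of what auto mode might do internally, a randomized cross-validated search (assuming scikit-learn) looks like this:

```python
# Sketch of automated hyperparameter tuning, assuming scikit-learn.
# The parameter grid below is illustrative, not the workflow's actual search space.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

X, y = make_regression(n_samples=200, n_features=5, random_state=0)

search = RandomizedSearchCV(
    RandomForestRegressor(random_state=0),
    param_distributions={
        "n_estimators": [50, 100, 200],
        "max_depth": [None, 5, 10],
        "min_samples_leaf": [1, 2, 4],
    },
    n_iter=10,  # number of parameter settings sampled
    cv=5,       # 5-fold cross-validation per setting
    random_state=0,
)
search.fit(X, y)
print(search.best_params_)
```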

Usage

To use a specific regressor, set the regressor option in your model configuration.
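The exact configuration schema depends on your workflow version; as a purely hypothetical illustration, the option might be set like this:

```python
# Hypothetical configuration; the actual option names and values
# depend on your workflow version.
config = {
    "regressor": "random_forest",  # e.g. linear, random_forest,
                                   # gradient_boosting, adaboost, neural_network
    "auto_mode": True,             # enable automated hyperparameter tuning
    "multi_output": True,          # predict several targets at once
}
```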

For more detailed information about each regressor, including its specific parameters, strengths, and use cases, please refer to the individual documentation.

"Guess the Candies" Example

Imagine we're trying to guess how many candies are in a jar. We have information about the jar's height, width, and weight. Here's how each regressor might approach this problem:

  1. Linear Regression: This is like drawing a straight line through our data points. It might say, "For every inch taller the jar is, add 10 candies. For every inch wider, add 15 candies. For every ounce heavier, add 5 candies." It's simple but might miss some complex relationships.

  2. Random Forest: This is like asking a bunch of friends to guess, each using slightly different rules, then taking the average of all their guesses. One friend might focus more on the height, another on the weight, and so on. By combining all these guesses, we often get a pretty good estimate.

  3. Gradient Boosting: This is like guessing, then looking at where we went wrong, and making a new rule to fix those mistakes. We keep doing this, making new rules to fix the remaining errors, until our guesses get really good.

  4. AdaBoost: This is similar to Gradient Boosting, but it pays special attention to the jars we guessed really badly on. It's like saying, "Oops, we were way off on that tall, skinny jar. Let's make sure we have a special rule for jars like that."

  5. Neural Networks: This is like having a super-smart friend who looks at all the jars and candies, and comes up with their own complex method for guessing. We don't always know exactly how they're doing it, but their guesses are often very accurate, especially if we have lots of jars to learn from.

Each method has its strengths, and the best choice often depends on how many jars we've seen before, how complex the relationship between jar features and candy count is, and how much time we have to make our guesses.
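To make the analogy concrete, here is a small worked sketch (assuming scikit-learn) that generates synthetic jar data and compares a linear model's guess with a random forest's. The "add 10 per inch of height, 15 per inch of width, 5 per ounce" rule from the example is baked into the synthetic data, so the linear model should recover coefficients close to those numbers.

```python
# Worked candy-jar example, assuming scikit-learn.
# The 10/15/5 coefficients are the illustrative rule from the text,
# used here only to generate synthetic data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n_jars = 200

height = rng.uniform(5, 15, n_jars)   # inches
width = rng.uniform(3, 8, n_jars)     # inches
weight = rng.uniform(10, 40, n_jars)  # ounces

# Synthetic "ground truth": roughly linear, plus noise
candies = 10 * height + 15 * width + 5 * weight + rng.normal(0, 10, n_jars)

X = np.column_stack([height, width, weight])

linear = LinearRegression().fit(X, candies)
forest = RandomForestRegressor(random_state=0).fit(X, candies)

new_jar = [[12.0, 5.0, 25.0]]  # height, width, weight
print("Linear guess:", linear.predict(new_jar)[0])
print("Forest guess:", forest.predict(new_jar)[0])
print("Linear coefficients:", linear.coef_)  # close to [10, 15, 5]
```

On data this close to linear, the straight-line model wins; the forest earns its keep when the relationship between jar features and candy count is more tangled.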
