Linear Regression


Linear Regression is a fundamental algorithm used for predicting a continuous target variable based on one or more input features. In our ML workflow, we support both simple linear regression (for continuous targets) and logistic regression (for categorical targets).


How It Works

Linear Regression works by finding the best-fitting straight line (or hyperplane in higher dimensions) through the data points. This line is determined by minimizing the sum of the squared differences between the predicted and actual values.

The general form of a linear regression model is:

y = β₀ + β₁x₁ + β₂x₂ + ... + βₙxₙ + ε

Where:

  • y is the dependent variable

  • x₁, x₂, ..., xₙ are the independent variables

  • β₀, β₁, β₂, ..., βₙ are the coefficients

  • ε is the error term
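
As a quick illustration (a standalone example, not part of the workflow code), fitting scikit-learn's LinearRegression on synthetic data recovers the coefficients and intercept of this equation:

import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data drawn from y = 2 + 3*x1 - 1*x2 + noise
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 2 + 3 * X[:, 0] - 1 * X[:, 1] + rng.normal(scale=0.1, size=200)

model = LinearRegression().fit(X, y)
print(model.intercept_)  # ~2.0  (estimate of β₀)
print(model.coef_)       # ~[3.0, -1.0]  (estimates of β₁, β₂)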


Initialization

The Linear Regression model is initialized in the initialize_regressor method:

from scipy.stats import loguniform, randint, uniform
from sklearn.linear_model import LinearRegression, LogisticRegression

if self.regressor == 'Regression':
    if is_classification:
        # Categorical target: use logistic regression.
        base_estimator_class = LogisticRegression
        # Distributions sampled by RandomizedSearchCV in auto mode; the
        # 'estimator__' prefix targets the base estimator inside its
        # multi-output wrapper (see Key Components below).
        base_param_dist = {
            'estimator__C': loguniform(1e-3, 1e3),
            'estimator__penalty': ['l1', 'l2', 'elasticnet'],
            'estimator__solver': ['lbfgs', 'newton-cg', 'saga'],
            'estimator__max_iter': randint(1000, 5000),
            'estimator__tol': loguniform(1e-6, 1e-3),
            'estimator__class_weight': [None, 'balanced'],
            'estimator__l1_ratio': uniform(0, 1)  # only used with 'elasticnet'
        }
        # Of the listed solvers, only 'saga' supports 'l1' and 'elasticnet';
        # invalid sampled combinations fail to fit and are scored as NaN
        # (the default error_score) rather than aborting the search.
    else:
        # Continuous target: ordinary least squares.
        base_estimator_class = LinearRegression
        base_param_dist = {
            'estimator__fit_intercept': [True, False],
        }

Key Components

  1. Model Selection:

    • For continuous targets, we use LinearRegression from scikit-learn.

    • For categorical targets, we use LogisticRegression from scikit-learn.

  2. Multi-output Support:

    • For multiple target variables, we use MultiOutputRegressor for regression tasks.

    • For multiple categorical targets, we use OneVsRestClassifier for classification tasks (see the wrapping sketch after this list).

  3. Hyperparameter Tuning:

    • When auto_mode is enabled, we use RandomizedSearchCV for automated hyperparameter tuning.
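
A hedged sketch of this wrapping (illustrative, not the exact workflow code); it also shows why the parameter distributions above carry the estimator__ prefix, which scikit-learn resolves through the wrapper's estimator parameter:

from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.multioutput import MultiOutputRegressor

is_classification = False  # assumed flag, e.g. derived from the target dtype

if is_classification:
    model = OneVsRestClassifier(LogisticRegression())
    model.set_params(estimator__C=1.0)               # reaches LogisticRegression.C
else:
    model = MultiOutputRegressor(LinearRegression())
    model.set_params(estimator__fit_intercept=True)  # reaches LinearRegression.fit_intercept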


Training Process

The training process is handled in the fit_regressor method:

  1. The method checks if we're dealing with a multi-output scenario.

  2. It reshapes the target variable y if necessary for consistency.

  3. For non-neural network models (including Linear/Logistic Regression):

    • If the model supports partial_fit, it uses a custom training loop that allows for stopping mid-training (sketched below).

    • Otherwise, it fits the model in one go using the fit method.

After training, the model is serialized and stored.
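
Note that scikit-learn's LinearRegression and LogisticRegression do not implement partial_fit, so they take the single fit path; the incremental branch applies to estimators that do, such as SGDRegressor. Below is a minimal sketch of a stop-aware loop under these assumptions (the train_with_stop name, should_stop callback, and batching are illustrative, not the exact workflow code):

from sklearn.linear_model import SGDRegressor

def train_with_stop(model, X, y, should_stop, n_epochs=10, batch_size=256):
    # Incremental path: only available when the estimator exposes partial_fit.
    if hasattr(model, 'partial_fit'):
        for epoch in range(n_epochs):
            for start in range(0, len(X), batch_size):
                if should_stop():  # allows stopping mid-training
                    return model
                batch = slice(start, start + batch_size)
                model.partial_fit(X[batch], y[batch])
    else:
        model.fit(X, y)  # single-shot path, e.g. LinearRegression
    return model

# e.g. train_with_stop(SGDRegressor(), X, y, should_stop=lambda: False)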


Auto Mode

When auto_mode is enabled:

  1. A RandomizedSearchCV object is created with the base estimator (Linear or Logistic Regression).

  2. It performs a randomized search over the specified parameter distributions.

  3. The best parameters found are saved and used for the final model, as in the sketch below.
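
A condensed, self-contained sketch of this flow (make_regression stands in for the prepared data; the real search uses the distributions from initialize_regressor):

from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import RandomizedSearchCV
from sklearn.multioutput import MultiOutputRegressor

X, y = make_regression(n_samples=200, n_features=3, n_targets=2, random_state=0)

search = RandomizedSearchCV(
    MultiOutputRegressor(LinearRegression()),  # wrapped base estimator
    param_distributions={'estimator__fit_intercept': [True, False]},
    n_iter=2,   # tiny space here; real searches sample many more candidates
    cv=5,
    random_state=0,
)
search.fit(X, y)
best_params = search.best_params_  # saved and reused for the final model
model = search.best_estimator_     # refit on the full data by default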


Multi-output Scenario

For multiple target variables:

  1. In regression tasks, MultiOutputRegressor is used to wrap the LinearRegression estimator.

  2. In classification tasks, OneVsRestClassifier is used to wrap the LogisticRegression estimator.

  3. This allows the model to predict multiple target variables simultaneously, as in the toy example below.
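
For instance, in the classification case (a toy example with assumed synthetic data), each of several binary target columns gets its own underlying LogisticRegression, and all are predicted together:

from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Three binary target columns predicted simultaneously.
X, y = make_multilabel_classification(n_samples=100, n_features=6, n_classes=3, random_state=0)
model = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)
print(model.predict(X).shape)  # (100, 3): one column per target variable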


Advantages and Limitations

Advantages:

  • Simple and interpretable

  • Fast to train and to predict with

  • Works well when the relationship between features and target is approximately linear (or, for classification, when classes are linearly separable)

Limitations:

  • Assumes a linear relationship between features and target

  • Sensitive to outliers

  • May underfit complex datasets