# Machine Learning

### <mark style="color:blue;">Overview</mark>

This page gives an overview of the machine learning workflow implemented in Inverse Watch. The workflow supports several regressor types (linear/logistic regression, random forest, AdaBoost, gradient boosting, and neural networks) and handles both single-output and multi-output targets.

### <mark style="color:blue;">Integration with Inverse Watch</mark>

Inverse Watch is forked from Redash, so the ML workflow can draw on the platform's query and data visualization capabilities. Users query data from various sources, then prepare it and feed it to the ML models within the same environment, which keeps data preparation and model training close to the data that drives decisions.

<img src="/files/kYqeAAWy4zA1qtm8HyYV" alt="ML Workflow" class="gitbook-drawing">

### <mark style="color:blue;">Workflow Components</mark>

The ML workflow consists of the following main components:

1. <mark style="color:blue;">**Data Preparation**</mark><mark style="color:blue;">:</mark>
   * Query execution to fetch raw data
   * Data cleaning and structuring
   * Identification of feature types (numeric, categorical, timestamp)
   * Encoding of categorical variables
   * Scaling of numeric features
   * Extraction of time-based features from timestamps
   * Dimensionality reduction using autoencoders
2. <mark style="color:blue;">**Feature Engineering**</mark><mark style="color:blue;">:</mark>
   * Automatic detection and transformation of feature types
   * Application of cyclical encoding for time-based features
   * Use of autoencoders for dimensionality reduction
3. <mark style="color:blue;">**Model Initialization**</mark><mark style="color:blue;">:</mark>
   * Selection of appropriate regressor based on configuration
   * Initialization of model with default or user-specified parameters
4. <mark style="color:blue;">**Model Training**</mark><mark style="color:blue;">:</mark>
   * Splitting data into training and validation sets
   * Training the model on the training data
   * Hyperparameter tuning if auto\_mode is enabled
5. <mark style="color:blue;">**Model Evaluation and Tuning**</mark><mark style="color:blue;">:</mark>
   * Evaluation of model performance on validation data
   * Selection of best hyperparameters (if in auto\_mode)
   * Saving of best model and parameters
6. <mark style="color:blue;">**Prediction**</mark><mark style="color:blue;">:</mark>
   * Loading of trained model
   * Preprocessing of new data
   * Generation of predictions
   * Decoding of predictions into human-readable format
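
The cyclical encoding step listed under Feature Engineering can be illustrated with a short sketch. This is not the Inverse Watch implementation: the function names `cyclical_encode` and `time_features`, and the choice of hour-of-day and day-of-week as features, are assumptions made for the example.

```python
import math
from datetime import datetime
from typing import Dict, Tuple


def cyclical_encode(value: float, period: float) -> Tuple[float, float]:
    """Map a periodic value onto the unit circle as a (sin, cos) pair."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def time_features(ts: datetime) -> Dict[str, float]:
    """Extract illustrative time-based features from a timestamp."""
    hour_sin, hour_cos = cyclical_encode(ts.hour, 24)
    dow_sin, dow_cos = cyclical_encode(ts.weekday(), 7)  # Monday == 0
    return {
        "hour_sin": hour_sin,
        "hour_cos": hour_cos,
        "dow_sin": dow_sin,
        "dow_cos": dow_cos,
    }


# 2024-01-01 06:00 is a Monday, a quarter of the way through the day.
features = time_features(datetime(2024, 1, 1, 6, 0))
```

Encoding each periodic value as a sine/cosine pair keeps neighbouring times close together: 23:00 and 00:00 end up near each other on the circle, whereas the raw integers 23 and 0 are far apart.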

### <mark style="color:blue;">Supported Regressors</mark>

The system supports multiple types of regressors, each with its own initialization and training process:

* <mark style="color:blue;">**Linear/Logistic Regression**</mark><mark style="color:blue;">:</mark> Simple and interpretable; linear regression handles continuous targets and logistic regression handles categorical targets. Key hyperparameters include `fit_intercept` for linear regression and `C` for logistic regression.
* <mark style="color:blue;">**Random Forest**</mark><mark style="color:blue;">:</mark> Utilizes ensemble learning to improve prediction accuracy and reduce overfitting. Key hyperparameters include `n_estimators`, `max_depth`, and `max_features`.
* <mark style="color:blue;">**AdaBoost**</mark><mark style="color:blue;">:</mark> Combines multiple weak learners to create a strong regressor, with support for both regression and classification tasks. Key hyperparameters include `n_estimators` and `learning_rate`.
* <mark style="color:blue;">**Gradient Boosting**</mark><mark style="color:blue;">:</mark> Sequentially builds models to correct errors from previous models, suitable for capturing complex patterns. Key hyperparameters include `n_estimators`, `learning_rate`, and `max_depth`.
* <mark style="color:blue;">**Neural Networks**</mark><mark style="color:blue;">:</mark> Employs deep learning techniques for handling complex, non-linear relationships in data. Key hyperparameters include `epochs`, `batch_size`, and `learning_rate`.
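
The hyperparameter names listed for the traditional models match scikit-learn's estimator parameters. Assuming scikit-learn estimators sit behind these regressors (this page does not confirm it), selecting and initializing a regressor with default or user-specified parameters might look like the sketch below. The `REGISTRY` mapping, its keys, the default values, and `build_regressor` are all hypothetical. Neural networks are omitted because their framework is not named here.

```python
from sklearn.ensemble import (
    AdaBoostRegressor,
    GradientBoostingRegressor,
    RandomForestRegressor,
)
from sklearn.linear_model import LinearRegression, LogisticRegression

# Hypothetical registry: keys and default values are illustrative only.
REGISTRY = {
    "linear": (LinearRegression, {"fit_intercept": True}),
    "logistic": (LogisticRegression, {"C": 1.0}),
    "random_forest": (
        RandomForestRegressor,
        {"n_estimators": 100, "max_depth": None, "max_features": "sqrt"},
    ),
    "adaboost": (AdaBoostRegressor, {"n_estimators": 50, "learning_rate": 1.0}),
    "gradient_boosting": (
        GradientBoostingRegressor,
        {"n_estimators": 100, "learning_rate": 0.1, "max_depth": 3},
    ),
}


def build_regressor(name, **overrides):
    """Instantiate a regressor, letting user-specified values override defaults."""
    estimator_cls, defaults = REGISTRY[name]
    return estimator_cls(**{**defaults, **overrides})


# User-specified n_estimators replaces the default; other defaults are kept.
model = build_regressor("random_forest", n_estimators=200)
```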

### <mark style="color:blue;">Detailed Regressor Documentation</mark>

For more detailed information on each regressor type and its specific workflow, please refer to the individual regressor documentation:

* [Linear/Logistic Regression](/user-guide/machine-learning/regressors/linear-regression.md)
* [Random Forest Regressor](/user-guide/machine-learning/regressors/random-forest.md)
* [AdaBoost Regressor](/user-guide/machine-learning/regressors/ada-boosting.md)
* [Gradient Boosting Regressor](/user-guide/machine-learning/regressors/gradient-boosting.md)
* [Neural Network Regressor](/user-guide/machine-learning/regressors/neural-network-lstm.md)

### <mark style="color:blue;">Key Classes</mark>

#### <mark style="color:blue;">MLModel</mark>

The main class that orchestrates the entire ML workflow. It handles data preparation, feature engineering, and manages the training and prediction processes.

#### <mark style="color:blue;">TunedMultiOutputEstimator</mark>

A custom estimator that supports multi-output scenarios and hyperparameter tuning for traditional machine learning models.

#### <mark style="color:blue;">TuneableNNRegressor</mark>

A custom class for neural network models that supports hyperparameter tuning and handles both single and multi-output scenarios.

### <mark style="color:blue;">Auto Mode</mark>

The system supports an "auto mode" for each regressor type, enabling automated hyperparameter tuning to optimize model performance.
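
This page does not say which search strategy auto mode uses. As one possible illustration, the sketch below tunes a random forest with scikit-learn's `GridSearchCV` on synthetic data, then scores the best model on a held-out validation set. The parameter grid, the data, and the use of grid search itself are assumptions for the example.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic data standing in for query results.
X, y = make_regression(n_samples=120, n_features=5, noise=0.1, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Illustrative grid; a real grid would depend on the regressor and the data.
param_grid = {"n_estimators": [10, 30], "max_depth": [3, None]}

search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=3,
    scoring="neg_mean_squared_error",
)
search.fit(X_train, y_train)

best_model = search.best_estimator_      # refit on the full training split
best_params = search.best_params_        # the winning hyperparameter combination
val_r2 = best_model.score(X_val, y_val)  # R^2 on the held-out validation set
```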

### <mark style="color:blue;">Multi-output Support</mark>

The workflow handles both single-output and multi-output scenarios: `TunedMultiOutputEstimator` covers traditional machine learning models, and `TuneableNNRegressor` covers neural networks.
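
For readers unfamiliar with the wrapper approach, the sketch below shows the general idea using scikit-learn's `MultiOutputRegressor`, which fits one copy of a single-output estimator per target column. It illustrates the technique only and is not the code behind the custom classes in Inverse Watch. The data is synthetic.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
# Two target columns derived from the features.
Y = np.column_stack([X[:, 0] + X[:, 1], X[:, 2] - X[:, 3]])

# GradientBoostingRegressor predicts a single target, so the wrapper
# trains one independent estimator per output column.
model = MultiOutputRegressor(
    GradientBoostingRegressor(n_estimators=20, random_state=0)
)
model.fit(X, Y)

predictions = model.predict(X[:5])  # one row per sample, one column per target
```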

### <mark style="color:blue;">Serialization and Storage</mark>

Trained models are serialized and stored in the database, facilitating easy retrieval and deployment.
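
The serializer and storage schema are not specified on this page. The sketch below shows the general round trip with Python's standard `pickle` and an in-memory SQLite database. The table name `ml_models`, its columns, and the dictionary standing in for a trained model are all hypothetical.

```python
import pickle
import sqlite3

# Stand-in for a trained model: a plain dict of fitted parameters.
model_state = {
    "regressor": "linear",
    "coef": [0.5, -1.2],
    "intercept": 3.0,
}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ml_models (name TEXT PRIMARY KEY, payload BLOB)")

# Serialize to bytes and store as a BLOB.
conn.execute(
    "INSERT INTO ml_models (name, payload) VALUES (?, ?)",
    ("demo", pickle.dumps(model_state)),
)
conn.commit()

# Retrieve and deserialize. Only unpickle data from a trusted source.
row = conn.execute(
    "SELECT payload FROM ml_models WHERE name = ?", ("demo",)
).fetchone()
restored = pickle.loads(row[0])
conn.close()
```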

