# Linear Regression

Linear Regression is a fundamental algorithm for predicting a continuous target variable from one or more input features. In our ML workflow, we support both ordinary least-squares linear regression (for continuous targets) and logistic regression (for categorical targets).

***

## <mark style="color:blue;">How It Works</mark>

Linear Regression works by finding the best-fitting straight line (or hyperplane in higher dimensions) through the data points. The line is chosen to minimize the sum of the squared differences between the predicted and actual values, a method known as ordinary least squares (OLS).

<figure><img src="/files/3e8Pq1xGupHZhg7iMX56" alt=""><figcaption></figcaption></figure>

The general form of a linear regression model is:

y = β₀ + β₁x₁ + β₂x₂ + ... + βₙxₙ + ε

Where:

* y is the dependent variable
* x₁, x₂, ..., xₙ are the independent variables
* β₀ is the intercept, and β₁, β₂, ..., βₙ are the coefficients of the corresponding features
* ε is the error term
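As a minimal sketch of the least-squares fit (not the workflow's own code), the snippet below prepends a column of ones to the feature matrix so that β₀ is estimated alongside β₁ and β₂, solves for the coefficients with NumPy's `lstsq`, and checks the result against scikit-learn's `LinearRegression`. The synthetic data and its true coefficients (β₀ = 2, β₁ = 3, β₂ = -1) are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative synthetic data: y = 2 + 3*x1 - 1*x2 + small noise
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 2.0 + 3.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.1, size=100)

# Design matrix with a leading column of ones for the intercept beta_0
X_design = np.column_stack([np.ones(len(X)), X])

# Ordinary least squares: minimize the sum of squared residuals ||y - X_design @ beta||^2
beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)

# scikit-learn's LinearRegression solves the same minimization
model = LinearRegression().fit(X, y)

print("lstsq:  ", beta)
print("sklearn:", model.intercept_, model.coef_)
```

Both approaches should report an intercept near 2 and coefficients near 3 and -1.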

***

## <mark style="color:blue;">Initialization</mark>

The Linear Regression model is initialized in the `initialize_regressor` method:

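```python
from scipy.stats import loguniform, randint, uniform
from sklearn.linear_model import LinearRegression, LogisticRegression

# Excerpt from initialize_regressor. The 'estimator__' prefix routes each
# parameter to the inner model of a wrapper such as MultiOutputRegressor or
# OneVsRestClassifier, which store the wrapped model as `estimator`.
if self.regressor == 'Regression':
    if is_classification:
        base_estimator_class = LogisticRegression
        base_param_dist = {
            'estimator__C': loguniform(1e-3, 1e3),  # inverse regularization strength
            # Only 'saga' supports all three penalties; 'lbfgs' and 'newton-cg'
            # support only 'l2' among them, so other sampled pairs fail at fit time.
            'estimator__penalty': ['l1', 'l2', 'elasticnet'],
            'estimator__solver': ['lbfgs', 'newton-cg', 'saga'],
            'estimator__max_iter': randint(1000, 5000),
            'estimator__tol': loguniform(1e-6, 1e-3),
            'estimator__class_weight': [None, 'balanced'],
            'estimator__l1_ratio': uniform(0, 1),  # used only with 'elasticnet'
        }
    else:
        base_estimator_class = LinearRegression
        base_param_dist = {
            'estimator__fit_intercept': [True, False],
        }
```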
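Because `'lbfgs'` and `'newton-cg'` support only the `'l2'` penalty among the three listed, one way to keep every sampled combination valid is to pass `RandomizedSearchCV` a list of dictionaries, each describing a compatible sub-space. The sketch below is an assumption about how the space could be restructured, not the workflow's current code. It wraps `LogisticRegression` in `OneVsRestClassifier` so that the `estimator__` prefixes resolve, and it uses a small synthetic multi-label dataset; `n_iter=10` and `cv=3` are illustrative values.

```python
from scipy.stats import loguniform, randint, uniform
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV
from sklearn.multiclass import OneVsRestClassifier

# Parameters that are valid for every solver/penalty pair below
shared = {
    'estimator__C': loguniform(1e-3, 1e3),
    'estimator__max_iter': randint(1000, 5000),
    'estimator__tol': loguniform(1e-6, 1e-3),
    'estimator__class_weight': [None, 'balanced'],
}

# Each dict is a sub-space in which every sampled combination is valid
param_distributions = [
    # 'lbfgs' and 'newton-cg' support only the 'l2' penalty among these
    {**shared, 'estimator__solver': ['lbfgs', 'newton-cg'], 'estimator__penalty': ['l2']},
    # 'saga' supports 'l1' and 'l2'
    {**shared, 'estimator__solver': ['saga'], 'estimator__penalty': ['l1', 'l2']},
    # l1_ratio is sampled only where it is used: the 'elasticnet' penalty
    {**shared, 'estimator__solver': ['saga'], 'estimator__penalty': ['elasticnet'],
     'estimator__l1_ratio': uniform(0, 1)},
]

# Illustrative multi-label data: Y is a binary indicator matrix
X, Y = make_multilabel_classification(n_samples=200, n_classes=3, random_state=0)

search = RandomizedSearchCV(
    OneVsRestClassifier(LogisticRegression()),
    param_distributions,
    n_iter=10,
    cv=3,
    random_state=0,
    error_score='raise',  # fail loudly if an invalid combination is ever sampled
)
search.fit(X, Y)
print(search.best_params_)
```

With a list of dictionaries, `RandomizedSearchCV` first picks one dictionary at random and then samples parameters from it, so `error_score='raise'` never triggers here.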

***

## <mark style="color:blue;">Key Components</mark>

1. **Model Selection**:
   * For continuous targets, we use `LinearRegression` from scikit-learn.
   * For categorical targets, we use `LogisticRegression` from scikit-learn.
2. **Multi-output Support**:
   * For multiple target variables, we use `MultiOutputRegressor` for regression tasks.
   * For multiple categorical targets, we use `OneVsRestClassifier` for classification tasks.
3. **Hyperparameter Tuning**:
   * When `auto_mode` is enabled, we use `RandomizedSearchCV` for automated hyperparameter tuning.
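The selection rules above can be sketched as a small factory function. `build_model` and its arguments are hypothetical names used only for illustration, the sketch assumes the wrappers are applied only when there is more than one target (as the list describes), and `max_iter=1000` is an arbitrary illustrative value.

```python
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.multioutput import MultiOutputRegressor


def build_model(is_classification: bool, n_targets: int):
    """Hypothetical factory mirroring the selection rules listed above."""
    if is_classification:
        base = LogisticRegression(max_iter=1000)
        # Several categorical targets: one binary classifier per label
        return OneVsRestClassifier(base) if n_targets > 1 else base
    base = LinearRegression()
    # Several continuous targets: one regressor per target column
    return MultiOutputRegressor(base) if n_targets > 1 else base


for is_clf, n in [(False, 1), (False, 3), (True, 1), (True, 2)]:
    print(is_clf, n, type(build_model(is_clf, n)).__name__)
```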

***

## <mark style="color:blue;">Training Process</mark>

The training process is handled in the `fit_regressor` method:

1. The method checks if we're dealing with a multi-output scenario.
2. It reshapes the target variable `y` if necessary for consistency.
3. For non-neural network models (including Linear/Logistic Regression):
   * If the model supports `partial_fit`, it uses a custom training loop that allows for stopping mid-training.
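   * Otherwise, it fits the model in one go using the `fit` method. This is always the case for Linear and Logistic Regression, since neither scikit-learn's `LinearRegression` nor its `LogisticRegression` implements `partial_fit`.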

After training, the model is serialized and stored.
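The dispatch described above can be sketched as follows. `fit_model`, its `should_stop` callback, the flattening of single-column targets, and the use of `pickle` for serialization are all assumptions for illustration; the page does not specify them. `SGDRegressor`, which does implement `partial_fit`, appears in the usage only to exercise the incremental branch.

```python
import pickle

import numpy as np
from sklearn.linear_model import LinearRegression, SGDRegressor


def fit_model(model, X, y, n_epochs=5, should_stop=lambda: False):
    """Hypothetical sketch: incremental fit when available, otherwise a single fit."""
    y = np.asarray(y)
    if y.ndim == 2 and y.shape[1] == 1:
        y = y.ravel()  # assumption: single-column targets are flattened to 1-D
    if hasattr(model, "partial_fit"):
        # Epoch loop that checks a stop flag before each pass
        for _ in range(n_epochs):
            if should_stop():
                break
            model.partial_fit(X, y)
    else:
        # LinearRegression and LogisticRegression have no partial_fit
        model.fit(X, y)
    # Assumption: pickle stands in for the workflow's (undocumented) serialization
    return pickle.dumps(model)


X = np.random.default_rng(0).normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5])

blob = fit_model(LinearRegression(), X, y.reshape(-1, 1))
restored = pickle.loads(blob)
print(restored.coef_)  # close to [1.0, -2.0, 0.5]
```

Checking `should_stop()` between epochs is what lets an incremental model be stopped mid-training; a single `fit` call has no such checkpoint.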

***

## <mark style="color:blue;">Auto Mode</mark>

When `auto_mode` is enabled:

1. A `RandomizedSearchCV` object is created with the base estimator (Linear or Logistic Regression).
2. It performs a randomized search over the specified parameter distributions.
3. The best parameters found are saved and used for the final model.
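A minimal sketch of this search for the regression case, assuming a two-target problem so that `LinearRegression` is wrapped in `MultiOutputRegressor` (which is what makes the `estimator__` prefix resolve). The synthetic data, `n_iter`, and `cv` values are illustrative, not the workflow's settings. `best_params_` holds the winning parameters, and with the default `refit=True`, `best_estimator_` is refit on the full dataset using them.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import RandomizedSearchCV
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 4))
# Two continuous targets with non-zero offsets, so fitting an intercept matters
Y = X @ rng.normal(size=(4, 2)) + np.array([5.0, -3.0]) + rng.normal(scale=0.1, size=(120, 2))

search = RandomizedSearchCV(
    MultiOutputRegressor(LinearRegression()),
    param_distributions={'estimator__fit_intercept': [True, False]},
    n_iter=2,  # the space has only two candidates
    cv=3,
    random_state=0,
)
search.fit(X, Y)

print(search.best_params_)  # {'estimator__fit_intercept': True}
final_model = search.best_estimator_  # refit on all data with the best parameters
```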

***

## <mark style="color:blue;">Multi-output Scenario</mark>

For multiple target variables:

1. In regression tasks, `MultiOutputRegressor` is used to wrap the `LinearRegression` estimator.
2. In classification tasks, `OneVsRestClassifier` is used to wrap the `LogisticRegression` estimator.
3. This allows the model to predict multiple target variables simultaneously.
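To illustrate both wrappers on synthetic data (the data and sizes are assumptions for the example): `MultiOutputRegressor` fits one `LinearRegression` per target column, and `OneVsRestClassifier` fits one `LogisticRegression` per label. Because ordinary least squares decouples across target columns, the wrapped regressor gives the same predictions as a single `LinearRegression` fitted on the 2-D target. Note that `OneVsRestClassifier` expects multi-label targets, that is, a binary indicator matrix with one column per label.

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(80, 3))
Y = X @ rng.normal(size=(3, 2)) + rng.normal(scale=0.1, size=(80, 2))

# One LinearRegression per target column
wrapped = MultiOutputRegressor(LinearRegression()).fit(X, Y)
print(len(wrapped.estimators_))  # 2
print(wrapped.predict(X).shape)  # (80, 2)

# LinearRegression also accepts 2-D targets; per-column OLS gives the same fit
native = LinearRegression().fit(X, Y)
print(np.allclose(wrapped.predict(X), native.predict(X)))  # True

# Multi-label classification: Y_cls has one binary column per label
X_cls, Y_cls = make_multilabel_classification(n_samples=100, n_classes=4, random_state=0)
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X_cls, Y_cls)
print(len(ovr.estimators_), ovr.predict(X_cls).shape)  # 4 (100, 4)
```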

***

## <mark style="color:blue;">Advantages and Limitations</mark>

Advantages:

* Simple and interpretable
* Fast to train and make predictions
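* Works well when the target depends approximately linearly on the features (for logistic regression, when the decision boundary between classes is approximately linear)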

Limitations:

* Assumes a linear relationship between features and target
* Sensitive to outliers
* May underfit complex datasets
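A small illustration of the outlier limitation, using made-up data: because the loss squares each residual, a single corrupted point far from the mean of x has outsized influence on the fitted line. Here, adding 50 to the last of ten points on y = 2x + 1 moves the slope from 2 to about 4.73 and the intercept from 1 to about -6.27.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Clean data exactly on the line y = 2x + 1
x = np.arange(10, dtype=float).reshape(-1, 1)
y = 2.0 * x.ravel() + 1.0
clean_fit = LinearRegression().fit(x, y)

# Corrupt a single target value at the edge of the x range
y_outlier = y.copy()
y_outlier[-1] += 50.0
outlier_fit = LinearRegression().fit(x, y_outlier)

print(clean_fit.coef_[0], clean_fit.intercept_)      # about 2.0 and 1.0
print(outlier_fit.coef_[0], outlier_fit.intercept_)  # about 4.73 and -6.27
```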

