Linear Regression
Linear Regression is a fundamental algorithm for predicting a continuous target variable from one or more input features. In our ML workflow, we support both linear regression (for continuous targets) and logistic regression (for categorical targets).
How It Works
Linear Regression works by finding the best-fitting straight line (or hyperplane in higher dimensions) through the data points. This line is determined by minimizing the sum of the squared differences between the predicted and actual values.

The general form of a linear regression model is:
y = β₀ + β₁x₁ + β₂x₂ + ... + βₙxₙ + ε
Where:
y is the dependent variable
x₁, x₂, ..., xₙ are the independent variables
β₀, β₁, β₂, ..., βₙ are the coefficients
ε is the error term
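As a minimal sketch of the least-squares idea, the example below estimates the coefficients in two ways: directly with NumPy's least-squares solver and with scikit-learn's LinearRegression. The synthetic data, the true coefficient values and the variable names are assumptions made for illustration only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (assumed for illustration): y = 1.5 + 2.0*x1 - 0.5*x2 + noise
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
true_beta = np.array([2.0, -0.5])
y = 1.5 + X @ true_beta + rng.normal(scale=0.1, size=200)

# Ordinary least squares: prepend a column of ones so the intercept is estimated too
X_design = np.column_stack([np.ones(len(X)), X])
beta_hat, *_ = np.linalg.lstsq(X_design, y, rcond=None)

# scikit-learn solves the same least-squares problem
model = LinearRegression().fit(X, y)

print("lstsq:  ", beta_hat)
print("sklearn:", model.intercept_, model.coef_)
```

Both approaches minimize the same sum of squared residuals, so the estimates should agree up to numerical precision and land close to the coefficients used to generate the data.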
Initialization
The Linear Regression model is initialized in the initialize_regressor method:
if self.regressor == 'Regression':
    if is_classification:
        base_estimator_class = LogisticRegression
        # Note: in scikit-learn, 'lbfgs' and 'newton-cg' accept only the 'l2'
        # penalty (or no penalty), 'elasticnet' requires 'saga', and l1_ratio
        # applies only to 'elasticnet', so some sampled combinations are invalid.
        base_param_dist = {
            'estimator__C': loguniform(1e-3, 1e3),
            'estimator__penalty': ['l1', 'l2', 'elasticnet'],
            'estimator__solver': ['lbfgs', 'newton-cg', 'saga'],
            'estimator__max_iter': randint(1000, 5000),
            'estimator__tol': loguniform(1e-6, 1e-3),
            'estimator__class_weight': [None, 'balanced'],
            'estimator__l1_ratio': uniform(0, 1)
        }
    else:
        base_estimator_class = LinearRegression
        base_param_dist = {
            'estimator__fit_intercept': [True, False],
        }

Key Components
Model Selection:
For continuous targets, we use LinearRegression from scikit-learn.
For categorical targets, we use LogisticRegression from scikit-learn.
Multi-output Support:
For multiple target variables, we use MultiOutputRegressor for regression tasks.
For multiple categorical targets, we use OneVsRestClassifier for classification tasks.
Hyperparameter Tuning:
When auto_mode is enabled, we use RandomizedSearchCV for automated hyperparameter tuning.
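The estimator__ prefix on the parameter names in the initialization code is consistent with the base estimator being wrapped, since both MultiOutputRegressor and OneVsRestClassifier hold the base model in a parameter named estimator. The sketch below shows how such a selection could look; build_estimator is a hypothetical helper and not a function from this workflow.

```python
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.multioutput import MultiOutputRegressor


def build_estimator(is_classification: bool, multi_output: bool):
    """Hypothetical helper: pick a base estimator and wrap it if needed."""
    base = LogisticRegression() if is_classification else LinearRegression()
    if not multi_output:
        return base
    wrapper = OneVsRestClassifier if is_classification else MultiOutputRegressor
    return wrapper(base)


reg = build_estimator(is_classification=False, multi_output=True)
clf = build_estimator(is_classification=True, multi_output=True)

# Both wrappers store the base model in a parameter called `estimator`,
# so nested hyperparameters are addressed as `estimator__<name>`.
print("estimator__fit_intercept" in reg.get_params())
print("estimator__C" in clf.get_params())
```

With an unwrapped estimator the same hyperparameters would be addressed without the prefix, for example C instead of estimator__C.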
Training Process
The training process is handled in the fit_regressor method:
The method checks if we're dealing with a multi-output scenario.
It reshapes the target variable y if necessary for consistency.
For non-neural network models (including Linear/Logistic Regression):
  If the model supports partial_fit, it uses a custom training loop that allows for stopping mid-training.
  Otherwise, it fits the model in one go using the fit method.
After training, the model is serialized and stored.
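The steps above can be sketched roughly as follows. This is not the actual fit_regressor implementation: the function name, the should_stop callback, the epoch count and the use of pickle for serialization are all assumptions. Note that scikit-learn's LinearRegression and LogisticRegression do not provide partial_fit, so they would take the one-shot branch; SGDRegressor is used here only to exercise the incremental branch.

```python
import pickle

import numpy as np
from sklearn.linear_model import LinearRegression, SGDRegressor


def fit_with_optional_stop(model, X, y, n_epochs=20, should_stop=lambda: False):
    """Hypothetical sketch: incremental fit when possible, else a single fit."""
    y = np.asarray(y)
    if y.ndim == 2 and y.shape[1] == 1:
        y = y.ravel()  # assumed reshape: single-target column vector -> 1-D

    if hasattr(model, "partial_fit"):
        for _ in range(n_epochs):
            if should_stop():  # e.g. a flag set by a "stop training" request
                break
            model.partial_fit(X, y)
    else:
        model.fit(X, y)
    return pickle.dumps(model)  # serialized bytes, ready to be stored


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = (X @ np.array([1.0, -2.0, 0.5])).reshape(-1, 1)

blob_one_shot = fit_with_optional_stop(LinearRegression(), X, y)
blob_incremental = fit_with_optional_stop(SGDRegressor(random_state=0), X, y)

restored = pickle.loads(blob_one_shot)
print(restored.predict(X[:2]))
```

Checking the stop condition once per pass over the data keeps the loop simple; a finer-grained check would require splitting the data into mini-batches.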
Auto Mode
When auto_mode is enabled:
A RandomizedSearchCV object is created with the base estimator (Linear or Logistic Regression).
It performs a randomized search over the specified parameter distributions.
The best parameters found are saved and used for the final model.
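A minimal sketch of such a search, assuming an unwrapped LogisticRegression, a reduced parameter space, and illustrative values for n_iter and cv (none of these settings are taken from the workflow itself):

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

# Synthetic binary classification data, assumed for illustration
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Unwrapped estimator, so parameter names carry no `estimator__` prefix here
param_dist = {
    "C": loguniform(1e-3, 1e3),
    "class_weight": [None, "balanced"],
}

search = RandomizedSearchCV(
    LogisticRegression(max_iter=1000),
    param_distributions=param_dist,
    n_iter=10,        # illustrative budget, not taken from the source
    cv=3,
    random_state=0,
)
search.fit(X, y)

print(search.best_params_)            # best sampled configuration
final_model = search.best_estimator_  # refit on the full data (refit=True by default)
```

Continuous distributions such as loguniform are sampled afresh on each iteration, while list-valued entries are sampled uniformly from their items.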
Multi-output Scenario
For multiple target variables:
In regression tasks, MultiOutputRegressor is used to wrap the LinearRegression estimator.
In classification tasks, OneVsRestClassifier is used to wrap the LogisticRegression estimator.
This allows the model to predict multiple target variables simultaneously.
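The wrapping described above might look like the sketch below. The data is synthetic, and the classification targets are assumed to be binary and encoded as an indicator matrix (one column per label), which is the multi-target form that OneVsRestClassifier accepts.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 4))

# Two continuous targets, each a different linear function of X
Y_reg = np.column_stack([
    X @ np.array([1.0, 0.0, 2.0, 0.0]),
    X @ np.array([0.0, -1.0, 0.0, 3.0]),
])
reg = MultiOutputRegressor(LinearRegression()).fit(X, Y_reg)

# Two binary targets as an indicator matrix (one column per label)
Y_clf = np.column_stack([X[:, 0] > 0, X[:, 1] + X[:, 2] > 0]).astype(int)
clf = OneVsRestClassifier(LogisticRegression()).fit(X, Y_clf)

# Each prediction has one column per target
print(reg.predict(X[:3]).shape)
print(clf.predict(X[:3]).shape)
```

In both cases the wrapper fits one independent copy of the base estimator per target column.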
Advantages and Limitations
Advantages:
Simple and interpretable
Fast to train and make predictions
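Works well when the relationship between features and target is approximately linear (or, for logistic regression, when the classes are close to linearly separable)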
Limitations:
Assumes a linear relationship between features and target
Sensitive to outliers
May underfit complex datasets
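To make the outlier sensitivity concrete, the sketch below fits the same ten points twice, once on an exact line and once with a single corrupted observation. The data values are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.arange(10, dtype=float).reshape(-1, 1)
y_clean = 2.0 * x.ravel() + 1.0   # exact line: slope 2, intercept 1

y_outlier = y_clean.copy()
y_outlier[-1] = 100.0             # corrupt one observation (true value is 19)

slope_clean = LinearRegression().fit(x, y_clean).coef_[0]
slope_outlier = LinearRegression().fit(x, y_outlier).coef_[0]

print(f"slope without outlier:  {slope_clean:.2f}")
print(f"slope with one outlier: {slope_outlier:.2f}")
```

Because the loss squares each residual, a single extreme point at the edge of the feature range pulls the fitted slope far from the value supported by the other nine points.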