Metrics Evaluation

After training your model, it's crucial to evaluate its performance using various metrics. You can access these metrics on the main model page in the Inverse Watch UI or receive them via Discord if you've set it up as a notification destination.

Accessing Metrics

  1. Main Model Page: Navigate to your model's page in the Inverse Watch UI to view detailed metrics.

  2. Discord Notifications: If configured, you'll receive metric updates in your designated Discord channel or webhook.

Understanding the Metrics

Price Metrics

Mean Absolute Error (MAE): 77.0788

Interpretation: On average, the predictions deviate by about 77 units from actual prices.

Mean Squared Error (MSE): 16,598.5195

Interpretation: MSE weights larger errors more heavily than MAE. The square root of this value (RMSE ≈ 128.8) is noticeably larger than the MAE of 77, which indicates the model is making some significant errors.

R2 Score: 0.6802

Interpretation: The model explains approximately 68% of the variance in the target price variable, indicating moderately good predictive power.
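
For reference, all three price metrics can be reproduced with standard tooling. The sketch below uses scikit-learn on placeholder arrays; the values are illustrative, not Inverse Watch output.

```python
# A minimal sketch of how MAE, MSE, and R2 are typically computed.
# The arrays are hypothetical placeholders, not Inverse Watch output.
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = np.array([1200.0, 1150.0, 1310.0, 1275.0])  # actual prices (hypothetical)
y_pred = np.array([1120.0, 1230.0, 1260.0, 1330.0])  # model predictions (hypothetical)

mae = mean_absolute_error(y_true, y_pred)  # mean of |y_true - y_pred|
mse = mean_squared_error(y_true, y_pred)   # mean of (y_true - y_pred)**2
r2 = r2_score(y_true, y_pred)              # 1 - SS_residual / SS_total

print(f"MAE: {mae:.4f}, MSE: {mse:.4f}, R2: {r2:.4f}")
```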

Train Performance: 0.7246

Interpretation: This R2 score, computed on the training data, measures how closely the model fits the data it was trained on.

Validation Performance: 0.6802

Interpretation: This R2 score on the validation set measures the model's ability to generalize to unseen data.
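
To see how these two scores relate in practice, here is a minimal sketch that fits a model and scores it on both splits. The regressor and synthetic data are stand-ins chosen for illustration; Inverse Watch computes these scores internally for its own models.

```python
# Sketch: comparing train vs. validation R2 to gauge generalization.
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=10, noise=25.0, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

model = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)
print("train R2:     ", model.score(X_train, y_train))  # compare with the reported 0.7246
print("validation R2:", model.score(X_val, y_val))      # compare with the reported 0.6802
```

A small gap between the two scores, as in the report above, is the healthy pattern: the model fits the training data slightly better than unseen data without memorizing it.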

Log Price Change Metrics

Mean Absolute Error (MAE): 0.0402

Interpretation: On average, the predictions for log price changes deviate by about 0.04.
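
Assuming log price change follows the standard convention (the natural log of the ratio of consecutive prices), an MAE of 0.04 in log space corresponds to roughly a 4% deviation in price terms, since e^0.04 ≈ 1.041:

```python
# A sketch of deriving log price changes from a (hypothetical) price series,
# assuming the standard convention r_t = ln(p_t / p_{t-1}).
import numpy as np

prices = np.array([100.0, 104.0, 101.0, 107.0])
log_changes = np.diff(np.log(prices))  # [0.0392, -0.0293, 0.0577]
print(log_changes)

# An absolute error of 0.04 in log space is roughly a 4% move in price terms:
print(np.exp(0.04) - 1)  # ~0.0408
```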

Mean Squared Error (MSE): 0.0041

Interpretation: The MSE for log price changes is small in absolute terms, indicating few large errors; keep in mind, though, that log price changes are themselves small in magnitude, so this value should be read alongside the R2 score.

R2 Score: 0.2208

Interpretation: The model explains about 22% of the variance in the log price change, suggesting that predicting log price changes is more challenging than predicting absolute prices.

Train Performance: 0.2588

Interpretation: The R2 score on the training data indicates how well the model fits the log price change target during training.

Validation Performance: 0.2208

Interpretation: The R2 score for the log price change on validation data shows that the model has lower predictive power for this target than for price.

Overall Metrics

The overall metrics represent an average of metrics from different target variables (price and log price change):

Mean Absolute Error (MAE): 38.5595

Interpretation: On average, the model's predictions deviate by about 38.56 units from actual values across all targets. Because the two targets live on very different scales, this average is dominated by the price target.

Mean Squared Error (MSE): 8299.2618

Interpretation: This value gives higher weight to larger prediction errors. It is useful for comparing models.

R2 Score: 0.4505

Interpretation: The model explains about 45% of the variance in the target variables, showing moderate predictive power.
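
You can check that these overall values are exactly the per-target averages of the numbers reported in the two sections above:

```python
# Verifying that the overall metrics are the per-target averages
# of the price and log price change metrics reported above.
price = {"mae": 77.0788, "mse": 16598.5195, "r2": 0.6802}
log_change = {"mae": 0.0402, "mse": 0.0041, "r2": 0.2208}

overall = {k: (price[k] + log_change[k]) / 2 for k in price}
print(overall)  # {'mae': 38.5595, 'mse': 8299.2618, 'r2': 0.4505}
```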

Is Overfitted: False

Interpretation: The model doesn’t exhibit signs of overfitting, as indicated by a low overfitting score (0.0914).
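
The exact formula behind the overfitting score is not documented here, but the reported 0.0914 is consistent with the relative gap between the average train and validation R2 scores, as this assumed reconstruction shows:

```python
# An assumed reconstruction, not a documented formula: the reported
# overfitting score (0.0914) matches the relative gap between the
# average train and validation R2 scores.
train_r2 = (0.7246 + 0.2588) / 2     # 0.4917
val_r2 = (0.6802 + 0.2208) / 2       # 0.4505
print((train_r2 - val_r2) / val_r2)  # ~0.0914 (within rounding)
```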

Interpreting the Results

  • Model Performance: An R2 score of 0.4505 indicates moderate predictive power, which leaves room for improvement.

  • Overfitting: The model does not show signs of overfitting (Overfitting Score: 0.0914). This means the model is generalizing well across both training and validation datasets.

  • Price vs. Log Price Change: The model performs better at predicting the actual price (R2 = 0.6802) than the log price change (R2 = 0.2208). This suggests predicting log price changes is more challenging, possibly requiring further feature engineering or model refinement.

  • Prediction Accuracy: A mean absolute error of 77.0788 for price indicates that on average, the model’s predictions are off by 77 units. You should assess if this level of accuracy is acceptable for your use case.

  • Train vs. Validation Performance: The slightly better performance on training data (R2 = 0.7246) compared to validation (R2 = 0.6802) suggests the model generalizes well, with no significant overfitting.

Metrics History

Inverse Watch offers historical tracking of the model’s performance:

  1. Prediction Metrics: Displays how metrics such as MAE for price predictions have evolved over time. This can help identify whether the model is becoming more accurate or stable with more data.

  2. Training Metrics: Displays the mean absolute error for log price change, or the corresponding metric for each target column, during training. A relatively stable line suggests consistent performance across training iterations. Additionally, the tooltip shows the exact parameters used for the selected best candidate after training.
