Metrics Evaluation
After training your model, it's crucial to evaluate its performance using various metrics. You can access these metrics on the main model page in the Inverse Watch UI or receive them via Discord if you've set it up as a notification destination.
Main Model Page: Navigate to your model's page in the Inverse Watch UI to view detailed metrics.
Discord Notifications: If configured, you'll receive metric updates in your designated Discord channel or webhook.
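If you want to forward metrics somewhere yourself, Discord webhooks accept a simple JSON payload via HTTP POST. Below is a minimal sketch; the webhook URL and the metric values are placeholders, not values produced by Inverse Watch:

```python
import requests

# Placeholder webhook URL: create one in Discord under
# Channel Settings -> Integrations -> Webhooks.
WEBHOOK_URL = "https://discord.com/api/webhooks/<id>/<token>"

def notify_discord(metrics: dict) -> None:
    """Post a one-line metrics summary to a Discord channel via webhook."""
    content = " | ".join(f"{name}: {value:.4f}" for name, value in metrics.items())
    response = requests.post(WEBHOOK_URL, json={"content": content}, timeout=10)
    response.raise_for_status()  # surface HTTP errors instead of failing silently

# Hypothetical metric values mirroring this page, not an official schema.
notify_discord({"MAE": 77.0788, "MSE": 16598.5195, "R2": 0.6802})
```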
The following metrics describe predictions of the absolute price (a computation sketch follows the list):
Mean Absolute Error (MAE): 77.0788
Interpretation: On average, the predictions deviate by about 77 units from actual prices.
Mean Squared Error (MSE): 16,598.5195
Interpretation: MSE weights larger errors more heavily. Here the root mean squared error (√16,598.52 ≈ 128.8) is noticeably higher than the MAE of 77.1, which means a few predictions miss by substantially more than the average.
R2 Score: 0.6802
Interpretation: The model explains approximately 68% of the variance in the target price variable, indicating moderately good predictive power.
Train Performance: 0.7246
Interpretation: The model's R2 score computed on the training data, measuring how closely it fits the examples it was trained on.
Validation Performance: 0.6802
Interpretation: This R2 score on the validation set measures the model's ability to generalize to unseen data.
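For reference, here is how these standard regression metrics are typically computed with scikit-learn. This is a minimal sketch; the y_true and y_pred arrays are hypothetical placeholders, not Inverse Watch internals:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical actual and predicted prices, for illustration only.
y_true = np.array([1200.0, 1150.0, 1300.0, 1275.0])
y_pred = np.array([1130.0, 1210.0, 1255.0, 1320.0])

mae = mean_absolute_error(y_true, y_pred)  # mean of |y_true - y_pred|
mse = mean_squared_error(y_true, y_pred)   # mean of (y_true - y_pred)^2
r2 = r2_score(y_true, y_pred)              # 1 - residual variance / total variance

print(f"MAE={mae:.4f}  MSE={mse:.4f}  R2={r2:.4f}")
```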
The following metrics describe predictions of the log price change (a sketch of how this target is typically derived follows the list):
Mean Absolute Error (MAE): 0.0402
Interpretation: On average, the predictions for log price changes deviate by about 0.04.
Mean Squared Error (MSE): 0.0041
Interpretation: The MSE for the log price change is small in absolute terms, but so is the target itself; judge fit by the R2 score rather than by comparing raw MSE across targets of different scales.
R2 Score: 0.2208
Interpretation: The model explains about 22% of the variance in the log price change, suggesting that predicting log price changes is more challenging than predicting absolute prices.
Train Performance: 0.2588
Interpretation: The R2 score on the training data for the log price change target indicates how well the model fits this target during training.
Validation Performance: 0.2208
Interpretation: The R2 score for the log price change on validation data confirms the model has lower predictive power for this target.
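A note on the target itself: the log price change is conventionally defined as ln(p_t / p_{t-1}), the log return between consecutive observations. A minimal sketch under that assumption (the price series is a placeholder):

```python
import numpy as np

# Placeholder price series; in practice this comes from your dataset.
prices = np.array([1200.0, 1150.0, 1300.0, 1275.0])

# Log price change (log return): ln(p_t / p_{t-1}).
log_changes = np.log(prices[1:] / prices[:-1])

# An MAE of 0.0402 on this target means predictions of these small
# values are off by ~0.04 on average, i.e. roughly 4% in return terms.
print(log_changes)  # e.g. [-0.0426  0.1226 -0.0194]
```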
The overall metrics are the unweighted average of the per-target metrics for price and log price change (the arithmetic is verified in the sketch after this list):
Mean Absolute Error (MAE): 38.5595
Interpretation: On average, the model's predictions deviate by about 38.56 units across all targets; since log price changes are tiny, this figure is dominated by the price target.
Mean Squared Error (MSE): 8299.2618
Interpretation: This value gives higher weight to larger prediction errors. It is useful for comparing models.
R2 Score: 0.4505
Interpretation: The model explains about 45% of the variance in the target variables, showing moderate predictive power.
Is Overfitted: False
Interpretation: The model doesn’t exhibit signs of overfitting, as indicated by a low overfitting score (0.0914).
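As stated above, the overall values are simple unweighted averages of the two per-target metrics. A quick sketch verifying the arithmetic (plain Python, no Inverse Watch internals):

```python
# Per-target metrics as reported on this page.
price = {"MAE": 77.0788, "MSE": 16598.5195, "R2": 0.6802}
log_change = {"MAE": 0.0402, "MSE": 0.0041, "R2": 0.2208}

# Unweighted average across the two targets.
overall = {k: round((price[k] + log_change[k]) / 2, 4) for k in price}

print(overall)
# {'MAE': 38.5595, 'MSE': 8299.2618, 'R2': 0.4505} -- matches the overall metrics above.
```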
Model Performance: An R2 score of 0.4505 indicates moderate predictive power, which leaves room for improvement.
Overfitting: The model does not show signs of overfitting (Overfitting Score: 0.0914), meaning it generalizes well across both the training and validation datasets (see the sketch after this list).
Price vs. Log Price Change: The model performs better at predicting the actual price (R2 = 0.6802) than the log price change (R2 = 0.2208). This suggests predicting log price changes is more challenging, possibly requiring further feature engineering or model refinement.
Prediction Accuracy: A mean absolute error of 77.0788 for price means the model's predictions are off by about 77 units on average. Whether that is acceptable depends on your asset's price scale: for an asset trading around 1,000 units it amounts to roughly an 8% average error, while around 10,000 units it is under 1%.
Train vs. Validation Performance: The slightly better performance on training data (R2 = 0.7246) compared to validation (R2 = 0.6802) suggests the model generalizes well, with no significant overfitting.
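The overfitting score itself is not defined on this page, but it is consistent with the relative gap between the average train and validation R2 scores. A hedged sketch of that interpretation, inferred from the numbers above rather than from a documented formula:

```python
# Average train and validation R2 across the two targets, as reported above.
train_r2 = (0.7246 + 0.2588) / 2  # 0.4917
val_r2 = (0.6802 + 0.2208) / 2    # 0.4505

# Relative train/validation gap -- an assumed reading of the "overfitting score".
overfitting_score = (train_r2 - val_r2) / val_r2
print(round(overfitting_score, 4))  # 0.0915, close to the reported 0.0914
# (the small difference likely comes from rounding in the displayed R2 values)
```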
Inverse Watch offers historical tracking of the model’s performance:
Prediction Metrics: Displays how metrics such as MAE for price predictions have evolved over time. This can help identify whether the model is becoming more accurate or stable with more data.
Training Metrics: Displays the mean absolute error for the log price change (or the corresponding metric for each target column) during training. A relatively stable line suggests consistent performance across training iterations. The tooltip also shows the exact parameters used for the selected best candidate after training.