Inverse Watch Docs
AppLanding
  • Overview
    • Home
    • Governance
      • Proposal 7
      • Proposal 25
      • Proposal 52
      • Proposal 107
      • Proposal 147 - S1
      • Proposal 189 - S2
  • Products
    • Inverse Alerts
      • See on Twitter
    • Inverse Chatbot
      • /doc
      • /imagine
      • /data
      • /graph
    • Inverse Subgraphs
      • See inverse-subgraph on Mainnet
      • See inverse-governance-subgraph on Mainnet
    • Inverse Watch
      • Go to App
  • User Guide
    • Quickstart
    • Alerts
      • Setting Up an Alert
      • Adding New Alert Destinations
      • Customize Alert Template
      • Multiple Column Alert
    • Queries
      • Creating and Editing Queries
      • Querying Existing Query Results
      • Query Parameters
      • How to Schedule a Query
      • Favorites & Tagging
      • Query Filters
      • How To Download / Export Query Results
      • Query Snippets
    • Visualizations
      • Cohort Visualizations
      • Visualizations How-To
      • Chart Visualizations
      • Formatting Numbers in Visualizations
      • How to Make a Pivot Table
      • Funnel Visualizations
      • Table Visualization Options
      • Visualizations Types
    • Dashboards
      • Creating and Editing Dashboards
      • Favorites & Tagging
      • Sharing and Embedding Dashboards
    • Data Sources
      • CSV & Excel Files
      • Google Sheets
      • JSON (API)
      • Python
      • EVM Chain Logs
      • EVM Chain State
      • GraphQL
      • Dune API
    • Machine Learning
      • Data Engineering
      • Regressors
        • Linear Regression
        • Random Forest
        • Ada Boosting
        • Gradient Boosting
        • Neural Network (LSTM)
      • Training and Predicting
      • Metrics & Overfitting
      • Examples
        • Price Prediction
          • Data Preprocessing
          • Model Creation & Training
          • Metrics Evaluation
          • Back Testing
          • Visualizing
        • Liquidation Risk
  • Admin & Dev Guide
    • Setup
    • Redash
    • Integrations & API
    • Query Runners
    • Users
      • Adding a Profile Picture
      • Authentication Options
      • Group Management
      • Inviting Users to Use Redash
      • Permissions & Groups
    • Visualizations
  • Cheat Sheets
    • Snippets
    • Contracts
  • More
    • Deprecated Apps
    • Github : inverse-flaskbot
    • Github : inverse-subgraph
    • Github : inverse-watch
Powered by GitBook
On this page
  • Data Extraction
  • Feature Engineering
  • Final Feature Selection

Was this helpful?

  1. User Guide
  2. Machine Learning
  3. Examples
  4. Price Prediction

Data Preprocessing

PreviousPrice PredictionNextModel Creation & Training

Last updated 7 months ago

Was this helpful?

Data Extraction

In this example, the data is fetched from the CoinGecko API and stored in a table named query_10. This table contains raw JSON arrays of historical prices, market caps, and total volumes fetched from Coingecko.

WITH RECURSIVE
    array_data_nn AS (
        SELECT prices AS prices_array,
               market_caps AS market_caps_array,
               total_volumes AS total_volumes_array
        FROM query_10
    ),
  • array_data_nn: This step extracts JSON arrays for prices, market caps, and volumes from the query_10 table for further processing.

  • A recursive CTE is then used to unpack the JSON arrays into individual rows, where each row corresponds to a specific time point with its associated price, market cap, and volume.

Feature Engineering

While some automatic feature engineering is applied in the backend (particularly for categorical features and timestamp dimensions - see ), it is necessary to apply custom preprocessing for time-series data, particularly by generating lagged values and rolling sums.

Lagged values and rolling sums are computed to capture trends and changes over time, which are crucial for time series analysis and prediction.

Example for generating lagged values and rolling sums:

lag_cte AS (
    SELECT strftime('%Y-%m-%d %H:%M', time / 1000, 'unixepoch') AS time,
               price,
               market_cap,
               volume,
               LAG(price) OVER (ORDER BY time) AS previous_price,
               LAG(price, 2) OVER (ORDER BY time) AS previous_previous_price,
               LAG(volume) OVER (ORDER BY time) AS previous_volume,
               LAG(volume, 2) OVER (ORDER BY time) AS previous_previous_volume,
               LAG(market_cap) OVER (ORDER BY time) AS previous_market_cap,
               LAG(market_cap, 2) OVER (ORDER BY time) AS previous_previous_market_cap,
               SUM(volume) OVER (ORDER BY time ROWS BETWEEN 7 PRECEDING AND 1 PRECEDING) AS previous_7_days_volume,
               SUM(volume) OVER (ORDER BY time ROWS BETWEEN 30 PRECEDING AND 1 PRECEDING) AS previous_30_days_volume
               
    FROM result_cte
)
  • Lag Functions: Calculate previous values of prices, market caps and volumes. These lag features help capture the previous states of the market.

  • Rolling Sums: Compute the sum of volumes over the past 7 and 30 days. These features help capture the trend in trading volume.

Final Feature Selection

The final feature set includes price changes, momentum indicators, and moving averages, which are all essential for predictive models in financial time series.

SELECT 
    --Timestamp will be feature-engineered by the system automatically
    strftime('%s', time) AS timestamp,
    time,
    
    --Price Features
    previous_price - previous_previous_price AS previous_price_change_abs,
    (previous_price - previous_previous_price) / previous_previous_price AS previous_price_change_rel,
    ln(previous_price / previous_previous_price) AS log_previous_price_change,
    
    -- Volume features
    previous_volume AS previous_day_volume,
    previous_7_days_volume AS previous_7_days_volume,
    previous_30_days_volume AS previous_30_days_volume,
    
    -- Market cap features
    previous_market_cap,
    previous_market_cap - previous_previous_market_cap AS previous_market_cap_change_abs,
    (previous_market_cap - previous_previous_market_cap) / previous_previous_market_cap AS previous_market_cap_change_rel,
    ln(COALESCE(previous_market_cap / previous_previous_market_cap, 1)) AS log_previous_market_cap_change,
    
    -- New Daily Features
    (price - LAG(price, 7) OVER (ORDER BY time)) / LAG(price, 7) OVER (ORDER BY time) AS daily_price_momentum_7_days,
    (volume - LAG(volume, 7) OVER (ORDER BY time)) / LAG(volume, 7) OVER (ORDER BY time) AS daily_volume_momentum_7_days,
    AVG(price) OVER (ORDER BY time ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) AS daily_moving_average_5_days,
    AVG(volume) OVER (ORDER BY time ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) AS daily_volume_moving_average_5_days,
    
    -- Targets
    price,
    ln(price / previous_price) AS log_price_change
FROM lag_cte
WHERE previous_price_change_rel IS NOT NULL
ORDER BY 1 ASC;
  • Price Changes: Absolute, relative, and logarithmic changes in price and market cap.

  • Momentum Indicators: Price and volume momentum over 7 days.

  • Moving Averages: Compute the 5-day moving averages for both price and volume, which are standard features in financial models.

  • Targets: Include both the current price and the logarithmic price change (log_price_change) as targets for prediction.

Logarithmic Price Changes

Logarithmic price changes (log_price_change) are often preferred in financial modeling over prices because they provide a better statistical representation. Logarithms stabilize variance and make the data more normally distributed, which is beneficial for many statistical models.

They also allow for the interpretation of changes in percentage terms, which is intuitive for financial analysis.

Feature Engineering