Data Preprocessing

Data Extraction

In this example, the data is fetched from the CoinGecko API and stored in a table named query_10. This table contains raw JSON arrays of historical prices, market caps, and total volumes fetched from Coingecko.

WITH RECURSIVE
    array_data_nn AS (
        SELECT prices AS prices_array,
               market_caps AS market_caps_array,
               total_volumes AS total_volumes_array
        FROM query_10
    ),
  • array_data_nn: This step extracts JSON arrays for prices, market caps, and volumes from the query_10 table for further processing.

  • A recursive CTE is then used to unpack the JSON arrays into individual rows, where each row corresponds to a specific time point with its associated price, market cap, and volume.

Feature Engineering

Lagged values and rolling sums are computed to capture trends and changes over time, which are crucial for time series analysis and prediction.

Example for generating lagged values and rolling sums:

lag_cte AS (
    SELECT strftime('%Y-%m-%d %H:%M', time / 1000, 'unixepoch') AS time,
               price,
               market_cap,
               volume,
               LAG(price) OVER (ORDER BY time) AS previous_price,
               LAG(price, 2) OVER (ORDER BY time) AS previous_previous_price,
               LAG(volume) OVER (ORDER BY time) AS previous_volume,
               LAG(volume, 2) OVER (ORDER BY time) AS previous_previous_volume,
               LAG(market_cap) OVER (ORDER BY time) AS previous_market_cap,
               LAG(market_cap, 2) OVER (ORDER BY time) AS previous_previous_market_cap,
               SUM(volume) OVER (ORDER BY time ROWS BETWEEN 7 PRECEDING AND 1 PRECEDING) AS previous_7_days_volume,
               SUM(volume) OVER (ORDER BY time ROWS BETWEEN 30 PRECEDING AND 1 PRECEDING) AS previous_30_days_volume
               
    FROM result_cte
)
  • Lag Functions: Calculate previous values of prices, market caps and volumes. These lag features help capture the previous states of the market.

  • Rolling Sums: Compute the sum of volumes over the past 7 and 30 days. These features help capture the trend in trading volume.

Final Feature Selection

The final feature set includes price changes, momentum indicators, and moving averages, which are all essential for predictive models in financial time series.

  • Price Changes: Absolute, relative, and logarithmic changes in price and market cap.

  • Momentum Indicators: Price and volume momentum over 7 days.

  • Moving Averages: Compute the 5-day moving averages for both price and volume, which are standard features in financial models.

  • Targets: Include both the current price and the logarithmic price change (log_price_change) as targets for prediction.

Logarithmic Price Changes

Last updated

Was this helpful?