Data Preprocessing
Data Extraction
In this example, the data is fetched from the CoinGecko API and stored in a table named query_10. This table contains the raw JSON arrays of historical prices, market caps, and total volumes returned by the API.
array_data_nn: This step extracts the JSON arrays for prices, market caps, and volumes from the query_10 table for further processing. A recursive CTE is then used to unpack the JSON arrays into individual rows, where each row corresponds to a specific time point with its associated price, market cap, and volume.
Feature Engineering
While some automatic feature engineering is applied in the backend (particularly for categorical features and timestamp dimensions - see Feature Engineering), custom preprocessing is still needed for time-series data, most notably generating lagged values and rolling sums.
Lagged values and rolling sums are computed to capture trends and changes over time, which are crucial for time series analysis and prediction.
An example of generating lagged values and rolling sums is sketched after this list:
Lag Functions: Calculate previous values of prices, market caps, and volumes. These lag features capture the previous states of the market.
Rolling Sums: Compute the sum of volumes over the past 7 and 30 days. These features capture the trend in trading volume.
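A minimal sketch of both feature types is shown below. It assumes the unpacked rows from the previous step are available as price_rows(ts, price, market_cap, volume) with one row per day; the table and column names are illustrative, not the exact pipeline names.

```sql
-- Sketch only: lag features and rolling volume sums over the unpacked rows.
-- price_rows(ts, price, market_cap, volume) is an assumed name for the output of the unpacking step.
SELECT
    ts,
    price,
    market_cap,
    volume,
    -- previous-day values (lag features)
    LAG(price)      OVER (ORDER BY ts) AS prev_price,
    LAG(market_cap) OVER (ORDER BY ts) AS prev_market_cap,
    LAG(volume)     OVER (ORDER BY ts) AS prev_volume,
    -- rolling sums of volume over the trailing 7 and 30 days (current row included)
    SUM(volume) OVER (ORDER BY ts ROWS BETWEEN 6  PRECEDING AND CURRENT ROW) AS volume_sum_7d,
    SUM(volume) OVER (ORDER BY ts ROWS BETWEEN 29 PRECEDING AND CURRENT ROW) AS volume_sum_30d
FROM price_rows
```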
Final Feature Selection
The final feature set includes price changes, momentum indicators, and moving averages, all of which are standard inputs for predictive models on financial time series; a combined sketch follows the list below.
Price Changes: Absolute, relative, and logarithmic changes in price and market cap.
Momentum Indicators: Price and volume momentum over 7 days.
Moving Averages: 5-day moving averages for both price and volume, which are standard features in financial models.
Targets: Both the current price and the logarithmic price change (log_price_change) are used as prediction targets.
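The sketch below combines these computations. It assumes a table lagged_rows(ts, price, market_cap, volume, prev_price, prev_market_cap, prev_volume) produced by the previous step; all names are illustrative rather than the exact pipeline names.

```sql
-- Sketch only: final feature selection over the lagged rows.
-- lagged_rows(...) and all column names are assumptions.
SELECT
    ts,
    -- price and market-cap changes (absolute, relative, logarithmic)
    price - prev_price                               AS price_change_abs,
    (price - prev_price) / prev_price                AS price_change_rel,
    LN(price / prev_price)                           AS log_price_change,
    (market_cap - prev_market_cap) / prev_market_cap AS market_cap_change_rel,
    -- 7-day momentum: difference versus the value 7 rows (days) earlier
    price  - LAG(price, 7)  OVER (ORDER BY ts)       AS price_momentum_7d,
    volume - LAG(volume, 7) OVER (ORDER BY ts)       AS volume_momentum_7d,
    -- 5-day moving averages of price and volume
    AVG(price)  OVER (ORDER BY ts ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) AS price_ma_5d,
    AVG(volume) OVER (ORDER BY ts ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) AS volume_ma_5d,
    -- prediction targets: current price and log price change
    price                                            AS target_price,
    LN(price / prev_price)                           AS target_log_price_change
FROM lagged_rows
```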
Logarithmic Price Changes
Logarithmic price changes (log_price_change) are often preferred over raw prices in financial modeling because they have better statistical properties: taking logarithms stabilizes variance and makes the data more normally distributed, which benefits many statistical models.
They also allow for the interpretation of changes in percentage terms, which is intuitive for financial analysis.
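As a reference, the log price change of a row relative to the previous row is ln(price_t / price_{t-1}), which can be computed as in this sketch (price_rows is the assumed unpacked table from earlier):

```sql
-- Sketch only: log price change relative to the previous observation.
SELECT
    ts,
    LN(price / LAG(price) OVER (ORDER BY ts)) AS log_price_change
FROM price_rows
```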