Data Preprocessing
In this example, the data is fetched from the CoinGecko API and stored in a table named query_10. This table contains the raw JSON arrays of historical prices, market caps, and total volumes returned by CoinGecko.
array_data_nn: This step extracts the JSON arrays for prices, market caps, and volumes from the query_10 table for further processing.
A recursive CTE is then used to unpack the JSON arrays into individual rows, where each row corresponds to a specific time point with its associated price, market cap, and volume.
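A minimal sketch of these two steps, in SQLite-flavored SQL, might look like the following. The response column name and the JSON helper functions are assumptions; other engines expose different JSON operators.

```sql
-- Minimal sketch in SQLite-style SQL. The `response` column name and the
-- JSON functions are assumptions; other engines differ.
WITH RECURSIVE array_data_nn AS (
    -- Extract the raw JSON arrays from query_10.
    SELECT
        json_extract(response, '$.prices')        AS prices,
        json_extract(response, '$.market_caps')   AS market_caps,
        json_extract(response, '$.total_volumes') AS volumes
    FROM query_10
),
unpacked(idx, price_point, cap_point, volume_point) AS (
    -- Anchor row: the first element of each array.
    SELECT 0,
           json_extract(prices,      '$[0]'),
           json_extract(market_caps, '$[0]'),
           json_extract(volumes,     '$[0]')
    FROM array_data_nn
    UNION ALL
    -- Recursive step: walk the arrays one index at a time until exhausted.
    SELECT u.idx + 1,
           json_extract(a.prices,      '$[' || (u.idx + 1) || ']'),
           json_extract(a.market_caps, '$[' || (u.idx + 1) || ']'),
           json_extract(a.volumes,     '$[' || (u.idx + 1) || ']')
    FROM unpacked u CROSS JOIN array_data_nn a
    WHERE json_extract(a.prices, '$[' || (u.idx + 1) || ']') IS NOT NULL
)
-- Each CoinGecko element is a [timestamp_ms, value] pair.
SELECT
    json_extract(price_point,  '$[0]') AS ts,
    json_extract(price_point,  '$[1]') AS price,
    json_extract(cap_point,    '$[1]') AS market_cap,
    json_extract(volume_point, '$[1]') AS volume
FROM unpacked;
```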
While some automatic feature engineering is applied in the backend (particularly for categorical features and timestamp dimensions), it is still necessary to apply custom preprocessing for time-series data, particularly by generating lagged values and rolling sums.
Lagged values and rolling sums are computed to capture trends and changes over time, which are crucial for time series analysis and prediction.
Example of generating lagged values and rolling sums (see the sketch after this list):
Lag Functions: Calculate previous values of prices, market caps, and volumes. These lag features help capture the previous states of the market.
Rolling Sums: Compute the sum of volumes over the past 7 and 30 days. These features help capture the trend in trading volume.
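A minimal sketch of these window computations, assuming the unpacked rows live in a daily_data table (or CTE) with one row per day:

```sql
-- Minimal sketch; daily_data(ts, price, market_cap, volume) is an assumed
-- name for the unpacked rows. The row-based frames assume one row per day.
SELECT
    ts,
    price,
    market_cap,
    volume,
    -- Lag features: the previous day's market state.
    LAG(price)      OVER (ORDER BY ts) AS price_lag_1,
    LAG(market_cap) OVER (ORDER BY ts) AS market_cap_lag_1,
    LAG(volume)     OVER (ORDER BY ts) AS volume_lag_1,
    -- Rolling sums of volume over the past 7 and 30 days.
    SUM(volume) OVER (ORDER BY ts ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)  AS volume_sum_7d,
    SUM(volume) OVER (ORDER BY ts ROWS BETWEEN 29 PRECEDING AND CURRENT ROW) AS volume_sum_30d
FROM daily_data;
```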
The final feature set includes price changes, momentum indicators, and moving averages, which are all essential for predictive models in financial time series.
Price Changes: Absolute, relative, and logarithmic changes in price and market cap.
Momentum Indicators: Price and volume momentum over 7 days.
Moving Averages: Compute the 5-day moving averages for both price and volume, which are standard features in financial models.
Targets: Include both the current price and the logarithmic price change (log_price_change) as targets for prediction.
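Putting this together, a sketch of the final feature query could look like the following; lagged stands in for the output of the previous step, and all derived column names are illustrative:

```sql
-- Minimal sketch; `lagged` is an assumed name for the output of the
-- previous step, and the derived column names are illustrative.
SELECT
    ts,
    -- Price changes: absolute, relative, and logarithmic.
    price - price_lag_1                 AS price_change,
    (price - price_lag_1) / price_lag_1 AS price_change_rel,
    LN(price / price_lag_1)             AS log_price_change,
    LN(market_cap / market_cap_lag_1)   AS log_market_cap_change,
    -- Momentum: change in price and volume over the past 7 days.
    price  - LAG(price, 7)  OVER (ORDER BY ts) AS price_momentum_7d,
    volume - LAG(volume, 7) OVER (ORDER BY ts) AS volume_momentum_7d,
    -- 5-day moving averages for price and volume.
    AVG(price)  OVER (ORDER BY ts ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) AS price_ma_5d,
    AVG(volume) OVER (ORDER BY ts ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) AS volume_ma_5d,
    -- Targets: the current price (log_price_change above is the other target).
    price AS target_price
FROM lagged;
```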
Logarithmic price changes (log_price_change) are often preferred over raw prices in financial modeling because they provide a better statistical representation: logarithms stabilize variance and make the data more normally distributed, which is beneficial for many statistical models.
They also allow for the interpretation of changes in percentage terms, which is intuitive for financial analysis.
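As a quick worked example: a move from 100 to 105 gives log_price_change = LN(105 / 100) ≈ 0.0488, which reads as roughly a 4.88% gain, since ln(1 + x) ≈ x for small x.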