High-frequency price-volume correlation trend factor
factor.formula
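1. For each stock and each trading day, calculate the Pearson correlation coefficient between the minute-level closing prices and trading volumes within each intraday time window \( t \): \( p_t = \mathrm{corr}(\mathrm{close}_t, \mathrm{volume}_t) \)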
2. For each trading day, regress the correlation coefficients \( p_t \) of that day's 20 consecutive time windows linearly on the window index \( t \) to obtain the regression coefficient \( \beta \): \( p_t = \alpha + \beta t + \varepsilon_t \), where \( \alpha \) is the intercept.
3. Standardize the daily regression coefficients \( \beta \) of all stocks cross-sectionally, then remove the influence of market capitalization and traditional price-volume factors (such as 20-day reversal, 20-day turnover rate, and 20-day volatility) to obtain the final high-frequency price-volume correlation trend factor.
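Steps 1 and 2 can be sketched for a single stock on a single day as follows. This is a minimal illustration rather than a reference implementation: the function names `window_correlations` and `correlation_trend`, the equal-length split of the session via `np.array_split`, the handling of undefined windows, and the 240-bar synthetic session are all assumptions made for the example.

```python
import numpy as np

def window_correlations(close, volume, n_windows=20):
    """Split one day's minute bars into n_windows consecutive blocks and
    return the Pearson correlation of close vs. volume in each block."""
    close = np.asarray(close, dtype=float)
    volume = np.asarray(volume, dtype=float)
    if close.shape != volume.shape:
        raise ValueError("close and volume must have the same length")
    corrs = []
    for c, v in zip(np.array_split(close, n_windows),
                    np.array_split(volume, n_windows)):
        # The correlation is undefined for fewer than 2 bars or a constant series.
        if len(c) < 2 or c.std() == 0 or v.std() == 0:
            corrs.append(np.nan)
        else:
            corrs.append(np.corrcoef(c, v)[0, 1])
    return np.array(corrs)

def correlation_trend(p):
    """OLS slope of p_t on t = 1..len(p), skipping undefined windows."""
    p = np.asarray(p, dtype=float)
    t = np.arange(1, len(p) + 1, dtype=float)
    ok = ~np.isnan(p)
    if ok.sum() < 3:  # too few points for a meaningful slope (assumed cutoff)
        return np.nan
    slope, _intercept = np.polyfit(t[ok], p[ok], 1)
    return slope

# Synthetic example: 240 one-minute bars (an assumed session length).
rng = np.random.default_rng(0)
close = 10 + np.cumsum(rng.normal(0, 0.01, 240))
volume = rng.integers(100, 1000, 240)
beta = correlation_trend(window_correlations(close, volume))
```

With 240 bars and 20 windows, each window holds 12 minute bars. A different session length or window count only changes the block sizes.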
In the formula:
- \( p_t \):
The Pearson correlation coefficient between the stock's minute closing prices and minute trading volumes over the \( t \)-th intraday time window. It measures how closely price and volume move together within that window. A positive value means that volume tends to rise when the price rises and to fall when the price falls; a negative value means that volume tends to fall when the price rises and to rise when the price falls. Each window must contain several minute bars, because a correlation cannot be computed from a single observation.
- \( \beta \):
The slope of the linear regression, which captures the direction and strength of the trend in the intraday price-volume correlation \( p_t \) over \( t \). A positive \( \beta \) indicates that the correlation tends to rise over the course of the day; a negative \( \beta \) indicates that it tends to fall; the larger the absolute value of \( \beta \), the steeper the trend.
- \( \varepsilon_t \):
The error term of the regression model, that is, the deviation between the actual correlation coefficient \( p_t \) and the value fitted by the regression. It reflects noise and random fluctuations in the actual data.
- \( t \):
The index of the time window, ranging from 1 to 20: \( t=1 \) is the first window of the trading day, \( t=2 \) is the second, and so on. Note that \( t \) indexes windows within a single day, not a time series across days. How the session is divided into windows can be adjusted to the data frequency and research needs.
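For reference, with 20 equally spaced windows the ordinary least-squares slope has a simple closed form. This is the standard OLS identity, not a formula taken from the source, and the bar notation for the sample means is introduced here for illustration:

\[
\hat{\beta} = \frac{\sum_{t=1}^{20} (t - \bar{t})(p_t - \bar{p})}{\sum_{t=1}^{20} (t - \bar{t})^2}
= \frac{1}{665} \sum_{t=1}^{20} (t - 10.5)\, p_t ,
\qquad \bar{t} = 10.5, \quad \sum_{t=1}^{20} (t - \bar{t})^2 = 665 .
\]

The second equality holds because \( \sum_{t} (t - \bar{t}) = 0 \), so the \( \bar{p} \) term drops out. In other words, \( \hat{\beta} \) is a fixed weighted sum of the window correlations, with negative weights on the early windows and positive weights on the late ones.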
factor.explanation
The core logic of this factor is to capture dynamic changes in the price-volume relationship at the market-microstructure level. A negative \( \beta \) (a smaller PV_corr_trend) indicates that the correlation between price and volume weakens over the course of the day. This may imply that market sentiment is diverging: price increases are not accompanied by a matching increase in volume, and vice versa. Such a pattern is generally read as an early sign of imbalance between buyers and sellers and may indicate a potential reversal opportunity.
Conversely, a positive \( \beta \) (a larger PV_corr_trend) indicates that the correlation between price and volume strengthens over the course of the day. This may imply consistent market sentiment, with price and volume expanding or contracting together, which is generally read as a signal that the prevailing trend is strengthening.
The factor therefore uses high-frequency data to capture short-term market sentiment and microstructure characteristics, analyzing the trend in the intraday price-volume relationship to assist in stock selection. In general, negative trends (\( \beta < 0 \)) may carry more predictive power.
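Step 3 of the construction (cross-sectional standardization and removal of size and traditional price-volume exposures) can be sketched as follows. The source does not say how the influence is removed. Taking the residual of a cross-sectional OLS regression, as done here, is one common choice and is an assumption, as are the function names `zscore` and `neutralize` and the synthetic exposure matrix.

```python
import numpy as np

def zscore(x):
    """Cross-sectional z-score that ignores NaNs."""
    x = np.asarray(x, dtype=float)
    return (x - np.nanmean(x)) / np.nanstd(x)

def neutralize(beta, exposures):
    """Standardize beta across stocks, regress it on an intercept plus the
    exposure columns, and return the re-standardized residual.
    Assumed approach: OLS residual neutralization."""
    y = zscore(beta)
    X = np.column_stack([np.ones(len(y)), np.asarray(exposures, dtype=float)])
    ok = ~np.isnan(y) & ~np.isnan(X).any(axis=1)  # stocks with complete data
    coef, *_ = np.linalg.lstsq(X[ok], y[ok], rcond=None)
    resid = np.full(len(y), np.nan)
    resid[ok] = y[ok] - X[ok] @ coef
    return zscore(resid)

# Synthetic cross section of 500 stocks; the four columns stand in for
# market cap, 20-day reversal, 20-day turnover rate, and 20-day volatility.
rng = np.random.default_rng(1)
exposures = rng.normal(size=(500, 4))
raw_beta = 0.3 * exposures[:, 0] + rng.normal(size=500)
pv_corr_trend = neutralize(raw_beta, exposures)
```

By construction the resulting values have zero mean, unit variance, and zero linear correlation with each exposure column in that cross section, so any remaining stock-selection signal cannot be attributed linearly to those factors.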