Volume as a Predictor of Volatility

Alphanume Team · June 4, 2026

Does today's volume forecast tomorrow's range?

Trading volume and price volatility move together with a consistency that has attracted researchers since at least the 1970s. On days when shares change hands at a high rate, price swings tend to be wider; on quiet days, ranges compress. The relationship is robust across asset classes and time periods. The more contested question, and the one worth spending time on if you are building a forecasting model, is whether that contemporaneous correlation translates into genuine predictive power. Does today's volume tell you anything useful about tomorrow's range, after accounting for what volatility itself is already telling you? The answer is yes, but modestly, conditionally, and with important caveats. Volume-based volatility prediction is a documented phenomenon, not a myth, but it is also not a clean edge. Understanding why requires pulling apart the mechanism, the measurement choices, and the failure modes.

A practical anchor for this kind of analysis is a dataset that already captures attention-driven price activity at the individual stock level. Alphanume's Next-Day Movers dataset links daily volume, range, and gap data for stocks exhibiting elevated activity — which makes it a natural starting point for anyone studying how volume relates to subsequent price behavior.

The mixture-of-distributions hypothesis

The most coherent theoretical explanation for the volume–volatility link is the mixture-of-distributions hypothesis, developed in the 1970s and refined since. The core argument is straightforward: both volume and price variance are jointly driven by a third variable — the rate of information arrival into the market. When new information arrives — an earnings report, a macro surprise, a regulatory filing, an analyst revision — traders update their beliefs and trade to rebalance positions. That process generates both volume (the trading activity) and volatility (the price moves as bids and offers shift to reflect new valuations). Volume and volatility are not causing each other; they are both effects of the same underlying stimulus.

This framing has two important implications. First, the contemporaneous correlation between volume and volatility is not spurious — it reflects a genuine economic mechanism. Second, if you want to use volume to predict future volatility, you need a reason to believe that today's information arrival rate predicts tomorrow's. Sometimes it does: news events have sequels, earnings surprises prompt analyst revisions, regulatory actions extend across multiple trading sessions. The predictive case for volume rests on this persistence, not on volume itself having any mechanical effect on future prices.

Abnormal volume, not raw volume

Raw share count is a poor input to any volatility model. Stocks have wildly different average trading volumes: a heavily traded mega-cap can trade tens or hundreds of millions of shares a day, while a micro-cap might trade tens of thousands. Market-wide volume also follows a persistent secular trend and shows calendar seasonality, including day-of-week and month-of-year effects. A volume reading in isolation tells you almost nothing without knowing what is normal for that stock at that time.

The useful construct is abnormal or relative volume: today's volume divided by its rolling average over a trailing window (typically 20 to 60 trading days). A ratio above 2× means the stock is trading at more than double its recent pace; a ratio above 5× flags something materially unusual. This normalization converts an incomparable raw count into a signal that is comparable across stocks and across time for the same stock. A full-day ratio sidesteps the intraday U-shape, the well-documented pattern in which volume is highest in the opening and closing hours and lowest near midday, because numerator and denominator both cover a complete session. A partial-day ratio does not: it needs a baseline built from the same time of day. When the baseline incorporates that intraday distribution, the ratio strips out the predictable seasonal component and leaves the residual that is more likely to carry information.
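The ratio described above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the function name `abnormal_volume` is hypothetical, the 20-day window is one point in the range mentioned above, and excluding the current day from its own baseline is an assumption made here so that a spike does not dilute its own ratio.

```python
import numpy as np

def abnormal_volume(volume, window=20):
    """Ratio of each day's volume to the mean of the prior `window` days.

    Assumption: the baseline excludes the current day. The first `window`
    entries are NaN because no full baseline exists for them yet.
    """
    volume = np.asarray(volume, dtype=float)
    ratio = np.full(volume.shape, np.nan)
    # Prefix sums make each trailing-window sum a single subtraction
    csum = np.concatenate(([0.0], np.cumsum(volume)))
    for t in range(window, len(volume)):
        baseline = (csum[t] - csum[t - window]) / window  # mean of days t-window .. t-1
        ratio[t] = volume[t] / baseline
    return ratio

# Hypothetical series: 20 quiet days at 1,000,000 shares, then a 5,000,000-share day
vols = [1_000_000] * 20 + [5_000_000]
print(float(abnormal_volume(vols)[-1]))  # 5.0
```

Passing dollar volume instead of share volume requires no change to the function; only the input series differs.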

Dollar volume rather than share volume is usually preferable for multi-stock comparisons. A $5 stock trading one million shares is a very different market event than a $500 stock trading one million shares. Dollar volume scales by price and normalizes across the liquidity spectrum. For within-stock time-series analysis, share volume ratios are generally adequate, but when running cross-sectional regressions that include both penny stocks and large-caps, dollar-denominated measures prevent the share count from dominating the signal in low-priced names.

Contemporaneous correlation versus genuine forecasting

Much of the academic literature on volume and volatility documents the contemporaneous relationship: on the same day, high volume accompanies high volatility. That finding is robust but not directly useful to a trader who needs to form a view on tomorrow's range. The predictive literature, which asks whether today's volume predicts tomorrow's volatility once today's volatility is controlled for, is more mixed and more nuanced.

The standard baseline for any volatility forecast is a GARCH-family model, which uses the history of returns and past variances to produce a forward estimate. GARCH models already capture the autocorrelation in volatility — the well-known volatility clustering phenomenon — which means adding a volume term only improves the forecast if volume carries information beyond what past squared returns already contain. Research findings here are generally positive but small: volume adds statistically significant predictive power in many studies, but the economic magnitude of that improvement — measured by forecast error reduction — tends to be modest. The incremental R-squared is real but not transformative.
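For readers who have not worked with the baseline, the GARCH(1,1) recursion itself is short. The sketch below takes the parameters as given, and the values in the example are hypothetical; in practice they are estimated by maximum likelihood. Starting the recursion at the sample variance is one common convention, assumed here for simplicity.

```python
import numpy as np

def garch11_forecast(returns, omega, alpha, beta):
    """One-step-ahead variance forecast from a GARCH(1,1) recursion.

    sigma2[t+1] = omega + alpha * returns[t]**2 + beta * sigma2[t]
    Parameters are supplied by the caller, not estimated here.
    """
    returns = np.asarray(returns, dtype=float)
    sigma2 = float(np.var(returns))  # assumption: initialise at the sample variance
    for r in returns:
        sigma2 = omega + alpha * r**2 + beta * sigma2
    return sigma2

# Hypothetical daily returns and illustrative (not estimated) parameters
rets = [0.010, -0.020, 0.015, -0.005, 0.030]
variance_forecast = garch11_forecast(rets, omega=1e-6, alpha=0.10, beta=0.85)
print(np.sqrt(variance_forecast))  # forecast of next-day volatility (standard deviation)
```

Because the recursion already feeds on past squared returns, a volume term has to explain what this forecast misses, which is the sense in which the bar for volume is set by the baseline.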

A related baseline is realized volatility, computed from intraday returns at high frequency. Realized volatility is a much more precise estimate of the current volatility regime than any GARCH-implied estimate, which means the bar for volume to add incremental value is higher when realized vol is already in the model. Volume still tends to pass that bar in event-driven environments — when a catalyst arrives and both volume and realized vol spike together, the volume signal can add information about how long the elevated regime will persist.

Volume spikes, attention, and catalysts

The cases where volume most reliably carries forward-looking information are cases where volume reflects the arrival of a discrete catalyst: an earnings release, an M&A announcement, a short-seller report, a product recall, a regulatory decision. In these situations, elevated volume is not random noise — it signals that a large number of market participants are simultaneously processing new and material information. The resolution of that information processing does not always happen in a single session. Attention persists, trading activity remains elevated, and price ranges stay wide over the subsequent one to three days as the market re-equilibrates.

This is also why volume spikes are closely linked to the question of what causes a stock to gap. Gaps, or overnight price discontinuities, tend to occur when information arrives during non-trading hours and is absorbed at the open. The sessions following a large gap frequently see elevated volume and elevated intraday range, as participants who missed the initial move, hedgers, and arbitrageurs all enter simultaneously. In that setting, the prior day's volume spike is a reasonable predictor of an elevated-volatility regime the next day, not because volume causes the volatility, but because both are proxies for the same catalyst-driven information environment.

Building a simple volume-augmented volatility forecast

A practical implementation starts with a volatility baseline — either a GARCH(1,1) forecast or the 20-day realized volatility — and adds an abnormal volume term. The regression structure is simple: tomorrow's realized range (or squared return, or realized volatility) as a function of today's baseline volatility estimate and today's volume ratio relative to the trailing mean.
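The regression structure above can be sketched with ordinary least squares. This is an illustration under stated assumptions rather than a recommended specification: the function names are hypothetical, the target is left generic (range, squared return, or realized volatility), and the data are synthetic numbers generated only to show the mechanics, not market observations.

```python
import numpy as np

def fit_volume_augmented(baseline_vol, log_vol_ratio, next_day_vol):
    """OLS fit of next_day_vol ~ a + b * baseline_vol + c * log_vol_ratio.

    Row t pairs predictors observed on day t with the volatility realised
    on day t + 1. Returns the coefficients as an array [a, b, c].
    """
    baseline_vol = np.asarray(baseline_vol, dtype=float)
    log_vol_ratio = np.asarray(log_vol_ratio, dtype=float)
    y = np.asarray(next_day_vol, dtype=float)
    X = np.column_stack([np.ones(len(y)), baseline_vol, log_vol_ratio])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def predict(coef, baseline_vol_today, log_vol_ratio_today):
    """Next-day volatility forecast from fitted coefficients."""
    a, b, c = coef
    return a + b * baseline_vol_today + c * log_vol_ratio_today

# Synthetic data, generated only to show the mechanics (not market data)
rng = np.random.default_rng(0)
n = 500
baseline = rng.uniform(0.01, 0.03, n)
log_ratio = rng.normal(0.0, 0.5, n)
target = 0.002 + 0.8 * baseline + 0.004 * log_ratio + rng.normal(0.0, 0.001, n)

coef = fit_volume_augmented(baseline, log_ratio, target)
print(coef)                                # estimates of [a, b, c]
print(predict(coef, 0.02, np.log(3.0)))    # forecast when volume runs at 3x its baseline
```

On real data the quantity of interest is the out-of-sample error of this fit compared with the same regression without the volume column.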

A few construction choices matter. First, the volume ratio should be log-transformed before entry; the distribution of volume ratios is right-skewed and log-transforming pulls it closer to normality, which reduces the influence of outliers. Second, the model should be estimated separately for different liquidity buckets. The volume–volatility relationship is stronger and more stable for mid-cap and large-cap stocks than for micro-caps, where volume itself is noisy and subject to manipulation. Thin names with sudden volume spikes may be experiencing pump dynamics rather than genuine information events, and pooling them with liquid names distorts the coefficient. Third, the model should be re-estimated on a rolling basis rather than fit once on the full history; the strength of the volume–volatility relationship is regime-dependent and drifts over time.
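The first and third choices, log-transforming the ratio and re-estimating on a rolling basis, can be combined in one loop. The sketch below is a hypothetical helper, not an established routine; the 250-day window is an arbitrary illustration, and the input is synthetic data with a coefficient that changes halfway through. Liquidity buckets would be handled by running it separately on each bucket.

```python
import numpy as np

def rolling_volume_coef(baseline_vol, vol_ratio, next_day_vol, window=250):
    """Re-fit the regression on each trailing window; return the path of the volume coefficient.

    Row t pairs day-t predictors with volatility realised on day t + 1, so the
    value stored at index i only becomes known after the close of day i + 1.
    """
    baseline_vol = np.asarray(baseline_vol, dtype=float)
    log_ratio = np.log(np.asarray(vol_ratio, dtype=float))  # log tames the right skew
    y = np.asarray(next_day_vol, dtype=float)
    coefs = np.full(len(y), np.nan)
    for end in range(window, len(y) + 1):
        rows = slice(end - window, end)
        X = np.column_stack([np.ones(window), baseline_vol[rows], log_ratio[rows]])
        beta, *_ = np.linalg.lstsq(X, y[rows], rcond=None)
        coefs[end - 1] = beta[2]  # coefficient on the log volume ratio
    return coefs

# Synthetic example (not market data): the volume coefficient doubles halfway through
rng = np.random.default_rng(1)
n = 1000
base = rng.uniform(0.01, 0.03, n)
ratio = np.exp(rng.normal(0.0, 0.5, n))
true_c = np.where(np.arange(n) < n // 2, 0.002, 0.004)
y = 0.002 + 0.8 * base + true_c * np.log(ratio) + rng.normal(0.0, 0.001, n)

path = rolling_volume_coef(base, ratio, y)
print(path[249], path[-1])  # first and last window estimates of the volume coefficient
```

A single fit on the full history would return one blended coefficient and hide the shift that the rolling path exposes.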

What not to expect: a model with volume as the primary predictor that dramatically outperforms pure volatility models. The academic consensus, and the practitioner experience, is that volume adds incremental value — perhaps reducing out-of-sample forecast error by a few percent relative to a GARCH baseline — but does not replace the baseline. Treat it as a conditioning variable that adjusts the volatility estimate upward when abnormal volume is present, not as a standalone signal.

Pitfalls and data-discipline requirements

Several failure modes are worth naming explicitly. Volume's own intraday seasonality — the U-shaped pattern concentrated in the first and last hour of trading — means that partial-day volume figures are not comparable to full-day figures without explicit adjustment. If you are computing abnormal volume from intraday data, the baseline must account for the time of day, or early-session volume spikes will consistently appear anomalous when they are simply normal open-auction activity.
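A same-time-of-day baseline is straightforward to sketch. The example assumes intraday volume has already been aggregated into equal-length buckets per day; the function name, the three-bucket layout, and the volumes are all hypothetical.

```python
import numpy as np

def time_of_day_ratio(history, today_so_far):
    """Volume ratio against a same-time-of-day baseline.

    history: 2-D array, one row per past day, one column per intraday bucket.
    today_so_far: volumes for the buckets that have elapsed today.
    """
    history = np.asarray(history, dtype=float)
    today_so_far = np.asarray(today_so_far, dtype=float)
    k = len(today_so_far)
    # Typical cumulative volume through bucket k, averaged over past days
    expected = history[:, :k].sum(axis=1).mean()
    return today_so_far.sum() / expected

# Hypothetical U-shaped profile: heavy open, quiet midday, busy close
hist = [[500, 200, 300]] * 20
print(float(time_of_day_ratio(hist, [500])))  # 1.0: a normal opening bucket

# A naive pro-rata baseline spreads the 1,000-share day evenly over 3 buckets
naive = 500 / (1000 / 3)
print(naive)  # about 1.5: the same opening looks anomalous
```

The gap between the two numbers is the false signal that appears when a partial-day figure is compared with a flat share of the full-day average.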

Halts and auctions create measurement artifacts. A stock that is halted pending news accumulates a queue of orders and then reopens with a burst of volume at the reopening price. That burst is not a signal of ongoing information arrival; it is the mechanical clearing of a queue. Models that treat halt-reopen volume as ordinary elevated volume will draw spurious inferences. A clean data pipeline flags halts and either excludes the reopening period or handles it separately.

Illiquidity is the most important confound for small-cap work. In a stock that trades 50,000 shares on an average day, a single institutional order for 200,000 shares is four times normal daily volume on its own, so the day's volume ratio reaches at least 4×. That ratio reflects execution, not information, and the subsequent volatility spike may reflect the market impact of that one order rather than any news event. Filtering by average dollar volume before applying a volume-based signal is a necessary step, not an optional one.
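A liquidity screen of this kind might look like the following. The function name and the $5 million threshold are placeholders chosen for illustration, not values taken from any study; the right cutoff depends on the universe and the strategy.

```python
import numpy as np

def passes_liquidity_filter(prices, volumes, min_avg_dollar_volume=5_000_000, window=20):
    """True if mean daily dollar volume over the last `window` days meets the threshold."""
    prices = np.asarray(prices, dtype=float)[-window:]
    volumes = np.asarray(volumes, dtype=float)[-window:]
    return bool(np.mean(prices * volumes) >= min_avg_dollar_volume)

# Hypothetical thin name: 50,000 shares a day at $4 is $200,000 of daily dollar volume
print(passes_liquidity_filter([4.0] * 20, [50_000] * 20))      # False

# Hypothetical liquid name: 2,000,000 shares a day at $50 is $100,000,000
print(passes_liquidity_filter([50.0] * 20, [2_000_000] * 20))  # True
```

Applying the screen before computing volume ratios keeps single-order artifacts in thin names out of the estimation sample.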

Finally, survivorship in the training data matters. If a historical dataset of volume and volatility observations excludes stocks that were delisted, halted permanently, or went through bankruptcy, the sample is not representative of the population a live model would encounter. Data discipline means working with a complete sample including delistings, and being explicit about which time periods and market conditions the model was calibrated on.

Honest assessment of predictive value

Volume adds real but modest incremental information to volatility forecasts. The mechanism is sound — both volume and volatility respond to the rate of information arrival, and when today's information flow is unusually high, there is persistence that makes tomorrow's regime somewhat predictable. Abnormal volume, normalized to a stock's own history, is the operationally correct form of the signal. The contemporaneous relationship is far stronger than the predictive one, and the predictive relationship is most reliable in event-driven, liquid, catalyst-rich environments.

The practical value is as a conditioning variable in a broader volatility model: when abnormal volume is high, shade the volatility forecast upward from the GARCH or realized-vol baseline. When volume is near normal, rely primarily on the volatility history. Volume does not replace disciplined volatility modeling — it refines it, in specific regimes, by a modest amount. That is a useful thing, but only if the measurement is done carefully and the expectations are calibrated honestly.