How to Find Short Squeeze Candidates With Data

Alphanume Team · May 2, 2026

Combining crowding signals without overfitting — what a defensible screen actually looks like.

Short-squeeze screens are easy to build and easy to overfit. Combining short interest, borrow fees, days-to-cover, and price action produces a list of names that look like squeeze candidates in hindsight on any historical date. The harder problem — producing a screen that forward-predicts squeezes at a useful base rate — requires discipline about what's actually signal versus what's data-mined noise.

The core inputs

A defensible squeeze-candidate screen uses the following inputs:

Short interest as percent of float. The structural crowding measure. See days-to-cover for the related ratio.
Days-to-cover. Trading-liquidity-adjusted measure of overhang.
Borrow fee. Operational measure of borrow scarcity. See how borrow fees are calculated.
Borrow utilization. Lent shares as percent of lendable inventory. Less commonly available but more direct than fee.
Recent price action. Distance from key technical levels (recent highs, moving averages).
Options flow indicators. Call/put open interest ratios, short-dated near-the-money call volume.
Retail-flow proxies. Social-media mention counts (carefully), retail brokerage activity indicators.
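
The three ratio-style inputs above reduce to simple arithmetic on raw share counts. The sketch below is a minimal illustration that assumes the conventional definitions (shares short divided by float, shares short divided by average daily share volume, and shares on loan divided by lendable shares). The `ShortSnapshot` type, its field names, and the sample figures are hypothetical and not part of any particular data feed.

```python
from dataclasses import dataclass


@dataclass
class ShortSnapshot:
    """Raw inputs for one name on one date (field names are illustrative)."""
    shares_short: float      # reported short interest, in shares
    float_shares: float      # free float, in shares
    avg_daily_volume: float  # average daily share volume over a trailing window
    shares_on_loan: float    # shares currently lent out
    lendable_shares: float   # total lendable inventory, including shares on loan


def crowding_ratios(s: ShortSnapshot) -> dict:
    """Return the three ratio-style screen inputs as plain floats."""
    return {
        "si_pct_float": 100.0 * s.shares_short / s.float_shares,
        "days_to_cover": s.shares_short / s.avg_daily_volume,
        "utilization_pct": 100.0 * s.shares_on_loan / s.lendable_shares,
    }


# Made-up numbers, for illustration only.
snap = ShortSnapshot(
    shares_short=12_000_000,
    float_shares=40_000_000,
    avg_daily_volume=1_500_000,
    shares_on_loan=9_000_000,
    lendable_shares=10_000_000,
)
ratios = crowding_ratios(snap)
print(ratios)
```

The length of the trailing window behind `avg_daily_volume` is a modelling choice; whichever window is used, it should be applied consistently across names so that days-to-cover values are comparable.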

The screen pattern

A typical screen filters and ranks:

Filter: Short interest > 20% of float AND days-to-cover > 5 AND borrow fee > 25% (annualized) AND market cap > $100M AND average daily dollar volume (ADV) > $5M.
Rank: Weighted score combining each input's percentile within the filtered universe.
Surface: Top decile of the ranked list.

This produces ~10–30 names on a typical day. The next steps determine whether the screen is actually useful.
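
The filter, rank, and surface steps can be sketched in a few lines. This is one possible reading rather than a reference implementation: the thresholds come from the filter above, but the weights in `WEIGHTS`, the choice of which inputs enter the score, the row field names, and the four sample rows are all assumptions made for illustration.

```python
import math

# Thresholds from the filter step; the borrow fee is assumed to be an annualized percent.
FILTER = {
    "si_pct_float": 20.0,
    "days_to_cover": 5.0,
    "borrow_fee_pct": 25.0,
    "market_cap_usd": 100e6,
    "adv_usd": 5e6,
}

# Hypothetical weights; the article does not prescribe any.
WEIGHTS = {"si_pct_float": 0.4, "days_to_cover": 0.3, "borrow_fee_pct": 0.3}


def passes_filter(row):
    """True only if every filtered field strictly exceeds its threshold."""
    return all(row[key] > floor for key, floor in FILTER.items())


def percentile_ranks(values):
    """Fraction of values less than or equal to each value (ties share a rank)."""
    n = len(values)
    return [sum(1 for x in values if x <= v) / n for v in values]


def run_screen(universe):
    """Filter, score by weighted within-universe percentiles, return the top decile."""
    passed = [row for row in universe if passes_filter(row)]
    if not passed:
        return []
    scores = [0.0] * len(passed)
    for key, weight in WEIGHTS.items():
        pct = percentile_ranks([row[key] for row in passed])
        scores = [s + weight * p for s, p in zip(scores, pct)]
    ranked = sorted(zip(passed, scores), key=lambda pair: pair[1], reverse=True)
    n_top = max(1, math.ceil(len(ranked) / 10))  # always surface at least one name
    return [(row["ticker"], round(score, 3)) for row, score in ranked[:n_top]]


# Made-up universe: DDD has the most extreme crowding but fails the size and liquidity floors.
universe = [
    {"ticker": "AAA", "si_pct_float": 45, "days_to_cover": 9, "borrow_fee_pct": 80,
     "market_cap_usd": 400e6, "adv_usd": 12e6},
    {"ticker": "BBB", "si_pct_float": 25, "days_to_cover": 6, "borrow_fee_pct": 30,
     "market_cap_usd": 250e6, "adv_usd": 8e6},
    {"ticker": "CCC", "si_pct_float": 30, "days_to_cover": 7, "borrow_fee_pct": 40,
     "market_cap_usd": 150e6, "adv_usd": 6e6},
    {"ticker": "DDD", "si_pct_float": 50, "days_to_cover": 10, "borrow_fee_pct": 90,
     "market_cap_usd": 60e6, "adv_usd": 2e6},
]
top = run_screen(universe)
print(top)
```

Percentiles are computed inside the filtered universe, as the rank step describes, so a name's score depends on which other names pass the filter that day. Scores are therefore comparable within a date but not across dates.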

The overfit problem

Squeeze screens are particularly susceptible to overfitting because:

Famous historical squeezes (GameStop, AMC) have outsized influence on which features look predictive.
The base rate of squeezes is low — most "candidates" never squeeze.
Survivorship bias in retrospective analysis: only the names that squeezed are remembered.
Catalyst dependence: squeezes require a trigger, but triggers are hard to model from public data.

A screen that "would have caught GameStop" is uninformative — many screens would have. The relevant question is the forward base rate of squeezes among screen candidates compared to a control group.

Backtesting the screen

To evaluate a squeeze screen properly:

Use point-in-time data. Reconstruct the screen output as it would have been known on each historical date. Short interest is published with a lag, so keying each figure to the date it describes rather than the date it became available introduces look-ahead bias. See what is look-ahead bias.
Include delisted names. Many heavily shorted names eventually delist. Excluding them inflates the screen's success rate. See survivorship bias.
Define "squeeze" objectively. A reasonable threshold: 50%+ return within 20 trading days of screen entry. Looser definitions produce inflated success rates.
Compute the conditional base rate. Compare screen-candidate squeeze rates to baseline universe rates. The lift is the meaningful number.
Beware short look-back periods. A 2-year backtest dominated by the 2021 meme-stock era produces optimistic estimates; a 10-year backtest is more representative.
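
Three of the steps above (point-in-time lookup, an objective squeeze label, and the conditional base rate) can be sketched as follows. Several assumptions are baked in and should be treated as such: "50%+ return within 20 trading days" is read here as the best close-to-close return reached at any point in the window, a delisted name is represented by a price series that simply ends early, and all function names, dates, and prices are hypothetical toy values.

```python
from datetime import date


def as_known_short_interest(reports, as_of):
    """Latest short-interest figure already published on `as_of`, or None.

    reports: list of (settlement_date, publication_date, si_pct_float).
    Visibility is decided by publication_date, never by settlement_date.
    """
    visible = [r for r in reports if r[1] <= as_of]
    if not visible:
        return None
    return max(visible, key=lambda r: r[0])[2]


def max_forward_return(prices, entry_idx, horizon=20):
    """Best return versus the entry close within `horizon` trading days after entry."""
    window = prices[entry_idx + 1 : entry_idx + 1 + horizon]
    if not window:
        return None  # no prices after entry, so the event cannot be evaluated
    return max(window) / prices[entry_idx] - 1.0


def is_squeeze(prices, entry_idx, threshold=0.50, horizon=20):
    """Objective label: did the name gain `threshold` or more inside the window?"""
    r = max_forward_return(prices, entry_idx, horizon)
    return r is not None and r >= threshold


def squeeze_rate(events):
    """events: list of (prices, entry_idx). Fraction of events labelled as squeezes."""
    if not events:
        return float("nan")
    return sum(is_squeeze(p, i) for p, i in events) / len(events)


def lift(candidate_events, baseline_events):
    """Candidate squeeze rate divided by the baseline universe rate."""
    base = squeeze_rate(baseline_events)
    return squeeze_rate(candidate_events) / base if base > 0 else float("nan")


# Toy data only. The second report describes January 31 but is not public until February 9.
reports = [
    (date(2024, 1, 12), date(2024, 1, 25), 22.0),
    (date(2024, 1, 31), date(2024, 2, 9), 31.0),
]
known = as_known_short_interest(reports, date(2024, 2, 1))

flat = [10.0] * 25
squeezed = [10.0, 11.0, 16.0] + [12.0] * 22
candidates = [(squeezed, 0), (flat, 0)]
baseline = [(squeezed, 0), (flat, 0), (flat, 0), (flat, 0)]
result = lift(candidates, baseline)
print(known, result)
```

The baseline events should be drawn from the same dates and the same eligible universe as the candidates, including names that later delisted. Otherwise the lift mixes the screen's effect with differences in period or universe.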

The catalyst problem

Setup conditions are insufficient; squeezes require triggers. Public-data triggers that have correlated with squeezes historically:

Positive earnings surprises in heavily shorted names.
Tactical operational news (contract wins, regulatory approvals).
Social-media coordination events.
Activist or strategic-investor disclosures.
Pure technical breakouts in hard-to-borrow (HTB) names.

None of these triggers is reliably predictable in advance. The best a screen can do is identify the setup conditions and accept that many candidates never trigger.

How to use a squeeze screen

Practical applications:

Watchlist construction: Maintain a rolling list of squeeze-setup candidates for daily review.
Risk management: Avoid net-short exposure in screen-flagged names. For dilution-event short strategies, exclude or down-weight squeeze candidates.
Tactical longs: Long entry on screen-flagged names with confirming retail-flow or catalyst indicators.
Pairs construction: Use squeeze candidates as the long leg of a pair, with the short leg in similar names that lack the setup features.
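
As an illustration of the risk-management use, the sketch below scales short weights down by a squeeze score and drops the highest-risk names entirely. It assumes the screen's output has been normalized to a score between 0 and 1. The function name, the linear scaling rule, and the `exclude_at` cutoff are hypothetical choices, not a recommended sizing method.

```python
def adjust_short_weights(weights, squeeze_scores, exclude_at=0.9):
    """Down-weight or exclude shorts according to squeeze score.

    weights: ticker -> target portfolio weight (negative means short).
    squeeze_scores: ticker -> score in [0, 1]; tickers missing from the map count as 0.
    """
    adjusted = {}
    for ticker, w in weights.items():
        score = squeeze_scores.get(ticker, 0.0)
        if w >= 0:
            adjusted[ticker] = w                  # long or flat positions are left untouched
        elif score >= exclude_at:
            adjusted[ticker] = 0.0                # exclude: too crowded to hold short
        else:
            adjusted[ticker] = w * (1.0 - score)  # shrink the short in proportion to score
    return adjusted


# Made-up book: two shorts and one long.
weights = {"AAA": -0.05, "BBB": -0.04, "CCC": 0.03}
scores = {"AAA": 0.95, "BBB": 0.5}
adjusted = adjust_short_weights(weights, scores)
print(adjusted)
```

The freed-up weight is deliberately not redistributed to the remaining shorts, since doing so would concentrate exposure in fewer names.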

Combining with dilution data

The intersection of "high squeeze risk" and "active dilution" is rare but useful:

Names with high short interest and active ATM facilities: dilution provides the structural supply that limits squeeze risk.
Names with high short interest but no near-term dilution: cleaner squeeze setups.
Names with high short interest plus completed dilution-event drift: the post-offering window is when squeeze risk is highest.

Where Alphanume fits

For dilution-event short strategies, squeeze risk is the principal tail risk to manage. Alphanume's Dilution Events dataset identifies structural shorting candidates; combining it with squeeze-screen output produces a risk-aware implementation.

Explore the Dilution Events dataset →