How to Build a Dilution Screener in Python
Alphanume Team · June 4, 2026
Turning offering events into a ranked short watchlist.
A dilution screener that Python traders actually use is not a list of every company that ever filed an S-3. It is a ranked triage tool, one that separates names with fresh, aggressive issuance from those with stale or immaterial events. This tutorial walks from a raw API pull of the Stock Dilution dataset all the way to a sorted watchlist DataFrame, with explicit point-in-time discipline so nothing in your screen relies on information that arrived after the as-of date. We will engineer three feature families, apply hard filters, and score each name by averaging min-max-normalized features. The result tells you which names deserve a closer look, not which ones to short.
The setup
Two libraries cover the full workflow: requests for the API calls and pandas for every transformation. The key lives in an environment variable. Store it there and reference it via os.environ — never hard-code credentials in a script you might share or check in. The helper below is the single call site for every endpoint used in this tutorial.
import os
import requests
import pandas as pd
from datetime import date, timedelta

BASE_URL = "https://api.alphanume.com/v1"
API_KEY = os.environ["ALPHANUME_API_KEY"]

def get_data(endpoint, **params):
    params["api_key"] = API_KEY
    resp = requests.get(f"{BASE_URL}/{endpoint}", params=params, timeout=30)
    resp.raise_for_status()
    return resp.json()["data"]
Calling raise_for_status() immediately surfaces auth errors and rate limits rather than letting a malformed response silently turn into a screener built on empty data. The ["data"] slice unwraps the response envelope; the sibling "count" field can double-check that your parameter filters returned rows at all.
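The count cross-check can be sketched as a small helper. This is a minimal illustration, not part of any client library: the name `unwrap_envelope` is hypothetical, and it assumes `count` reports the number of rows in this response rather than a paginated total.

```python
def unwrap_envelope(payload: dict) -> list:
    """Return payload["data"], checked against the sibling "count" field.

    Assumes an envelope shaped like {"data": [...], "count": N}, where N is
    the number of rows in this response (an assumption, not a documented fact).
    """
    rows = payload.get("data") or []
    count = payload.get("count")
    # Only compare when the response actually carried a count.
    if count is not None and count != len(rows):
        raise ValueError(
            f"envelope reports count={count} but data has {len(rows)} rows"
        )
    return rows
```

If you adopt something like this, `get_data` would return `unwrap_envelope(resp.json())` instead of slicing `["data"]` directly.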
Pulling recent dilution events
The dilution endpoint returns offering and registration events. You pass a lookback window and get back one row per event — ticker, event type, date, and size metadata. Pull the trailing ninety days and load everything into a single DataFrame before you touch a single feature.
def load_dilution_events(as_of: str, lookback_days: int = 90) -> pd.DataFrame:
    """
    Pull offering/registration events visible as of `as_of`.
    Only events on or before as_of are included; no look-ahead.
    """
    start = (
        date.fromisoformat(as_of) - timedelta(days=lookback_days)
    ).isoformat()
    rows = get_data("dilution", start_date=start, end_date=as_of)
    if not rows:
        return pd.DataFrame()
    df = pd.DataFrame(rows)
    df["event_date"] = pd.to_datetime(df["event_date"])
    df["as_of"] = pd.to_datetime(as_of)
    return df

events = load_dilution_events(as_of="2026-06-10", lookback_days=90)
print(events.head())
print(f"{events['ticker'].nunique()} unique tickers, {len(events)} events")
Passing end_date=as_of is not optional ceremony — it is the mechanism that keeps your screen point-in-time. An event filed tomorrow is not knowable today, and including it would make a backtest look far better than any live run ever could. Fix the as-of date at the top of every run and pass it through every API call.
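A cheap guard can confirm that the API filter did its job before any feature is computed. The sketch below is an assumption-laden illustration: the helper name `assert_point_in_time` is hypothetical, and it assumes `event_date` holds timezone-naive dates.

```python
import pandas as pd

def assert_point_in_time(events: pd.DataFrame, as_of: str,
                         date_col: str = "event_date") -> None:
    """Raise if any event is dated after the as-of date."""
    if events.empty:
        return
    as_of_dt = pd.to_datetime(as_of)
    # Compare on calendar date so intraday timestamps on as_of still pass.
    event_days = pd.to_datetime(events[date_col]).dt.normalize()
    late = events[event_days > as_of_dt]
    if not late.empty:
        raise ValueError(
            f"{len(late)} event(s) dated after as_of={as_of}; "
            f"latest is {late[date_col].max()}"
        )
```

Calling it right after the load step, for example `assert_point_in_time(events, "2026-06-10")`, turns a silent look-ahead leak into a loud failure.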
Engineering screening features
Raw event rows are not scores. You need to collapse per-event rows into one row per ticker and then compute three feature families: recency, frequency, and event-type weight.
Recency measures how recently the latest event landed. An offering filed yesterday is categorically different from one filed eighty days ago, and that difference should drive rank. Compute it as days since the last event, then invert it so that higher is worse, consistent with the other features.
Frequency counts how many events appeared in the full ninety-day window and again in the most recent thirty days. A name that files three times in a month is a different animal from one with a single registration statement.
Event-type weight distinguishes a bought deal or ATM program — where shares actually hit the float — from a shelf registration that has not been drawn yet. Assign a numeric weight per event type and sum it per ticker.
EVENT_WEIGHTS = {
    "bought_deal": 3,
    "atm_offering": 3,
    "direct_offering": 3,
    "registered_direct": 2,
    "shelf_registration": 1,
    "s1_registration": 1,
}

def engineer_features(df: pd.DataFrame, as_of: str) -> pd.DataFrame:
    as_of_dt = pd.to_datetime(as_of)
    cutoff_30 = as_of_dt - pd.Timedelta(days=30)
    df = df.copy()
    df["weight"] = df["event_type"].map(EVENT_WEIGHTS).fillna(1)
    df["days_ago"] = (as_of_dt - df["event_date"]).dt.days
    grp = df.groupby("ticker")
    feats = pd.DataFrame({
        "last_event_days_ago": grp["days_ago"].min(),
        "event_count_90d": grp["event_date"].count(),
        "event_count_30d": grp["event_date"].apply(
            lambda s: (s >= cutoff_30).sum()
        ),
        "weight_sum": grp["weight"].sum(),
    })
    # Invert recency so larger = worse across all features
    feats["recency_score"] = 1 / (feats["last_event_days_ago"] + 1)
    return feats

feats = engineer_features(events, as_of="2026-06-10")
print(feats.sort_values("recency_score", ascending=False).head(10))
Keep every intermediate column in the DataFrame during development. You will want to audit why a name ranked where it did, and opaque single-column output makes that nearly impossible.
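One audit worth running early concerns the `fillna(1)` fallback: any event type missing from the weight table silently receives a weight of 1. A minimal sketch of a check for that, assuming the `event_type` column used above; the helper name and the `warrant_exercise` type in the toy data are hypothetical.

```python
import pandas as pd

def unmapped_event_types(df: pd.DataFrame, weights: dict) -> pd.Series:
    """Count rows whose event_type has no entry in `weights`."""
    mask = ~df["event_type"].isin(list(weights))
    return df.loc[mask, "event_type"].value_counts()

# Toy data: "warrant_exercise" is a made-up type used only for illustration.
toy = pd.DataFrame(
    {"event_type": ["atm_offering", "warrant_exercise", "warrant_exercise"]}
)
print(unmapped_event_types(toy, {"atm_offering": 3}))
```

A non-empty result is a prompt to decide on an explicit weight for that type rather than accepting the default.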
Hard filters and normalization
Scoring before filtering distorts the ranking. Min-max normalization is relative to whatever names are in the set, so an untradeable micro-cap with a burst of filings can stretch the scale and outscore a genuine short candidate on raw counts. Apply hard filters first to drop names that are structurally unsuitable, then normalize only the survivors.
The hard filters here are conceptual placeholders; you would join in price, float, and average volume from your own data source. The point is the ordering: filter, then score. Never reverse that.
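One way to build that pre-filtered universe from your own market data is sketched here. Everything about the input is an assumption: the helper name `build_universe` and the columns `ticker`, `close`, and `avg_volume_30d` are hypothetical stand-ins for whatever your price and liquidity source provides, and the values should be measured as of the screen's as-of date.

```python
import pandas as pd

def build_universe(market: pd.DataFrame,
                   min_price: float = 1.0,
                   min_avg_volume: int = 500_000) -> pd.Index:
    """Return tickers that pass price and liquidity floors.

    `market` is assumed to hold one row per ticker with hypothetical
    columns: ticker, close, avg_volume_30d.
    """
    keep = (
        (market["close"] >= min_price)
        & (market["avg_volume_30d"] >= min_avg_volume)
    )
    return pd.Index(market.loc[keep, "ticker"].unique(), name="ticker")
```

The resulting index is what you would pass as the `universe` argument to the filter step.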
def apply_hard_filters(
    feats: pd.DataFrame,
    min_price: float = 1.0,
    min_avg_volume: int = 500_000,
    universe: pd.Index | None = None,
) -> pd.DataFrame:
    """
    Drop names that fail structural filters before scoring.
    `universe` is an optional pre-filtered index from your
    price/float/liquidity data source.
    NOTE: min_price and min_avg_volume are placeholders and are not
    applied here; enforce them when you build `universe`.
    """
    out = feats.copy()
    if universe is not None:
        out = out.loc[out.index.intersection(universe)]
    # Require at least one event in the last 30 days
    out = out[out["event_count_30d"] >= 1]
    return out

def normalize_and_rank(feats: pd.DataFrame) -> pd.DataFrame:
    score_cols = ["recency_score", "event_count_90d",
                  "event_count_30d", "weight_sum"]
    normed = feats[score_cols].copy()
    for col in score_cols:
        col_min = normed[col].min()
        col_max = normed[col].max()
        rng = col_max - col_min
        # Min-max scale to [0, 1]; a constant column contributes 0
        normed[col] = (normed[col] - col_min) / rng if rng > 0 else 0.0
    feats = feats.copy()
    feats["dilution_score"] = normed.mean(axis=1)
    return feats.sort_values("dilution_score", ascending=False)

filtered = apply_hard_filters(feats)
watchlist = normalize_and_rank(filtered)
print(watchlist[["dilution_score", "last_event_days_ago",
                 "event_count_30d", "weight_sum"]].head(15))
Equal-weighting the four normalized columns is a deliberate starting point, not a claim that each deserves identical importance. Adjust the weights once you have compared the output against known dilution episodes and can see where the ranking diverges from your judgment.
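When you do move away from equal weights, a weighted average of the normalized columns is the smallest possible change. The sketch below is illustrative only: `weighted_score` is a hypothetical helper, and it assumes you have kept the normalized columns (each scaled to [0, 1]) in their own DataFrame.

```python
import pandas as pd

def weighted_score(normed: pd.DataFrame, weights: dict) -> pd.Series:
    """Weighted average of already-normalized columns."""
    w = pd.Series(weights, dtype=float)
    missing = w.index.difference(normed.columns)
    if len(missing):
        raise KeyError(f"columns not found: {list(missing)}")
    if w.sum() <= 0:
        raise ValueError("weights must sum to a positive number")
    # Multiply each column by its weight, then divide by the total weight
    # so the score stays on the same [0, 1] scale as the inputs.
    return normed[w.index].mul(w, axis=1).sum(axis=1) / w.sum()
```

With all weights equal this reduces to the plain mean, so it can replace the equal-weighted score without changing the baseline ranking.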
Wrapping it into one function
Production code should be callable with two arguments — an as-of date and a universe — and return a clean watchlist in one shot. Composing the steps above into build_dilution_screen makes that possible.
def build_dilution_screen(
    as_of: str | None = None,
    lookback_days: int = 90,
    universe: pd.Index | None = None,
    top_n: int = 20,
) -> pd.DataFrame:
    # Resolve the default at call time. A default of date.today() in the
    # signature is evaluated once, at definition, and goes stale in any
    # long-running process.
    if as_of is None:
        as_of = date.today().isoformat()
    events = load_dilution_events(as_of=as_of, lookback_days=lookback_days)
    if events.empty:
        return pd.DataFrame()
    feats = engineer_features(events, as_of=as_of)
    filtered = apply_hard_filters(feats, universe=universe)
    watchlist = normalize_and_rank(filtered)
    output_cols = [
        "dilution_score",
        "last_event_days_ago",
        "event_count_30d",
        "event_count_90d",
        "weight_sum",
    ]
    return watchlist[output_cols].head(top_n)

screen = build_dilution_screen(as_of="2026-06-10", top_n=20)
print(screen.to_string())
Because as_of defaults to today, you can run this daily in a cron job without changing the call site. When you replay it against a historical date for research purposes, pass the date explicitly — and make sure the universe index you supply was also constructed as of that same date.
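For historical replays, pairing each as-of date with its own universe can be made explicit in the loop itself. This is a rough sketch under stated assumptions: `replay_screen` is a hypothetical helper, and `build_fn` is any function that accepts `as_of` and `universe` keyword arguments and returns a ticker-indexed DataFrame, as `build_dilution_screen` does.

```python
from typing import Callable, Mapping

import pandas as pd

def replay_screen(build_fn: Callable[..., pd.DataFrame],
                  universes: Mapping[str, pd.Index],
                  **kwargs) -> pd.DataFrame:
    """Run build_fn once per as-of date, each with that date's own universe.

    `universes` maps ISO date strings to the universe built as of that date.
    """
    frames = []
    for as_of, universe in sorted(universes.items()):
        screen = build_fn(as_of=as_of, universe=universe, **kwargs)
        if screen.empty:
            continue
        # Tag every row with the date it was screened on.
        frames.append(screen.assign(as_of=as_of))
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames).rename_axis("ticker").reset_index()
```

A call such as `replay_screen(build_dilution_screen, universes, top_n=20)` then yields one long table of watchlists keyed by date, with no way to accidentally reuse today's universe for a past date.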
What this screen is and is not
The watchlist is a triage tool. It surfaces names worth investigating, not names to short. A high dilution score means a company has been issuing recently and aggressively relative to others in the screened set on the as-of date — it says nothing about the price impact, the reason for the issuance, or whether the float has already absorbed the supply.
Point-in-time discipline is what separates a research-grade screen from an overfit one. Every event in the DataFrame was known on the as-of date. Every filter threshold was fixed before the score was computed. If you are going further and backtesting a short-selling strategy in Python using this watchlist as a signal, that same discipline must extend to every other input — price, volume, float — you join in downstream. Leaking even one future-known column wrecks the test.
The scoring model is intentionally simple. Equal-weighted normalized features are transparent and auditable. Once you understand where the screen agrees and disagrees with your own view of dilution risk, you can revisit the weights, add features such as offering size relative to float, or layer in sentiment signals. Start simple, verify the output makes sense, and extend deliberately. For a deeper look at the underlying data and event taxonomy, see the guide on how to build a dilution screener before tuning the weights.