How to Pull Historical Market Cap in Python
Alphanume Team · June 10, 2026
Hitting the REST endpoint and loading point-in-time market cap into a tidy DataFrame — the right way, without look-ahead.
Reconstructing what a company was worth on a past date sounds trivial until you try it. Today's share count times today's price is not what the market saw two years ago, and most free sources quietly overwrite history with restated figures. This tutorial pulls historical market cap data that Python users can actually trust into pandas: point-in-time, survivorship-aware, and ready for a backtest. We'll use the Historical Market Cap dataset, which returns shares outstanding and market capitalization as they were known on each historical date.
The setup
You need two libraries: requests to call the API and pandas to shape the result. Store your API key in an environment variable rather than hard-coding it — you do not want a key leaking into version control. Every Alphanume endpoint shares one request pattern: a query-parameter call against the base URL, with the rows returned under a data key.
import os

import requests
import pandas as pd

BASE_URL = "https://api.alphanume.com/v1"
API_KEY = os.environ["ALPHANUME_API_KEY"]  # raises KeyError if the variable is unset


def get_data(endpoint, **params):
    """Call an endpoint and return the rows under the "data" key."""
    params["api_key"] = API_KEY
    resp = requests.get(f"{BASE_URL}/{endpoint}", params=params, timeout=30)
    resp.raise_for_status()  # surface 4xx/5xx responses as exceptions
    return resp.json()["data"]
The raise_for_status() call matters: without it, a 401 or 429 passes through silently, and its error body can end up parsed into a DataFrame of garbage. Fail loudly instead. Indexing ["data"] unwraps the response envelope, which also carries a count field you can use as a sanity check.
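As one way to use that count field, here is a minimal sketch that validates the envelope before unwrapping it. It assumes count holds the number of rows in data, which the API documentation should confirm. The name unwrap_rows is a hypothetical helper, not part of any Alphanume client, and the payload is illustrative rather than real API output.

```python
def unwrap_rows(payload):
    """Return payload["data"], checked against payload["count"] when present.

    Assumption: "count" is the number of rows carried in "data".
    """
    rows = payload["data"]
    count = payload.get("count")
    if count is not None and count != len(rows):
        raise ValueError(f"envelope reports {count} rows but carries {len(rows)}")
    return rows


# Illustrative payload, not real API output.
ok = {"count": 1, "data": [{"ticker": "AAPL", "date": "2024-06-28"}]}
print(len(unwrap_rows(ok)))
```

To use it, get_data would return unwrap_rows(resp.json()) instead of indexing ["data"] directly.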
Pulling one snapshot
The endpoint takes a ticker and a date and returns the market cap as it was known on that date. Load it straight into a DataFrame.
def market_cap_on(ticker, date):
    rows = get_data("historical-market-cap", ticker=ticker, date=date)
    return pd.DataFrame(rows)

snap = market_cap_on("AAPL", "2024-06-28")
print(snap[["date", "ticker", "market_cap", "shares_outstanding"]])
Because the value is keyed to the date you pass, you are always asking "what was knowable then" rather than pulling a restated figure. That is the whole point of a point-in-time series.
Why point-in-time matters here
The temptation is to compute market cap yourself as today's price times today's share count. That injects look-ahead bias: share counts drift with buybacks, issuance, and splits, and using today's figure for a 2021 date assumes information you could not have had. A point-in-time series stores the share count as it was reported then, so a size filter built on it reflects what was actually knowable on each date.
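To make the size of that error concrete, here is a small worked example with made-up numbers (not real figures for any company). It holds the historical price fixed and changes only the share count, so the gap it shows comes entirely from using a count that was not yet knowable.

```python
# Hypothetical figures for illustration only.
price_then = 50.0              # share price on the historical date
shares_then = 1_000_000_000    # shares outstanding as reported at that time
shares_today = 900_000_000     # share count after later buybacks

point_in_time_cap = price_then * shares_then
look_ahead_cap = price_then * shares_today  # uses a count not yet knowable

error = look_ahead_cap / point_in_time_cap - 1
print(f"point-in-time: {point_in_time_cap:,.0f}")
print(f"look-ahead:    {look_ahead_cap:,.0f}")
print(f"error:         {error:.1%}")
```

With these inputs the look-ahead figure understates the historical market cap by about 10%, enough to move a name across a size cutoff.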
Building a point-in-time size filter
A common use is ranking a universe by size on a rebalance date. Query each name at that date and assemble a sorted Series — no name's rank depends on data that arrived later.
def size_on_date(tickers, date):
    out = {}
    for t in tickers:
        rows = get_data("historical-market-cap", ticker=t, date=date)
        if rows:
            out[t] = float(rows[0]["market_cap"])
    return pd.Series(out, name="market_cap").sort_values(ascending=False)

ranked = size_on_date(["AAPL", "MSFT", "NVDA"], "2024-06-28")
print(ranked)
Collecting the per-ticker results in a dictionary keyed by symbol keeps the assembly readable, and sorting descending gives you the large-cap-first ordering most size filters expect. Tickers that return no row for the date are skipped rather than filled.
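Once a ranked Series exists, the filter itself is short. The sketch below shows two common cuts, a top-N selection and a minimum-size threshold. The tickers and values are placeholders standing in for real size_on_date output, and top_n and above_threshold are hypothetical helper names.

```python
import pandas as pd

# Hypothetical market caps in USD, standing in for size_on_date() output.
ranked = pd.Series(
    {"AAA": 3.0e12, "BBB": 8.0e11, "CCC": 4.0e10, "DDD": 1.5e9},
    name="market_cap",
).sort_values(ascending=False)


def top_n(caps, n):
    """Keep the n largest names by market cap."""
    return caps.nlargest(n)


def above_threshold(caps, min_cap):
    """Keep names whose market cap is at least min_cap."""
    return caps[caps >= min_cap]


print(top_n(ranked, 2).index.tolist())
print(above_threshold(ranked, 1e10).index.tolist())
```

Because both helpers see only the values for one rebalance date, the selection inherits the point-in-time property of the inputs.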
Handling the gaps
Real data has holes. Companies report quarterly, newly listed names have short histories, and delisted tickers stop updating. Decide deliberately how to handle each case rather than letting pandas decide for you:
- Forward-fill, never back-fill. Carrying the last known value forward is point-in-time safe; filling backward leaks the future.
- Keep delisted names. Dropping tickers that no longer trade reintroduces survivorship bias into any universe you build.
- Mind the reporting lag. Shares outstanding from a 10-Q are known on the filing date, not the period-end date.
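The first and third rules can be combined in one step. The sketch below uses pandas.merge_asof with direction="backward", which takes the most recent filing on or before each date. That amounts to a forward-fill keyed on the filing date rather than the period-end date. The filings table, its column names, and its values are hypothetical and only illustrate the alignment.

```python
import pandas as pd

# Hypothetical filings: a share count becomes known on filing_date,
# not on period_end.
filings = pd.DataFrame({
    "period_end": pd.to_datetime(["2024-03-31", "2024-06-30"]),
    "filing_date": pd.to_datetime(["2024-05-02", "2024-08-01"]),
    "shares_outstanding": [1_000, 990],
}).sort_values("filing_date")

# Dates on which we want to know the share count.
asof = pd.DataFrame({"date": pd.to_datetime(
    ["2024-04-15", "2024-05-15", "2024-07-15", "2024-08-15"])})

# direction="backward" uses the latest filing on or before each date,
# so no row from the future is ever pulled in.
known = pd.merge_asof(
    asof, filings, left_on="date", right_on="filing_date", direction="backward"
)
print(known[["date", "shares_outstanding"]])
```

The first date comes back empty because nothing had been filed yet, and 2024-07-15 still carries the first-quarter count even though the second quarter had already ended. Both results are what a point-in-time view should show.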
From snapshots to a time series
To chart size over time, query a schedule of dates and concatenate the snapshots into one frame indexed by date.
# "ME" (month end) needs pandas >= 2.2; use "M" on older versions.
dates = pd.date_range("2023-01-31", "2024-12-31", freq="ME")

frames = []
for d in dates:
    iso = d.strftime("%Y-%m-%d")
    rows = get_data("historical-market-cap", ticker="AAPL", date=iso)
    frames.append(pd.DataFrame(rows))

hist = pd.concat(frames, ignore_index=True)
hist["date"] = pd.to_datetime(hist["date"])
hist = hist.sort_values("date").set_index("date")
# "QE" (quarter end) likewise needs pandas >= 2.2; use "Q" on older versions.
print(hist["market_cap"].resample("QE").last())
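Calendar month-ends sometimes fall on weekends, so some requested dates may come back with no rows. The sketch below is one way to make the loop tolerate that: it skips empty snapshots and records which dates were missing. It assumes the endpoint returns an empty list when it has no row for a date, which the API documentation should confirm. collect_snapshots and fake_fetch are hypothetical names, and fake_fetch is only a stand-in for get_data so the example runs offline.

```python
import pandas as pd


def collect_snapshots(fetch, ticker, dates):
    """Fetch one snapshot per date; return (frame, dates_with_no_rows).

    fetch is any callable shaped like get_data(endpoint, **params).
    """
    frames, missing = [], []
    for d in dates:
        iso = d.strftime("%Y-%m-%d")
        rows = fetch("historical-market-cap", ticker=ticker, date=iso)
        if rows:
            frames.append(pd.DataFrame(rows))
        else:
            missing.append(iso)
    if not frames:
        return pd.DataFrame(), missing
    hist = pd.concat(frames, ignore_index=True)
    hist["date"] = pd.to_datetime(hist["date"])
    return hist.sort_values("date").set_index("date"), missing


# Stand-in for get_data: pretends weekend dates have no row.
# The real endpoint's behavior may differ.
def fake_fetch(endpoint, ticker, date):
    if pd.Timestamp(date).dayofweek >= 5:
        return []
    return [{"date": date, "ticker": ticker, "market_cap": 1.0}]


dates = pd.to_datetime(["2023-03-31", "2023-04-30", "2023-05-31"])
hist, missing = collect_snapshots(fake_fetch, "AAPL", dates)
print(len(hist), missing)
```

Returning the missing dates alongside the frame makes the gap an explicit decision: retry on the prior trading day, forward-fill, or leave the hole.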
For the full field list and rate limits, see the API documentation. If you are still deciding which provider to standardize on, the rundown of where to find historical market cap data covers the trade-offs before you write a line of code.