How to Get Stock Data With a REST API in Python

Alphanume Team · June 10, 2026

Auth, pagination, rate limits, and a reusable client — end to end.

Every quant project eventually needs a clean feed of price history, fundamentals, or factor data, and writing that plumbing from scratch is tedious at best and brittle at worst. A stock market API that Python developers can call in three lines removes the boilerplate and lets you focus on the analysis. This tutorial covers the Alphanume REST API from first request to production-ready client: authenticating with an API key, building a reusable helper, handling rate limits with exponential backoff, and caching responses to disk so you never pull the same data twice. All available datasets follow the same envelope shape, so every pattern here generalises across endpoints.

Environment setup and authentication

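Install the two third-party libraries you need, requests and pandas (pip install requests pandas), and store your key in a shell environment variable rather than hard-coding it. A key in source code is a key in version control, which means a key on the internet. Reading it with os.environ["ALPHANUME_API_KEY"] also raises a KeyError at startup if the variable is missing, so a misconfigured environment fails before any request is sent.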

import os
import time
import json
import pathlib
import requests
import pandas as pd

BASE_URL = "https://api.alphanume.com/v1"
API_KEY = os.environ["ALPHANUME_API_KEY"]

Every endpoint accepts the key either as an api_key query parameter or as an X-API-Key header. The query-parameter form is simpler for quick scripts; the header form keeps the key out of the URL, and therefore out of most server and proxy logs, which makes it the better practice for anything that runs in production. Both are supported, so pick one and stick with it across your codebase.
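As a minimal sketch of the two options, the helper below returns the params and headers that would be handed to requests.get for each style. The api_key parameter name and the X-API-Key header name come from the paragraph above; build_auth itself and the "demo-key" value are hypothetical and exist only for illustration.

```python
def build_auth(api_key, use_header=False, **params):
    """Return (query_params, headers) for one of the two auth styles.

    Hypothetical helper for illustration; not part of the Alphanume API.
    """
    headers = {}
    query = dict(params)  # copy so the caller's arguments are never mutated
    if use_header:
        headers["X-API-Key"] = api_key   # key travels in a request header
    else:
        query["api_key"] = api_key       # key travels in the query string
    return query, headers


# "demo-key" is a placeholder; a real key should come from the environment.
query, headers = build_auth("demo-key", ticker="AAPL")
header_query, header_headers = build_auth("demo-key", use_header=True, ticker="AAPL")

print(query, headers)
print(header_query, header_headers)
```

In the header style the query string carries only the filter parameters, which is why the key does not show up wherever the URL is recorded.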

A reusable get_data() helper

Rather than repeating the same requests.get call everywhere, wrap it once. The helper attaches authentication, calls raise_for_status() to fail loudly on 4xx and 5xx responses, and unwraps the data envelope so callers receive a plain list of row dictionaries they can drop straight into a DataFrame.

def get_data(endpoint, use_header_auth=False, **params):
    headers = {}
    if use_header_auth:
        headers["X-API-Key"] = API_KEY
    else:
        params["api_key"] = API_KEY

    resp = requests.get(
        f"{BASE_URL}/{endpoint}",
        params=params,
        headers=headers,
        timeout=30,
    )
    resp.raise_for_status()
    payload = resp.json()
    return payload["data"]


def to_df(endpoint, **params):
    rows = get_data(endpoint, **params)
    return pd.DataFrame(rows)

The raise_for_status() call is not optional. Without it, the error body of a 401 flows into the rest of the pipeline: at best the missing data key raises a confusing KeyError, at worst pandas parses the error body into a DataFrame of garbage. Failing loudly means you see the real problem immediately. The count field in the envelope is a useful sanity check: compare payload["count"] to len(payload["data"]) before you trust the result.
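The count check mentioned above can be sketched as a small unwrapping function. The name unwrap_envelope is hypothetical, and the sample payload is made-up data that only assumes the {"count": N, "data": [...]} envelope shape; how the API behaves when the two disagree is not something this sketch can confirm.

```python
def unwrap_envelope(payload):
    """Return payload["data"] after checking it against payload["count"].

    Hypothetical helper; assumes the {"count": N, "data": [...]} envelope.
    """
    rows = payload["data"]
    count = payload.get("count")
    # Only compare when the envelope actually reports a count.
    if count is not None and count != len(rows):
        raise ValueError(
            f"envelope count {count} does not match {len(rows)} rows received"
        )
    return rows


# Made-up payload for demonstration only.
sample = {"count": 2, "data": [{"ticker": "AAPL"}, {"ticker": "MSFT"}]}
rows = unwrap_envelope(sample)
print(len(rows))
```

Swapping this in for the bare payload["data"] lookup would turn a truncated or mismatched response into an immediate error rather than a silently short DataFrame.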

Turning the response envelope into a DataFrame

The JSON envelope always has the shape {"count": N, "data": [...]}. Once you have the data list, pd.DataFrame(rows) does the rest. All field names are snake_case, so column names arrive clean with no renaming required.

df = to_df("historical-market-cap", ticker="AAPL", date="2024-06-28")
print(df[["date", "ticker", "market_cap", "shares_outstanding"]])

regime = to_df("sp500-risk-regime", date="2024-06-28")
print(regime[["date", "regime", "probability"]])

Both calls share identical plumbing — only the endpoint slug and the filter parameters change. That is the point of a shared helper: one place to update auth, timeout, and error handling for every dataset you pull.

Handling HTTP 429 and the 600 req/min Pro rate limit

The Pro tier allows 600 requests per minute. If you exceed that — during a bulk backfill, say — the API returns HTTP 429. Simple exponential backoff handles this correctly without hammering the server further. Wait, double the delay, and retry up to a fixed maximum number of attempts.

MAX_RETRIES = 5
BASE_BACKOFF = 1.0  # seconds

def get_data_with_retry(endpoint, **params):
    params.setdefault("api_key", API_KEY)
    delay = BASE_BACKOFF

    for attempt in range(MAX_RETRIES):
        resp = requests.get(
            f"{BASE_URL}/{endpoint}",
            params=params,
            timeout=30,
        )
        if resp.status_code == 429:
            if attempt == MAX_RETRIES - 1:
                resp.raise_for_status()
            time.sleep(delay)
            delay *= 2
            continue
        resp.raise_for_status()
        return resp.json()["data"]

Starting the backoff at one second and doubling after each failed attempt means the fifth and final attempt fires after a total wait of 1 + 2 + 4 + 8 = 15 seconds, long enough for a rate-limit window to reset without sitting idle for minutes. If you are running many parallel workers, divide 600 by the worker count to find a safe per-worker request rate (in requests per minute) and add a time.sleep(60 / rate) between calls.
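The arithmetic behind both numbers is easy to check. The sketch below reproduces the sleep schedule of the retry loop above (which sleeps after every failed attempt except the last) and the per-worker spacing; backoff_delays and per_worker_interval are hypothetical helper names, and the four-worker figure is only an example.

```python
def backoff_delays(max_retries=5, base=1.0):
    """Sleep durations between attempts: one fewer than the attempt count."""
    return [base * 2 ** i for i in range(max_retries - 1)]


def per_worker_interval(limit_per_min=600, workers=1):
    """Seconds each worker should wait between calls to share the limit."""
    rate = limit_per_min / workers  # requests per minute per worker
    return 60 / rate


delays = backoff_delays()
print(delays, sum(delays))             # [1.0, 2.0, 4.0, 8.0] 15.0
print(per_worker_interval(workers=4))  # 0.4
```

With four workers sharing one key, each worker gets 150 requests per minute, which works out to one call every 0.4 seconds.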

Batching many tickers efficiently

Pulling a universe of 500 names one request at a time fits inside the 600 req/min limit and can finish in about a minute, but the per-call overhead adds up. Loop over tickers, collect results, and concatenate once at the end rather than growing a DataFrame on every iteration: each concatenation copies all the rows collected so far, so growing a DataFrame inside a loop is an O(n²) operation in disguise.

def batch_fetch(endpoint, tickers, **params):
    frames = []
    for ticker in tickers:
        rows = get_data_with_retry(endpoint, ticker=ticker, **params)
        if rows:
            frames.append(pd.DataFrame(rows))
        # stay well inside 600 req/min
        time.sleep(0.12)
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)


universe = ["AAPL", "MSFT", "NVDA", "GOOGL", "META"]
caps = batch_fetch("historical-market-cap", universe, date="2024-06-28")
print(caps.sort_values("market_cap", ascending=False))

The 0.12-second sleep between calls caps throughput at roughly 500 req/min (60 / 0.12), comfortably under the Pro ceiling while leaving headroom for retries. Shorten the sleep if you are the only process running; lengthen it if several scripts share the same key.

Caching responses to disk

Re-pulling identical date-range data on every script run wastes quota and slows iteration. A simple disk cache — keyed on the endpoint and parameters — avoids repeat pulls for data that never changes. Historical prices, filed fundamentals, and past regime labels are immutable; cache them indefinitely. Live or near-live data should not be cached, or should expire quickly.

CACHE_DIR = pathlib.Path(".cache/alphanume")
CACHE_DIR.mkdir(parents=True, exist_ok=True)

def cached_get(endpoint, **params):
    key = endpoint + "_" + "_".join(
        f"{k}-{v}" for k, v in sorted(params.items())
    )
    path = CACHE_DIR / f"{key}.json"
    if path.exists():
        return json.loads(path.read_text())
    rows = get_data_with_retry(endpoint, **params)
    path.write_text(json.dumps(rows))
    return rows


rows = cached_get("historical-market-cap", ticker="AAPL", date="2024-06-28")
df = pd.DataFrame(rows)

The cache key is built from sorted parameters so that ticker=AAPL&date=2024-06-28 and date=2024-06-28&ticker=AAPL map to the same file. Delete the .cache/ directory to force a fresh pull, or check path.stat().st_mtime against a max-age threshold if you want time-based expiry.
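A time-based expiry check along those lines might look like the following sketch. The helper names is_fresh and read_if_fresh are hypothetical, the one-hour threshold is an arbitrary example, and the demo writes to a temporary directory rather than the real cache.

```python
import json
import pathlib
import tempfile
import time


def is_fresh(path, max_age_seconds):
    """True if the file exists and was modified within max_age_seconds."""
    if not path.exists():
        return False
    age = time.time() - path.stat().st_mtime
    return age <= max_age_seconds


def read_if_fresh(path, max_age_seconds):
    """Return the cached JSON, or None when the file is missing or too old."""
    if is_fresh(path, max_age_seconds):
        return json.loads(path.read_text())
    return None


# Demo against a throwaway file; the row content is made up.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = pathlib.Path(tmp) / "demo.json"
    demo_path.write_text(json.dumps([{"ticker": "AAPL"}]))
    fresh = read_if_fresh(demo_path, max_age_seconds=3600)  # within one hour
    stale = read_if_fresh(demo_path, max_age_seconds=-1)    # always expired
    missing = read_if_fresh(pathlib.Path(tmp) / "nope.json", 3600)
```

A None result would be the signal to fall through to the network call and rewrite the file, the same way cached_get does on a cache miss.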

Where to go from here

The patterns above — a shared helper, retry with backoff, batch loops, and a disk cache — cover the majority of what a real data-pull script needs. From here you can compose them: use cached_get inside batch_fetch, pass use_header_auth=True to the base helper in production, and build dataset-specific wrappers on top. The full field schemas and available filter parameters for every endpoint are in the API documentation. If you are still evaluating providers, the comparison of the best market data APIs for algorithmic trading covers latency, licensing, and cost before you commit to a vendor.
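One way to compose the pieces, sketched under the assumption that every fetcher shares the fetch(endpoint, **params) signature used throughout this tutorial, is to pass the fetch function into the batch loop. batch_fetch_with and fake_fetch are hypothetical names; fake_fetch is a stand-in so the example runs without network access or an API key.

```python
def batch_fetch_with(fetch, endpoint, tickers, **params):
    """Collect rows for each ticker using any fetch(endpoint, **params) callable."""
    all_rows = []
    for ticker in tickers:
        rows = fetch(endpoint, ticker=ticker, **params)
        all_rows.extend(rows or [])  # tolerate empty or None results
    return all_rows


def fake_fetch(endpoint, **params):
    """Stand-in fetcher that echoes its arguments instead of calling the API."""
    return [{"endpoint": endpoint, **params}]


rows = batch_fetch_with(
    fake_fetch, "historical-market-cap", ["AAPL", "MSFT"], date="2024-06-28"
)
print(rows)
```

In a real script, passing cached_get in place of fake_fetch would make repeat runs read from disk, and only cache misses would count against the rate limit.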