Resampling Intraday Data With Pandas
Alphanume Team · June 4, 2026
OHLCV bars at any frequency, correctly.
If you have ever tried to resample intraday data with pandas and ended up with open prices that don't match the bar you expected, you are not alone. The mechanics of resample() look simple until you hit label alignment, non-trading gaps, and daylight saving transitions all at once. This post covers the full pattern for rolling 1-minute bars into 5-minute, hourly, and daily OHLCV aggregations without introducing look-ahead bias or letting extended-hours prints contaminate regular-session bars. If you plan to feed the output into volatility calculations, pair this with our guide on computing realized volatility in Python.
What it means to resample intraday data with pandas
Resampling is a time-frequency conversion. You start with bars at one frequency — say, one row per minute — and bin them into a coarser frequency, producing one row per 5 minutes, one per hour, or one per day. The pd.DataFrame.resample() method handles this by slicing the DatetimeIndex into fixed calendar bins and then applying an aggregation function to each bin.
For OHLCV data, the aggregation rules are deterministic: open is the first price in the bin, high is the maximum, low is the minimum, close is the last, and volume is the sum. Anything else is wrong. The only complexity comes from deciding exactly which timestamps fall into which bin — and that is where label and closed become important.
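As a quick check of those rules, the sketch below collapses five hand-written 1-minute bars into one 5-minute bar. The prices and volumes are made up for illustration, and the bars are assumed to be stamped with their end times, which is why it uses the right-closed, right-labelled convention discussed later in this post.

```python
import pandas as pd

# Five hypothetical 1-minute bars stamped 09:31 through 09:35 (values are made up)
idx = pd.date_range("2024-01-02 09:31", periods=5, freq="1min", tz="America/New_York")
bars_1m = pd.DataFrame(
    {
        "open": [100.0, 100.2, 100.1, 100.4, 100.3],
        "high": [100.3, 100.4, 100.5, 100.6, 100.4],
        "low": [99.9, 100.0, 100.0, 100.2, 100.1],
        "close": [100.2, 100.1, 100.4, 100.3, 100.2],
        "volume": [500, 300, 700, 400, 600],
    },
    index=idx,
)

# One bin, (09:30, 09:35], labelled with its right edge
bar_5m = bars_1m.resample("5min", label="right", closed="right").agg(
    {"open": "first", "high": "max", "low": "min", "close": "last", "volume": "sum"}
)
print(bar_5m)
```

The single output row takes its open from the 09:31 bar, its close from the 09:35 bar, the extreme high and low across all five, and the summed volume.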
Setting up the DatetimeIndex and timezone
Before you call resample(), your DataFrame needs a proper DatetimeIndex with timezone information attached. Without a timezone, pandas cannot reason correctly about trading hours, DST transitions, or the boundary between one session and the next. US equity data should be anchored to America/New_York (US/Eastern), because that is the timezone the exchanges use.
If your data arrives as UTC timestamps — common from most data vendors — convert it explicitly:
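import pandas as pd
# df has a DatetimeIndex holding UTC timestamps
if df.index.tz is None:
    # tz-naive: declare the values to be UTC (calling tz_localize on a tz-aware index raises TypeError)
    df.index = df.index.tz_localize("UTC")
df.index = df.index.tz_convert("America/New_York")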
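If the index is tz-naive and already in Eastern time, use tz_localize("America/New_York") directly, but be careful around DST transitions. A wall-clock time that occurs twice when clocks fall back (01:00 to 01:59 on the first Sunday of November) raises an AmbiguousTimeError unless you pass ambiguous="infer", and a time that is skipped when clocks spring forward (02:00 to 02:59 on the second Sunday of March) raises a NonExistentTimeError unless you pass nonexistent="shift_forward". These are separate arguments for separate problems, so a feed that covers the full day may need both. Converting from UTC avoids the issue entirely and is the safer default.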
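A minimal sketch of both routes, using hand-picked timestamps. The first part shows that UTC instants convert to the same Eastern wall-clock time on either side of a DST change; the second localizes naive Eastern timestamps that hit the 2024 transitions. It passes ambiguous="NaT" rather than "infer" only to make the ambiguous value visible, since "infer" needs an ordered run of timestamps to work from.

```python
import pandas as pd

# Two UTC instants that both correspond to the 09:30 open in New York:
# one in winter (EST, UTC-5) and one in summer (EDT, UTC-4).
utc_idx = pd.DatetimeIndex(["2024-01-02 14:30", "2024-07-02 13:30"], tz="UTC")
ny_idx = utc_idx.tz_convert("America/New_York")
print(ny_idx)

# Naive Eastern wall-clock times that collide with the 2024 DST transitions:
# 02:30 on 10 March does not exist, 01:30 on 3 November occurs twice.
naive = pd.DatetimeIndex(["2024-03-10 02:30", "2024-11-03 01:30"])
localized = naive.tz_localize(
    "America/New_York",
    nonexistent="shift_forward",  # move the skipped time to the next valid instant
    ambiguous="NaT",              # mark the repeated time as missing instead of guessing
)
print(localized)
```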
Restricting to regular trading hours
Raw intraday data often includes pre-market and after-hours bars. Those bars are typically thinly traded, quoted at wide bid-ask spreads, and misleading; you almost never want them in a strategy that runs during regular hours. Strip them with between_time before resampling:
rth = df.between_time("09:30", "16:00")
This keeps every bar whose time component falls in the closed interval [09:30, 16:00] in the index's local timezone. Because the index is already in Eastern time, the boundary is expressed in exchange-local terms and correctly shifts with DST, so you do not need to adjust the string when the clocks change. Strip extended hours before resampling rather than after: if you resample first, any bin that straddles the session boundary (the 09:00 to 10:00 hourly bin, or any daily bin) absorbs pre-market or after-hours prints that can no longer be separated out. Stripping does not remove the overnight gap itself; resample() still emits empty bins between 16:00 and 09:30, which is why the aggregation pattern ends with a dropna.
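To see what the filter keeps, the sketch below builds one synthetic day of 1-minute timestamps from 04:00 to 20:00 Eastern (placeholder values, not real data) and counts the surviving rows. The inclusive argument, available in pandas 1.4 and later, is shown as one option if your bars are stamped with their start time and you do not want a bar stamped 16:00.

```python
import pandas as pd

# One synthetic extended-hours day of 1-minute bars (constant placeholder values)
idx = pd.date_range(
    "2024-01-02 04:00", "2024-01-02 20:00", freq="1min", tz="America/New_York"
)
df = pd.DataFrame({"close": 100.0, "volume": 1}, index=idx)

# Default: both endpoints are included, so 09:30 and 16:00 both survive
rth = df.between_time("09:30", "16:00")
print(len(rth), rth.index[0].time(), rth.index[-1].time())

# Half-open variant: keeps 09:30 but drops the bar stamped 16:00
rth_left = df.between_time("09:30", "16:00", inclusive="left")
print(len(rth_left), rth_left.index[-1].time())
```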
The resample + agg pattern
With a clean, tz-aware, trading-hours-only index, the core aggregation is straightforward:
agg_rules = {
    "open": "first",
    "high": "max",
    "low": "min",
    "close": "last",
    "volume": "sum",
}
bars_5m = (
    rth
    .resample("5min", label="right", closed="right")
    .agg(agg_rules)
    .dropna(subset=["close"])
)
The two keyword arguments, label="right" and closed="right", are not cosmetic. Together they define which timestamps belong to each bin and what label the bin carries. With closed="right", the bin includes its right edge and excludes its left edge: a bin labelled 09:35 contains the bars at 09:31, 09:32, 09:33, 09:34, and 09:35. That grouping is the natural one when each 1-minute bar is stamped with its end time; if your vendor stamps bars with their start time, closed="left" groups 09:30 through 09:34 into one bin instead. With label="right", the bin is stamped with its closing timestamp, which is also the moment at which you first have complete information about it. Using label="left" instead would stamp each bar with the opening timestamp of the bin. That is fine for logging, but if you treat that timestamp as the moment the bar's values became known, you are implicitly assuming you can act before the bar closes, which is look-ahead bias.
The dropna(subset=["close"]) at the end removes bins that have no data at all — every empty 5-minute window that falls inside a session gap or during a half-day gets dropped cleanly. Dropping on close is the right choice because a bar with no trades has no close price; open, high, and low on an empty bin are equally meaningless.
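The bin membership described above can be inspected directly. This sketch resamples a series whose values are just the timestamps formatted as strings, so the first, last, and count of each bin show exactly which 1-minute stamps it received under each convention. The eleven timestamps are synthetic.

```python
import pandas as pd

idx = pd.date_range(
    "2024-01-02 09:30", "2024-01-02 09:40", freq="1min", tz="America/New_York"
)
# Each value is its own timestamp as text, so aggregations reveal bin contents
stamps = pd.Series(idx.strftime("%H:%M").to_numpy(), index=idx)

# Right-closed, right-labelled: (09:30, 09:35] is labelled 09:35
right = stamps.resample("5min", label="right", closed="right").agg(
    ["first", "last", "count"]
)
# Left-closed, left-labelled (the pandas default for this frequency)
left = stamps.resample("5min", label="left", closed="left").agg(
    ["first", "last", "count"]
)
print(right)
print(left)
```

Note that under the right-closed convention a bar stamped exactly 09:30 lands alone in the bin labelled 09:30, while under the left-closed convention the bar stamped 09:40 is the one left on its own. Which of the two is correct depends on whether your source stamps bars with their start or end time.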
Worked example: 1-minute to hourly bars
Here is a full, self-contained example that synthesizes four sessions of 1-minute data (2 to 5 January 2024, a holiday-shortened week) and rolls it to hourly OHLCV:
import numpy as np
import pandas as pd
rng = np.random.default_rng(42)
# Build a synthetic 1-minute OHLCV DataFrame covering four sessions (Tue-Fri).
# The range is continuous, so it also contains overnight minutes that are filtered out below.
idx = pd.date_range(
    "2024-01-02 09:30",
    "2024-01-05 16:00",
    freq="1min",
    tz="America/New_York",
)
n = len(idx)
close = 100 + np.cumsum(rng.standard_normal(n) * 0.1)
open_ = close - rng.uniform(0, 0.05, n)
df = pd.DataFrame(
    {
        "open": open_,
        # Build high/low around both open and close so every bar is internally consistent
        "high": np.maximum(open_, close) + rng.uniform(0, 0.10, n),
        "low": np.minimum(open_, close) - rng.uniform(0, 0.10, n),
        "close": close,
        "volume": rng.integers(100, 1000, n),
    },
    index=idx,
)
# Restrict to regular trading hours
rth = df.between_time("09:30", "16:00")
# Aggregate to hourly OHLCV
agg_rules = {
    "open": "first",
    "high": "max",
    "low": "min",
    "close": "last",
    "volume": "sum",
}
hourly = (
    rth
    .resample("1h", label="right", closed="right")
    .agg(agg_rules)
    .dropna(subset=["close"])  # drop the empty overnight bins
)
print(hourly.head(10))
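The resulting DataFrame has seven rows per session, labelled 10:00 through 16:00 with the timestamp at which each bar closes. The first bar of each session covers only 09:30 to 10:00, because hourly bins are anchored to the top of the hour rather than to the open. Printing hourly.index.tz should show America/New_York, confirming the timezone was preserved through the operation.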
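A few invariants are cheap to assert after any OHLCV resample. The helper below, check_resampled, is a hypothetical name introduced here for illustration rather than part of pandas; it is a sketch that checks price ordering, volume conservation, and timezone preservation on one synthetic session.

```python
import numpy as np
import pandas as pd

AGG_RULES = {"open": "first", "high": "max", "low": "min", "close": "last", "volume": "sum"}


def check_resampled(fine: pd.DataFrame, coarse: pd.DataFrame) -> None:
    """Raise AssertionError if the coarse bars break basic OHLCV invariants."""
    # High must bound open and close from above, low from below
    assert (coarse["high"] >= coarse[["open", "close"]].max(axis=1)).all()
    assert (coarse["low"] <= coarse[["open", "close"]].min(axis=1)).all()
    # Summed volume must be conserved across the frequency change
    assert coarse["volume"].sum() == fine["volume"].sum()
    # Index must stay sorted and keep its timezone
    assert coarse.index.is_monotonic_increasing
    assert str(coarse.index.tz) == str(fine.index.tz)


# One synthetic regular-hours session of 1-minute bars
rng = np.random.default_rng(0)
idx = pd.date_range(
    "2024-01-02 09:30", "2024-01-02 16:00", freq="1min", tz="America/New_York"
)
n = len(idx)
close = 100 + np.cumsum(rng.standard_normal(n) * 0.1)
open_ = close - rng.uniform(0, 0.05, n)
fine = pd.DataFrame(
    {
        "open": open_,
        "high": np.maximum(open_, close) + rng.uniform(0, 0.10, n),
        "low": np.minimum(open_, close) - rng.uniform(0, 0.10, n),
        "close": close,
        "volume": rng.integers(100, 1000, n),
    },
    index=idx,
)

hourly = (
    fine.resample("1h", label="right", closed="right")
    .agg(AGG_RULES)
    .dropna(subset=["close"])
)
check_resampled(fine, hourly)
print(len(hourly), "hourly bars passed the checks")
```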
resample vs rolling: choosing the right tool
People sometimes reach for rolling() when they mean resample(). The distinction matters. resample() creates fixed calendar bins — each bin starts and ends at a deterministic wall-clock time and all rows in the same bin share one output row. rolling() creates a sliding window — it moves one step at a time and every input row produces its own output row. Use resample() when you need to change bar frequency; use rolling() when you need a trailing statistic at the original frequency. For example, a 20-bar rolling average of hourly closes should use rolling(20).mean() on the hourly DataFrame produced by resample() — not rolling() on the 1-minute data. For the rolling case, see our post on calculating rolling returns in pandas.
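The difference in output shape is easy to see side by side. This sketch uses a synthetic, steadily rising close series for one session and a 3-bar window instead of 20 so that a single day is enough to fill it; the window lengths are illustrative choices, not recommendations.

```python
import numpy as np
import pandas as pd

# One session of synthetic 1-minute closes, stamped 09:31 through 16:00
idx = pd.date_range(
    "2024-01-02 09:31", "2024-01-02 16:00", freq="1min", tz="America/New_York"
)
close = pd.Series(100 + np.arange(len(idx)) * 0.01, index=idx, name="close")

# resample(): many input rows collapse into one output row per fixed bin
hourly_close = close.resample("1h", label="right", closed="right").last()

# rolling() on the resampled series: a trailing 3-bar mean at hourly frequency
rolling_3h = hourly_close.rolling(3).mean()

# rolling() on the raw series: a trailing 60-minute mean, still one row per minute
rolling_60m = close.rolling(60).mean()

print(len(close), len(hourly_close), len(rolling_3h), len(rolling_60m))
```

The resampled series has one row per hour, while both rolling outputs keep the length of whatever series they were called on, with leading NaN values until the window fills.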
Gotchas: half-days, DST, and mixed sessions
A handful of edge cases will burn you if you are not watching for them. On half-days, such as the Friday after Thanksgiving or Christmas Eve when it falls on a weekday, the exchange closes at 13:00 Eastern. between_time("09:30", "16:00") knows nothing about the calendar, so it keeps any bars your vendor reports between 13:00 and 16:00 on those days, and the bins after 13:00 are built from sparse post-close prints rather than regular-session trading. If that matters for your strategy, add a session calendar that flags half-days explicitly.
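One way to flag half-days is a small lookup of early-close dates applied after the between_time filter. The sketch below assumes a hand-maintained dictionary, EARLY_CLOSES, and a helper named strip_to_session; both are hypothetical names for illustration, and in practice the dates would come from an exchange calendar source of your choosing. The data is synthetic.

```python
from datetime import date, time

import numpy as np
import pandas as pd

# Hypothetical, hand-maintained map of early-close sessions to local closing time.
# 29 November 2024 was the Friday after Thanksgiving.
EARLY_CLOSES = {date(2024, 11, 29): time(13, 0)}


def strip_to_session(df, early_closes=EARLY_CLOSES):
    """Keep 09:30-16:00 bars; on early-close days also drop bars stamped after the close."""
    rth = df.between_time("09:30", "16:00")
    keep = np.ones(len(rth), dtype=bool)
    bar_dates = rth.index.date  # local (exchange) calendar dates
    bar_times = rth.index.time  # local (exchange) wall-clock times
    for day, close_time in early_closes.items():
        keep &= ~((bar_dates == day) & (bar_times > close_time))
    return rth[keep]


# Synthetic extended-hours bars for one full day and one half-day
tz = "America/New_York"
full_day = pd.date_range("2024-11-27 04:00", "2024-11-27 20:00", freq="1min", tz=tz)
half_day = pd.date_range("2024-11-29 04:00", "2024-11-29 20:00", freq="1min", tz=tz)
df = pd.DataFrame({"close": 100.0, "volume": 1}, index=full_day.append(half_day))

session = strip_to_session(df)
counts = session.groupby(session.index.date).size()
print(counts)
```

The full session keeps its bars through 16:00, while the half-day is cut at 13:00, so downstream resampling never sees the post-close prints.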
DST transitions shift the UTC offset by one hour but leave the Eastern clock time unchanged, which is why converting from UTC before calling between_time is safer than slicing first and converting later. The one remaining trap is a dataset that mixes sessions from different exchanges, for instance US equities alongside European-listed shares. Their trading hours and timezones differ, so a single between_time call will silently drop or truncate the non-US session. Filter each exchange's instruments with their own session hours before concatenating.
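Finally, watch your daily aggregation. If you resample all the way to "1D", pandas bins by calendar day in the index timezone, from midnight to midnight. That is correct for Eastern-timezone data, but only if overnight and extended-hours bars were stripped first. Otherwise a 20:00 after-hours print on Monday becomes Monday's daily close, and a 04:00 pre-market print becomes its open. (On an index left in UTC the same 20:00 print lands in Tuesday's bar, because 20:00 Eastern is already at or past midnight UTC.) Keep the default label="left", closed="left" at this frequency; label="right" would stamp each session with the following midnight. Strip to RTH, then resample daily, and the open of each daily bar will correctly match the 09:30 1-minute open. For endpoint details and available fields, see the API documentation.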
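The daily case can be checked on synthetic data. The sketch below builds two days of extended-hours 1-minute bars with a steadily rising price, then compares a daily resample of the stripped data against one that includes extended hours, and shows what right-labelling does to the date. All values are made up.

```python
import numpy as np
import pandas as pd

NY = "America/New_York"
AGG_RULES = {"open": "first", "high": "max", "low": "min", "close": "last", "volume": "sum"}

# Two synthetic days of 04:00-20:00 bars, mimicking an extended-hours feed
idx = pd.date_range("2024-01-02 04:00", "2024-01-03 20:00", freq="1min", tz=NY)
price = 100 + np.arange(len(idx)) * 0.01
df = pd.DataFrame(
    {
        "open": price,
        "high": price + 0.02,
        "low": price - 0.02,
        "close": price + 0.01,
        "volume": 10,
    },
    index=idx,
).between_time("04:00", "20:00")

rth = df.between_time("09:30", "16:00")

# Stripped first: the daily open/close come from the 09:30 and 16:00 bars
daily = rth.resample("1D").agg(AGG_RULES).dropna(subset=["close"])

# Not stripped: the daily open/close come from the 04:00 and 20:00 bars
daily_all_hours = df.resample("1D").agg(AGG_RULES).dropna(subset=["close"])

# Right-labelled: each session is stamped with the following midnight
daily_right = (
    rth.resample("1D", label="right", closed="right")
    .agg(AGG_RULES)
    .dropna(subset=["close"])
)

print(daily[["open", "close"]])
print(daily_all_hours[["open", "close"]])
print(daily_right.index)
```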