I have the following Pandas dataframe df containing minute interval stock price data:
                       High     Low
Timestamp
2020-01-02 04:01:00  295.08  295.05
2020-01-02 04:07:00  295.59  295.35
2020-01-02 04:09:00  295.55  295.55
2020-01-02 04:10:00  295.75  295.74
2020-01-02 04:11:00  295.60  295.60
...                     ...     ...
2020-08-18 19:56:00  462.98  462.98
2020-08-18 19:57:00  462.98  462.95
2020-08-18 19:58:00  462.88  462.88
2020-08-18 19:59:00  462.88  462.85
2020-08-18 20:00:00  462.85  462.80
Timestamp is a DatetimeIndex. I've been able to get the high and low prices, and the time of each, for every trading day between 9:30am and 4pm with the following code:
# Calculate the highest High and lowest low for each trading day
daily_high_low = df.between_time('09:30','16:00', include_start=False, include_end=True).resample('D').agg({'High':'max', 'Low':'min'}).dropna()
# Add 'Date' column to df for groupby
df['Date'] = df.index.date
# Get time of Reg. Trading Hours High and Low
high_time = df[['Date','High']].between_time('09:30','16:00', include_start=False, include_end=True).groupby('Date').idxmax()
high_time.index = pd.to_datetime(high_time.index)
high_time = high_time['High'].dt.time.rename('High_Time')
low_time = df[['Date','Low']].between_time('09:30','16:00', include_start=False, include_end=True).groupby('Date').idxmin()
low_time.index = pd.to_datetime(low_time.index)
low_time = low_time['Low'].dt.time.rename('Low_Time')
Which has enabled me to generate the following dataframe:
                High       Low  High_Time  Low_Time
Timestamp
2020-01-02  300.6000  295.1900   16:00:00  09:33:00
2020-01-03  300.5800  296.5000   12:52:00  09:31:00
2020-01-06  299.9600  292.7501   14:15:00  09:31:00
2020-01-07  300.9000  297.4800   09:35:00  10:29:00
2020-01-08  304.4399  297.1560   15:42:00  09:31:00
...              ...       ...        ...       ...
2020-08-12  453.1000  441.1900   12:46:00  09:45:00
2020-08-13  464.1700  455.7100   13:01:00  10:19:00
2020-08-14  460.0000  452.1800   15:56:00  11:05:00
2020-08-17  464.3600  455.8501   09:31:00  11:47:00
2020-08-18  464.0000  456.0300   14:24:00  10:31:00
I am now trying to generate and add the following columns and am completely stuck:
L_after_H, the lowest Low that comes after the day's High,
H_after_L, the highest High that comes after the day's Low,
L_after_H_Time, the time of the lowest Low that comes after the day's High,
H_after_L_Time, the time of the highest High that comes after the day's Low.
My best attempt is something like
df[['Date', 'High', 'Low']].groupby('Date') \
.between_time(high_time,'16:00', include_start=False, include_end=True)
but that fails because 'DataFrameGroupBy' object has no attribute 'between_time'. I would be really happy just to be able to filter the date groups to contain only timestamps > high_time.
A minor note before getting into the actual solution:
You should always provide a good sample of data for others to play around with and test code alternatives against. It should be easy for others to copy and paste into their code, and it does not have to be real data. In this case I used the following:
Sample Data
import numpy as np
import pandas as pd

np.random.seed(123)
n = int(1e3)
df = pd.DataFrame(
    {'High': np.random.randint(low=0, high=1000, size=n)},
    index=pd.date_range(start='2020-01-01 9:00', periods=n, freq='15min')
)
df['Low'] = (df.High * 0.8).astype(int)
Now to your question
Pandas has no built-in functionality to filter rows within the groups of a groupby (AFAIK), so I used your code to get the daily High/Low, then grouped the original df by date and looped over the groups in order to filter the rows of each date dynamically.
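To make the per-date filtering step concrete, here is a minimal sketch on a single toy day. The values and the name `day` are made up for illustration: it keeps only the rows strictly after the day's High and then takes the lowest Low among them. What to report when the High is the last row of the day is a design choice; this sketch simply uses None.

```python
import pandas as pd

# Toy data for one trading day; values are invented for illustration.
day = pd.DataFrame(
    {'High': [10, 15, 12, 11], 'Low': [9, 13, 8, 10]},
    index=pd.to_datetime([
        '2020-01-02 09:31', '2020-01-02 09:32',
        '2020-01-02 09:33', '2020-01-02 09:34',
    ]),
)

high_ts = day['High'].idxmax()          # timestamp of the day's High
after_high = day[day.index > high_ts]   # rows strictly after the High

if after_high.empty:
    # The High is the last row of the day, so nothing comes after it.
    low_after_high, low_after_high_time = None, None
else:
    low_ts = after_high['Low'].idxmin()  # timestamp of the lowest later Low
    low_after_high = after_high.loc[low_ts, 'Low']
    low_after_high_time = low_ts.time()

print(low_after_high, low_after_high_time)
```

The same pattern with idxmin first and idxmax second gives the highest High after the day's Low.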
Here's the code
def extrema(df_original, start_time_str, end_time_str):
    # clean the data once at start
    df = df_original.between_time(
        start_time_str, end_time_str, include_start=False, include_end=False)
    # your methods for max/min
    daily = df.resample('D').agg({'High':'max', 'Low':'min'}).dropna()
    daily['High_Time'] = df.groupby(df.index.date).High.idxmax().dt.time
    daily['Low_Time'] = df.groupby(df.index.date).Low.idxmin().dt.time
    from datetime import datetime  # move to the imports section of your code
    hal, lah, hal_time, lah_time = [], [], [], []
    for (date, rows), (_, high, low, htime, ltime) in zip(
            df.groupby(df.index.date), daily.itertuples()):
        if htime > ltime:
            # high_after_low == high, no need to search again
            hal_time.append(htime)
            hal.append(high)
            # get low_after_high
            if htime == rows.index[-1].time():
                lah_time.append(htime)
                lah.append(high)
            else:
                t = rows.loc[
                    rows.index > datetime.combine(date, htime), 'Low'].idxmin()
                lah_time.append(t.time())
                lah.append(rows.loc[t, 'Low'])
        else:
            # low_after_high == low
            lah.append(low)
            lah_time.append(ltime)
            # get high_after_low
            if ltime == rows.index[-1].time():
                hal_time.append(ltime)
                hal.append(low)
            else:
                t = rows.loc[
                    rows.index > datetime.combine(date, ltime), 'High'].idxmax()
                hal_time.append(t.time())
                hal.append(rows.loc[t, 'High'])
    daily = pd.concat([daily,
        pd.DataFrame(
            {'High_after_Low': hal, 'Low_after_High': lah,
             'High_after_Low_Time': hal_time, 'Low_after_High_Time': lah_time},
            index=daily.index)
    ], axis=1)
    return daily
result = extrema(df, '09:30', '16:00')
print(result)
Output
            High  Low  High_Time  Low_Time  High_after_Low  Low_after_High  High_after_Low_Time  Low_after_High_Time
2020-01-01   988   13   10:00:00  10:45:00             942              13             14:00:00             10:45:00
2020-01-02   987    2   12:15:00  15:00:00             907               2             15:15:00             15:00:00
2020-01-03   970    6   12:45:00  10:45:00             970              60             12:45:00             13:00:00
2020-01-04   992    8   15:30:00  10:15:00             992             224             15:30:00             15:45:00
2020-01-05   985   15   11:45:00  15:45:00              15              15             15:45:00             15:45:00
2020-01-06   994   84   10:15:00  15:45:00              84              84             15:45:00             15:45:00
2020-01-07   935   39   15:00:00  13:15:00             935             277             15:00:00             15:45:00
2020-01-08   999   39   10:00:00  15:15:00             765              39             15:45:00             15:15:00
2020-01-09   964    4   15:15:00  14:00:00             964              90             15:15:00             15:45:00
2020-01-10   968   36   10:45:00  14:30:00             967              36             15:45:00             14:30:00
2020-01-11   924   13   10:45:00  12:30:00             638              13             14:45:00             12:30:00
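A note on pandas versions: the code above passes include_start and include_end to between_time. Those keywords were deprecated in pandas 1.4 and removed in 2.0 in favor of a single inclusive argument ('both', 'neither', 'left' or 'right'). If you need the same call to run on both old and new versions, something like the following sketch may help. The helper name between_time_compat and the demo frame are hypothetical, introduced only for this illustration.

```python
import numpy as np
import pandas as pd


def between_time_compat(frame, start, end,
                        include_start=True, include_end=True):
    """Hypothetical helper: between_time that works on old and new pandas."""
    # Map the two legacy booleans onto the newer `inclusive` argument.
    inclusive = {
        (True, True): 'both',
        (True, False): 'left',
        (False, True): 'right',
        (False, False): 'neither',
    }[(include_start, include_end)]
    try:
        # pandas >= 1.4 accepts `inclusive`
        return frame.between_time(start, end, inclusive=inclusive)
    except TypeError:
        # Older pandas only knows the two boolean keywords
        return frame.between_time(
            start, end,
            include_start=include_start, include_end=include_end)


# Small frame of 15-minute bars, similar in shape to the sample data.
idx = pd.date_range('2020-01-01 09:00', periods=40, freq='15min')
demo = pd.DataFrame({'High': np.arange(40)}, index=idx)

# Exclude both endpoints, as extrema() does.
rth = between_time_compat(demo, '09:30', '16:00',
                          include_start=False, include_end=False)
print(rth.index.min().time(), rth.index.max().time())
```

Inside extrema, the between_time call could be swapped for this helper without touching the rest of the function.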