Finding the minimum value that comes after the maximum value in a group within a Pandas dataframe

stgcp

I have the following Pandas dataframe df containing minute-interval stock price data:

                    High    Low 
Timestamp                           
2020-01-02 04:01:00 295.08  295.05
2020-01-02 04:07:00 295.59  295.35
2020-01-02 04:09:00 295.55  295.55
2020-01-02 04:10:00 295.75  295.74
2020-01-02 04:11:00 295.60  295.60
... ... ... ... ... ... ... ...
2020-08-18 19:56:00 462.98  462.98
2020-08-18 19:57:00 462.98  462.95
2020-08-18 19:58:00 462.88  462.88
2020-08-18 19:59:00 462.88  462.85
2020-08-18 20:00:00 462.85  462.80

Timestamp is a DatetimeIndex. I've been able to get each trading day's high and low prices, and the times at which they occur, between 9:30am and 4pm with the following code:

# Calculate the highest High and lowest Low for each trading day
# (pandas >= 1.4 takes inclusive='right'; older versions used include_start=False, include_end=True)
daily_high_low = df.between_time('09:30','16:00', inclusive='right').resample('D').agg({'High':'max', 'Low':'min'}).dropna()

# Add 'Date' column to df for groupby
df['Date'] = df.index.date

# Get time of Reg. Trading Hours High and Low
high_time = df[['Date','High']].between_time('09:30','16:00', inclusive='right').groupby('Date').idxmax()
high_time.index = pd.to_datetime(high_time.index)
high_time = high_time['High'].dt.time.rename('High_Time')

low_time = df[['Date','Low']].between_time('09:30','16:00', inclusive='right').groupby('Date').idxmin()
low_time.index = pd.to_datetime(low_time.index)
low_time = low_time['Low'].dt.time.rename('Low_Time')

Which has enabled me to generate the following dataframe:

            High        Low         High_Time   Low_Time
Timestamp                       
2020-01-02  300.6000    295.1900    16:00:00    09:33:00
2020-01-03  300.5800    296.5000    12:52:00    09:31:00
2020-01-06  299.9600    292.7501    14:15:00    09:31:00
2020-01-07  300.9000    297.4800    09:35:00    10:29:00
2020-01-08  304.4399    297.1560    15:42:00    09:31:00
... ... ... ... ... ... ...
2020-08-12  453.1000    441.1900    12:46:00    09:45:00
2020-08-13  464.1700    455.7100    13:01:00    10:19:00
2020-08-14  460.0000    452.1800    15:56:00    11:05:00
2020-08-17  464.3600    455.8501    09:31:00    11:47:00
2020-08-18  464.0000    456.0300    14:24:00    10:31:00

I am now trying to generate and add the following columns and am completely stuck:

  • L_after_H, the lowest Low that comes after the day's High,
  • H_after_L, the highest High that comes after the day's Low,
  • L_after_H_Time, the time of the lowest Low that comes after the day's High,
  • H_after_L_Time, the time of the highest High that comes after the day's Low.

My best attempt is something like

df[['Date', 'High', 'Low']].groupby('Date') \
.between_time(high_time,'16:00', include_start=False, include_end=True)

but that fails because 'DataFrameGroupBy' object has no attribute 'between_time'. I would be really happy just to be able to filter the date groups to contain only timestamps > high_time.

RichieV

A minor note before getting into the actual solution:

You should always provide a small sample of data that others can use to test code alternatives. It should be easy to copy and paste, and it does not have to be real data. In this case I used the following:

Sample Data

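import numpy as np
import pandas as pd

np.random.seed(123)
n = int(1e3)
df = pd.DataFrame(
    {'High': np.random.randint(low=0, high=1000, size=n)},
    # '15min' is the same 15-minute frequency as the older '15T' alias
    index=pd.date_range(start='2020-01-01 9:00', periods=n, freq='15min')
)
df['Low'] = (df.High * 0.8).astype(int)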

Now to your question

Pandas has no built-in way to filter rows within the groups of a groupby (AFAIK), so I used your code to get the daily High/Low, then grouped the original df by date and looped over the groups to filter each date's rows dynamically.
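
The core step, keeping only the rows of one day that come after that day's High, can be sketched on a tiny frame before the full function. The `toy` frame and its values are made up for illustration and are not part of the question's data.

```python
import pandas as pd

# Made-up data: two days, three bars each.
toy = pd.DataFrame(
    {'High': [5, 9, 7, 4, 3, 8],
     'Low':  [4, 8, 2, 3, 1, 7]},
    index=pd.to_datetime([
        '2020-01-01 10:00', '2020-01-01 11:00', '2020-01-01 12:00',
        '2020-01-02 10:00', '2020-01-02 11:00', '2020-01-02 12:00',
    ]))

low_after_high = {}
for date, rows in toy.groupby(toy.index.date):
    high_ts = rows['High'].idxmax()      # timestamp of this day's High
    later = rows[rows.index > high_ts]   # rows strictly after the High
    # If the High is the day's last bar, there is nothing left to search.
    low_after_high[date] = later['Low'].min() if not later.empty else None

print(low_after_high)
```

On the second day the High falls on the last bar, so the filtered frame is empty and that case needs its own handling.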

Here's the code

from datetime import datetime

def extrema(df_original, start_time_str, end_time_str):
    # clean the data once at start, excluding both endpoints of the window
    # (pandas >= 1.4; older versions take include_start=False, include_end=False)
    df = df_original.between_time(
        start_time_str, end_time_str, inclusive='neither')
    
    # your methods for max/min
    daily = df.resample('D').agg({'High':'max', 'Low':'min'}).dropna()
    # group on midnight-normalized timestamps so the results align with daily's DatetimeIndex
    days = df.index.normalize()
    daily['High_Time'] = df.groupby(days).High.idxmax().dt.time
    daily['Low_Time'] = df.groupby(days).Low.idxmin().dt.time
    
    hal, lah, hal_time, lah_time = [], [], [], []
    for (date, rows), (_, high, low, htime, ltime) in zip(
            df.groupby(df.index.date), daily.itertuples()):
        if htime > ltime:
            # high_after_low == high, no need to search again
            hal_time.append(htime)
            hal.append(high)
            # get low_after_high
            if htime == rows.index[-1].time():
                # the High is the day's last bar, so nothing comes after it;
                # fall back to the High itself
                lah_time.append(htime)
                lah.append(high)
            else:
                t = rows.loc[
                    rows.index > datetime.combine(date, htime), 'Low'].idxmin()
                lah_time.append(t.time())
                lah.append(rows.loc[t, 'Low'])
        else:
            # low_after_high == low
            lah.append(low)
            lah_time.append(ltime)
            # get high_after_low
            if ltime == rows.index[-1].time():
                # the Low is the day's last bar, so nothing comes after it;
                # fall back to the Low itself
                hal_time.append(ltime)
                hal.append(low)
            else:
                t = rows.loc[
                    rows.index > datetime.combine(date, ltime), 'High'].idxmax()
                hal_time.append(t.time())
                hal.append(rows.loc[t, 'High'])
    daily = pd.concat([daily,
        pd.DataFrame(
            {'High_after_Low': hal, 'Low_after_High': lah,
                'High_after_Low_Time': hal_time, 'Low_after_High_Time': lah_time},
            index=daily.index)
        ], axis=1)
    
    return daily

result = extrema(df, '09:30', '16:00')
print(result)

Output

            High  Low High_Time  Low_Time  High_after_Low  Low_after_High High_after_Low_Time Low_after_High_Time
2020-01-01   988   13  10:00:00  10:45:00             942              13            14:00:00            10:45:00
2020-01-02   987    2  12:15:00  15:00:00             907               2            15:15:00            15:00:00
2020-01-03   970    6  12:45:00  10:45:00             970              60            12:45:00            13:00:00
2020-01-04   992    8  15:30:00  10:15:00             992             224            15:30:00            15:45:00
2020-01-05   985   15  11:45:00  15:45:00              15              15            15:45:00            15:45:00
2020-01-06   994   84  10:15:00  15:45:00              84              84            15:45:00            15:45:00
2020-01-07   935   39  15:00:00  13:15:00             935             277            15:00:00            15:45:00
2020-01-08   999   39  10:00:00  15:15:00             765              39            15:45:00            15:15:00
2020-01-09   964    4  15:15:00  14:00:00             964              90            15:15:00            15:45:00
2020-01-10   968   36  10:45:00  14:30:00             967              36            15:45:00            14:30:00
2020-01-11   924   13  10:45:00  12:30:00             638              13            14:45:00            12:30:00
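
If the explicit loop becomes a bottleneck, one possible alternative (a sketch only, not benchmarked) is to build a boolean mask that keeps the rows after each day's High and then group the filtered frame. The `toy` data below is made up. Unlike `extrema`, days whose High falls on the last bar simply drop out of the result instead of falling back to the High itself.

```python
import pandas as pd

# Made-up data: two days, four hourly bars each.
toy = pd.DataFrame(
    {'High': [5, 9, 7, 6, 4, 3, 8, 2],
     'Low':  [4, 8, 2, 5, 3, 1, 7, 1]},
    index=pd.to_datetime([
        '2020-01-01 10:00', '2020-01-01 11:00',
        '2020-01-01 12:00', '2020-01-01 13:00',
        '2020-01-02 10:00', '2020-01-02 11:00',
        '2020-01-02 12:00', '2020-01-02 13:00',
    ]))

day = toy.index.normalize()                  # each row's date at midnight
high_ts = toy.groupby(day)['High'].idxmax()  # one timestamp per day

# Broadcast each day's High timestamp back onto that day's rows,
# then keep only the rows strictly after it.
cutoff = high_ts.reindex(day).to_numpy()
after_high = toy[toy.index > cutoff]

grouped = after_high.groupby(after_high.index.normalize())['Low']
low_after_high = pd.DataFrame({
    'Low_after_High': grouped.min(),
    'Low_after_High_Time': grouped.idxmin().dt.time,
})
print(low_after_high)
```

The same pattern with `Low`/`idxmin` for the cutoff and `High`/`max`/`idxmax` on the filtered frame would give the High after the Low. The result can then be joined onto the daily frame, which leaves NaN for days that dropped out.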
