Finding the minimum value that comes after the maximum value in a group within a Pandas dataframe

stgcp Published at Dev

I have the following Pandas dataframe df containing minute interval stock price data:

                    High    Low 
Timestamp                           
2020-01-02 04:01:00 295.08  295.05
2020-01-02 04:07:00 295.59  295.35
2020-01-02 04:09:00 295.55  295.55
2020-01-02 04:10:00 295.75  295.74
2020-01-02 04:11:00 295.60  295.60
... ... ... ... ... ... ... ...
2020-08-18 19:56:00 462.98  462.98
2020-08-18 19:57:00 462.98  462.95
2020-08-18 19:58:00 462.88  462.88
2020-08-18 19:59:00 462.88  462.85
2020-08-18 20:00:00 462.85  462.80

Timestamp is a DatetimeIndex. I've been able to get each trading day's high and low prices, and the time of each, between 9:30am and 4pm with the following code:

# Calculate the highest High and lowest low for each trading day
daily_high_low = df.between_time('09:30','16:00', include_start=False, include_end=True).resample('D').agg({'High':'max', 'Low':'min'}).dropna()

# Add 'Date' column to df for groupby
df['Date'] = df.index.date

# Get time of Reg. Trading Hours High and Low
high_time = df[['Date','High']].between_time('09:30','16:00', include_start=False, include_end=True).groupby('Date').idxmax()
high_time.index = pd.to_datetime(high_time.index)
high_time = high_time['High'].dt.time.rename('High_Time')

low_time = df[['Date','Low']].between_time('09:30','16:00', include_start=False, include_end=True).groupby('Date').idxmin()
low_time.index = pd.to_datetime(low_time.index)
low_time = low_time['Low'].dt.time.rename('Low_Time')

Which has enabled me to generate the following dataframe:

            High        Low         High_Time   Low_Time
Timestamp                       
2020-01-02  300.6000    295.1900    16:00:00    09:33:00
2020-01-03  300.5800    296.5000    12:52:00    09:31:00
2020-01-06  299.9600    292.7501    14:15:00    09:31:00
2020-01-07  300.9000    297.4800    09:35:00    10:29:00
2020-01-08  304.4399    297.1560    15:42:00    09:31:00
... ... ... ... ... ... ...
2020-08-12  453.1000    441.1900    12:46:00    09:45:00
2020-08-13  464.1700    455.7100    13:01:00    10:19:00
2020-08-14  460.0000    452.1800    15:56:00    11:05:00
2020-08-17  464.3600    455.8501    09:31:00    11:47:00
2020-08-18  464.0000    456.0300    14:24:00    10:31:00

I am now trying to generate and add the following columns and am completely stuck:

L_after_H, the lowest Low that comes after the day's High,
H_after_L, the highest High that comes after the day's Low,
L_after_H_Time, the time of the lowest Low that comes after the day's High,
H_after_L_Time, the time of the highest High that comes after the day's Low.

My best attempt is something like

df[['Date', 'High', 'Low']].groupby('Date') \
.between_time(high_time,'16:00', include_start=False, include_end=True)

but that fails because 'DataFrameGroupBy' object has no attribute 'between_time'. I would be really happy just to be able to filter the date groups to contain only timestamps > high_time.

RichieV

A minor note before getting into the actual solution:

You should always provide a good sample of data for others to play around with and test code alternatives. It should be easy for others to copy and paste into their code, and it does not have to be real data. In this case I used the following:

Sample Data

import numpy as np
import pandas as pd

np.random.seed(123)
n = int(1e3)
df = pd.DataFrame(
    {'High': np.random.randint(low=0, high=1000, size=n)},
    index=pd.date_range(start='2020-01-01 9:00', periods=n, freq='15min')
)
# Low is derived from High, so Low <= High on every row
df['Low'] = (df.High * 0.8).astype(int)

Now to your question

As far as I know, pandas has no built-in way to filter rows within the groups of a groupby, so I used your code to get the daily High/Low, then grouped the original df by date and looped over the groups to filter each date's rows dynamically.

Here's the code

from datetime import datetime

def extrema(df_original, start_time_str, end_time_str):
    # clean the data once at start, excluding both endpoints of the window
    # (on pandas >= 1.4 pass inclusive='neither' instead; include_start and
    # include_end were removed in pandas 2.0)
    df = df_original.between_time(
        start_time_str, end_time_str, include_start=False, include_end=False)

    # your methods for max/min
    daily = df.resample('D').agg({'High':'max', 'Low':'min'}).dropna()
    daily['High_Time'] = df.groupby(df.index.date).High.idxmax().dt.time
    daily['Low_Time'] = df.groupby(df.index.date).Low.idxmin().dt.time

    # hal = high after low, lah = low after high
    hal, lah, hal_time, lah_time = [], [], [], []
    for (date, rows), (_, high, low, htime, ltime) in zip(
            df.groupby(df.index.date), daily.itertuples()):
        if htime > ltime:
            # high_after_low == high, no need to search again
            hal_time.append(htime)
            hal.append(high)
            # get low_after_high
            if htime == rows.index[-1].time():
                # the high is the day's last row, so nothing follows it;
                # fall back to the high itself
                lah_time.append(htime)
                lah.append(high)
            else:
                t = rows.loc[
                    rows.index > datetime.combine(date, htime), 'Low'].idxmin()
                lah_time.append(t.time())
                lah.append(rows.loc[t, 'Low'])
        else:
            # low_after_high == low
            lah.append(low)
            lah_time.append(ltime)
            # get high_after_low
            if ltime == rows.index[-1].time():
                # the low is the day's last row, so nothing follows it;
                # fall back to the low itself
                hal_time.append(ltime)
                hal.append(low)
            else:
                t = rows.loc[
                    rows.index > datetime.combine(date, ltime), 'High'].idxmax()
                hal_time.append(t.time())
                hal.append(rows.loc[t, 'High'])
    daily = pd.concat([daily,
        pd.DataFrame(
            {'High_after_Low': hal, 'Low_after_High': lah,
                'High_after_Low_Time': hal_time, 'Low_after_High_Time': lah_time},
            index=daily.index)
        ], axis=1)
    
    return daily

result = extrema(df, '09:30', '16:00')
print(result)

Output

            High  Low High_Time  Low_Time  High_after_Low  Low_after_High High_after_Low_Time Low_after_High_Time
2020-01-01   988   13  10:00:00  10:45:00             942              13            14:00:00            10:45:00
2020-01-02   987    2  12:15:00  15:00:00             907               2            15:15:00            15:00:00
2020-01-03   970    6  12:45:00  10:45:00             970              60            12:45:00            13:00:00
2020-01-04   992    8  15:30:00  10:15:00             992             224            15:30:00            15:45:00
2020-01-05   985   15  11:45:00  15:45:00              15              15            15:45:00            15:45:00
2020-01-06   994   84  10:15:00  15:45:00              84              84            15:45:00            15:45:00
2020-01-07   935   39  15:00:00  13:15:00             935             277            15:00:00            15:45:00
2020-01-08   999   39  10:00:00  15:15:00             765              39            15:45:00            15:15:00
2020-01-09   964    4  15:15:00  14:00:00             964              90            15:15:00            15:45:00
2020-01-10   968   36  10:45:00  14:30:00             967              36            15:45:00            14:30:00
2020-01-11   924   13  10:45:00  12:30:00             638              13            14:45:00            12:30:00
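
Loop-free alternative (sketch)

If you would rather avoid the explicit loop, the "keep only the rows after the day's high" filter can probably also be written as a boolean mask: look up each day's high timestamp, map it back onto every row, and compare it with the row's own timestamp. The following is only a minimal sketch on made-up toy data, not the code that produced the output above; the names toy, day, high_idx and high_ts are hypothetical, and only Low_after_High is computed.

```python
import pandas as pd

# Toy data (made-up values): two days, four rows each.
idx = pd.to_datetime([
    '2020-01-01 10:00', '2020-01-01 10:15', '2020-01-01 10:30', '2020-01-01 10:45',
    '2020-01-02 10:00', '2020-01-02 10:15', '2020-01-02 10:30', '2020-01-02 10:45',
])
toy = pd.DataFrame(
    {'High': [5, 9, 7, 6, 4, 3, 8, 2],
     'Low':  [4, 8, 2, 5, 3, 1, 7, 0]},
    index=idx)

# Midnight timestamp of each row, used as the group key.
day = toy.index.normalize()

# Timestamp of each day's highest High (one entry per day) ...
high_idx = toy.groupby(day)['High'].idxmax()
# ... mapped back so every row carries its own day's high timestamp.
high_ts = day.map(high_idx)

# Keep only the rows strictly after their day's high.
after_high = toy[toy.index > high_ts]

# Lowest Low among the remaining rows of each day.
low_idx = after_high.groupby(after_high.index.normalize())['Low'].idxmin()
low_after_high = pd.DataFrame({
    'Low_after_High': toy.loc[low_idx, 'Low'].to_numpy(),
    'Low_after_High_Time': low_idx.dt.time,
})
print(low_after_high)
# 2020-01-01 -> Low 2 at 10:30:00, 2020-01-02 -> Low 0 at 10:45:00
```

Note one difference from the loop version: a day whose high falls on its last row has no rows left after the mask, so it simply drops out of low_after_high. Calling low_after_high.reindex(high_idx.index) should bring those days back as missing values if you need one row per day.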

Edited at 2021-05-29
