Pandas Equivalent for SQL window function and rows range

Marc

Consider this minimal example:

customer   day  purchase
Joe        1       5
Joe        1      10
Joe        2       5
Joe        2       5
Joe        4      10
Joe        7       5
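
For anyone who wants to reproduce this, the table above can be built as a DataFrame roughly as follows. The variable name df is only a convention, chosen here to match the answer below.

```python
import pandas as pd

# The sample data from the table above; `df` is an assumed name.
df = pd.DataFrame({
    "customer": ["Joe", "Joe", "Joe", "Joe", "Joe", "Joe"],
    "day": [1, 1, 2, 2, 4, 7],
    "purchase": [5, 10, 5, 5, 10, 5],
})
print(df)
```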

In BigQuery, something like the following returns, for every row, how much the customer spent over the previous 2 days (not counting the current day):

SELECT customer, day
, SUM(purchase) OVER (PARTITION BY customer ORDER BY day ASC RANGE BETWEEN 2 PRECEDING AND 1 PRECEDING) AS amount_last_2d
FROM table

What would be the equivalent in pandas? The expected outcome is:

customer   day  purchase    amount_last_2d
Joe        1       5             null  -- spent days [-,-]
Joe        1      10             null  -- spent days [-,-]
Joe        2       5               15  -- spent days [-,1]
Joe        2       5               15  -- spent days [-,1]
Joe        4      10               10  -- spent days [2,3]
Joe        7       5                0  -- spent days [5,6]

BENY

Try groupby with shift, then reindex back to the original rows:

df['new'] = df.groupby(['customer','day']).purchase.sum().shift().reindex(pd.MultiIndex.from_frame(df[['customer','day']])).values
df
Out[259]: 
  customer  day  purchase   new
0      Joe    1         5   NaN
1      Joe    1        10   NaN
2      Joe    2        10  15.0
3      Joe    2         5  15.0
4      Joe    4        10  15.0
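
One caveat with this first attempt: shift() on the grouped totals picks up the previous day that is present in the data, whatever the gap in day values, so it is not the same as a RANGE frame of 2 preceding to 1 preceding. It would also carry a value across customers if there were more than one. A small sketch on the question's data shows the gap problem (prev_present_day is just an illustrative name):

```python
import pandas as pd

df = pd.DataFrame({
    "customer": ["Joe"] * 6,
    "day": [1, 1, 2, 2, 4, 7],
    "purchase": [5, 10, 5, 5, 10, 5],
})

# Total spent per (customer, day), then shifted down by one position.
daily = df.groupby(["customer", "day"])["purchase"].sum()
prev_present_day = daily.shift()

# Day 7 receives the day-4 total, although day 4 lies outside days [5, 6].
print(prev_present_day)
```

That is what the update below addresses.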

Update

# For each (customer, day) group, sum that customer's purchases made 1 or 2 days earlier
s = df.groupby(['customer','day']).apply(
    lambda x: df.loc[
        df.customer.isin(x['customer'].tolist())
        & (df.day.isin(x['day']-1) | df.day.isin(x['day']-2)),
        'purchase'
    ].sum()
)
df['new'] = s.reindex(pd.MultiIndex.from_frame(df[['customer','day']])).values
df
Out[271]: 
  customer  day  purchase  new
0      Joe    1         5    0
1      Joe    1        10    0
2      Joe    2         5   15
3      Joe    2         5   15
4      Joe    4        10   10
5      Joe    7         5    0
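
If the apply above turns out to be slow on a larger frame, one possible alternative is to total purchases per day, move those totals forward by 1 and 2 days, and merge them back. This is only a sketch under the assumption that day is an integer column; the names daily, shifted, window and out are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "customer": ["Joe"] * 6,
    "day": [1, 1, 2, 2, 4, 7],
    "purchase": [5, 10, 5, 5, 10, 5],
})

# Total spent per customer per day.
daily = df.groupby(["customer", "day"], as_index=False)["purchase"].sum()

# A purchase on day d belongs to the windows of days d+1 and d+2,
# so move each daily total forward by 1 and by 2 days.
shifted = pd.concat([daily.assign(day=daily["day"] + k) for k in (1, 2)])

# Add up everything that landed on the same (customer, day).
window = (
    shifted.groupby(["customer", "day"], as_index=False)["purchase"].sum()
    .rename(columns={"purchase": "amount_last_2d"})
)

# Left merge keeps every original row; days with an empty window get NaN.
out = df.merge(window, on=["customer", "day"], how="left")
print(out)
```

On the sample data this leaves NaN for days 1 and 7, 15 for day 2 and 10 for day 4. NaN for an empty window mirrors how SUM over an empty frame yields NULL in SQL; chain .fillna(0) on the new column if zeros are preferred, as in the output above.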
