Consider the minimal example
customer day purchase
Joe 1 5
Joe 1 10
Joe 2 5
Joe 2 5
Joe 4 10
Joe 7 5
In BigQuery, one would do something similar to this to get how much the customer spent in the last 2 days for every day:
SELECT customer, day
, sum(purchase) OVER (PARTITION BY customer ORDER BY day ASC RANGE between 2 preceding and 1 preceding)
FROM table
What would be the equivalent in pandas? i.e., expected outcome
customer day purchase amount_last_2d
Joe 1 5 null -- spent days [-,-]
Joe 1 10 null -- spent days [-,-]
Joe 2 5 15 -- spent days [-,1]
Joe 2 5 15 -- spent days [-,1]
Joe 4 10 10 -- spent days [2,3]
Joe 7 5 0 -- spent days [5,6]
Try groupby
with shift
then reindex
back
df['new'] = df.groupby(['customer','day']).purchase.sum().shift().reindex(pd.MultiIndex.from_frame(df[['customer','day']])).values
df
Out[259]:
customer day purchase new
0 Joe 1 5 NaN
1 Joe 1 10 NaN
2 Joe 2 10 15.0
3 Joe 2 5 15.0
4 Joe 4 10 15.0
Update
s = df.groupby(['customer','day']).apply(lambda x : df.loc[df.customer.isin(x['customer'].tolist()) & (df.day.isin(x['day']-1)|df.day.isin(x['day']-2)),'purchase'].sum())
df['new'] = s.reindex(pd.MultiIndex.from_frame(df[['customer','day']])).values
df
Out[271]:
customer day purchase new
0 Joe 1 5 0
1 Joe 1 10 0
2 Joe 2 5 15
3 Joe 2 5 15
4 Joe 4 10 10
5 Joe 7 5 0
Collected from the Internet
Please contact [email protected] to delete if infringement.
Comments