我有一个三个月的销售数据集。我需要按周和按代理商分组来获得销售总数。并希望在代理人的每日标准表中进行标准划分
Agent District Agent_type Date Device
12 abc br 01/02/2020 4233
12 abc br 01/02/2020 4123
12 abc br 03/02/2020 4314
12 abc br 05/02/2020 4134
12 abc br 19/02/2020 5341
12 abc br 19/02/2020 52141
12 abc br 19/02/2020 12141
12 abc br 26/02/2020 4224
12 abc br 28/02/2020 9563
12 abc br 05/03/2020 0953
12 abc br 10/03/2020 1212
12 abc br 15/03/2020 4309
12 abc br 02/03/2020 4200
12 abc br 30/03/2020 4299
12 abc br 01/04/2020 4211
12 abc br 10/04/2020 2200
12 abc br 19/04/2020 3300
12 abc br 29/04/2020 3222
12 abc br 29/04/2020 32222
12 abc br 29/04/2020 4212
12 abc br 29/04/2020 20922
12 abc br 29/04/2020 67822
13 aaa ae 15/02/2020 22222
13 aaa ae 29/02/2020 42132
13 aaa ae 10/02/2020 89022
13 aaa ae 28/02/2020 31111
13 aaa ae 28/02/2020 31132
13 aaa ae 28/02/2020 31867
13 aaa ae 14/02/2020 91122
output
Agent District Agent_type 1st_week_feb 2nd_week_feb 3rd_week_feb ..... 4th_week_apr
12 abc br count count count count
13 aaa ae count count count count
第二输出-代理商每日标准
Agent tot_sale daily_std
12 22 2.40
13 7 1.34
您可以使用:
#convert values to datetimes
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
#get weeks strating by 1
week = df['Date'].dt.isocalendar().week
df['Week'] = (week - week.min() + 1)
#lowercase months
df['Month'] = df['Date'].dt.strftime('%b').str.lower()
print (df)
Agent Date Device Week Month
0 12 2020-02-01 4233 1 feb
1 12 2020-02-01 4123 1 feb
2 12 2020-02-03 4314 2 feb
3 12 2020-02-05 4134 2 feb
4 12 2020-02-19 5341 4 feb
5 12 2020-02-26 4224 5 feb
6 12 2020-02-28 9563 5 feb
7 12 2020-03-05 953 6 mar
8 12 2020-03-10 1212 7 mar
9 12 2020-03-15 4309 7 mar
10 12 2020-03-02 4200 6 mar
11 12 2020-03-30 4299 10 mar
12 12 2020-04-01 4211 10 apr
13 12 2020-04-10 2200 11 apr
14 12 2020-04-19 3300 12 apr
15 12 2020-04-29 3222 14 apr
16 13 2020-02-15 22222 3 feb
17 13 2020-02-29 42132 5 feb
18 13 2020-03-10 89022 7 mar
19 13 2020-03-28 31111 9 mar
20 13 2020-04-14 91122 12 apr
#if need count rows use crosstab
df1 = pd.crosstab(df['Agent'], [df['Week'], df['Month']])
df1.columns = df1.columns.map(lambda x: f'{x[0]}_week_{x[1]}')
print (df1)
1_week_feb 2_week_feb 3_week_feb 4_week_feb 5_week_feb 6_week_mar \
Agent
12 2 2 0 1 2 2
13 0 0 1 0 1 0
7_week_mar 9_week_mar 10_week_apr 10_week_mar 11_week_apr \
Agent
12 2 0 1 1 1
13 1 1 0 0 0
12_week_apr 14_week_apr
Agent
12 1 1
13 1 0
#if need sum Device column use pivot_table
df2 = df.pivot_table(index='Agent',
columns=['Week', 'Month'],
values='Device',
aggfunc='sum',
fill_value=0)
df2.columns = df2.columns.map(lambda x: f'{x[0]}_week_{x[1]}')
print (df2)
1_week_feb 2_week_feb 3_week_feb 4_week_feb 5_week_feb 6_week_mar \
Agent
12 8356 8448 0 5341 13787 5153
13 0 0 22222 0 42132 0
7_week_mar 9_week_mar 10_week_apr 10_week_mar 11_week_apr \
Agent
12 5521 0 4211 4299 2200
13 89022 31111 0 0 0
12_week_apr 14_week_apr
Agent
12 3300 3222
13 91122 0
编辑:谢谢@Henry Yik指出了另一种按天计数的方法:
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
df['Week'] = (df["Date"].dt.day-1)//7+1
df['Month'] = df['Date'].dt.strftime('%b').str.lower()
print (df)
Agent Date Device Week Month
0 12 2020-02-01 4233 1 feb
1 12 2020-02-01 4123 1 feb
2 12 2020-02-03 4314 1 feb
3 12 2020-02-05 4134 1 feb
4 12 2020-02-19 5341 3 feb
5 12 2020-02-26 4224 4 feb
6 12 2020-02-28 9563 4 feb
7 12 2020-03-05 953 1 mar
8 12 2020-03-10 1212 2 mar
9 12 2020-03-15 4309 3 mar
10 12 2020-03-02 4200 1 mar
11 12 2020-03-30 4299 5 mar
12 12 2020-04-01 4211 1 apr
13 12 2020-04-10 2200 2 apr
14 12 2020-04-19 3300 3 apr
15 12 2020-04-29 3222 5 apr
16 13 2020-02-15 22222 3 feb
17 13 2020-02-29 42132 5 feb
18 13 2020-03-10 89022 2 mar
19 13 2020-03-28 31111 4 mar
20 13 2020-04-14 91122 2 apr
本文收集自互联网,转载请注明来源。
如有侵权,请联系 [email protected] 删除。
我来说两句