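I have a pandas DataFrame, shown below, with the minimum, maximum, and average sales of the petroleum product Light Diesel Oil. From it I want to build a DataFrame showing the minimum, maximum, and average sales per 5-year interval (for example 2010-2014, 2015-2019, and so on), with the end year included in each interval.
Assume the DataFrame below is named "lightdiesel_df".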
petroleum_product year max_sale min_sale avg_sale
0 Light Diesel Oil 2014 0 0 0.0
1 Light Diesel Oil 2013 0 0 0.0
2 Light Diesel Oil 2012 258 258 258.0
3 Light Diesel Oil 2011 0 0 0.0
4 Light Diesel Oil 2010 227 227 227.0
5 Light Diesel Oil 2009 238 238 238.0
6 Light Diesel Oil 2008 377 377 377.0
7 Light Diesel Oil 2007 306 306 306.0
8 Light Diesel Oil 2006 179 179 179.0
9 Light Diesel Oil 2005 290 290 290.0
10 Light Diesel Oil 2004 88 88 88.0
11 Light Diesel Oil 2003 577 577 577.0
12 Light Diesel Oil 2002 610 610 610.0
13 Light Diesel Oil 2001 2413 2413 2413.0
14 Light Diesel Oil 2000 3416 3416 3416.0
So, basically, I want the following output:
petroleum_product year min_sale max_sale avg_sale
Light Diesel Oil 2010-2014 227 258 242.5
Light Diesel Oil 2005-2009 179 377 278
Light Diesel Oil 2000-2004 88 3416 1420.8
Try using Grouper, passing a frequency (5 years) and the argument closed='left', like this:
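import pandas as pd

# df2 is assumed to hold the same data as lightdiesel_df above
# Parse the integer years as datetimes so they can be grouped by frequency
df2['year'] = pd.to_datetime(df2['year'], format='%Y')
# Note: pandas 2.2+ spells the year-end alias 'YE' ('5YE'); '5Y' is deprecated there
(df2.groupby(['petroleum_product', pd.Grouper(key='year', freq='5Y', closed='left')])
    .agg(
        {'year': lambda x: '-'.join((str(min(x.dt.year)), str(max(x.dt.year)))),
         'max_sale': 'max',
         'min_sale': 'min',
         'avg_sale': 'mean'
        }).reset_index(level=0).reset_index(drop=True)
)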
#output:
petroleum_product year max_sale min_sale avg_sale
0 Light Diesel Oil 2000-2004 3416 88 1420.8
1 Light Diesel Oil 2005-2009 377 179 278.0
2 Light Diesel Oil 2010-2014 258 0 97.0
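Note that the output above keeps the rows with zero sales, so the 2010-2014 row shows a minimum of 0 and a mean of 97.0 rather than the 227 and 242.5 the question asks for. If the zeros are meant to be ignored, one possible variant is sketched below. It is only a sketch under two assumptions of mine: that a sale of 0 means "no data", and that the intervals always start on a multiple of 5. It bins with plain integer arithmetic instead of Grouper, and the column name `period` is made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (all three sale columns carry the same values).
sales = [0, 0, 258, 0, 227, 238, 377, 306, 179, 290, 88, 577, 610, 2413, 3416]
lightdiesel_df = pd.DataFrame({
    "petroleum_product": ["Light Diesel Oil"] * 15,
    "year": list(range(2014, 1999, -1)),
    "max_sale": sales,
    "min_sale": sales,
    "avg_sale": [float(s) for s in sales],
})

sale_cols = ["max_sale", "min_sale", "avg_sale"]
df = lightdiesel_df.copy()
# Assumption: 0 means "no data", so turn it into NaN; min/max/mean skip NaN.
df[sale_cols] = df[sale_cols].replace(0, np.nan)

# Integer bin start: 2010..2014 -> 2010, 2005..2009 -> 2005, and so on.
start = df["year"] // 5 * 5
# Label from the nominal bin edges (not the years actually present in the data).
df["period"] = start.astype(str) + "-" + (start + 4).astype(str)

result = (
    df.groupby(["petroleum_product", "period"], as_index=False)
      .agg(min_sale=("min_sale", "min"),
           max_sale=("max_sale", "max"),
           avg_sale=("avg_sale", "mean"))
      .sort_values("period", ascending=False, ignore_index=True)
)
print(result)
```

Because the label is built from the nominal bin edges, an incomplete interval (say, data only up to 2017) would still be labelled 2015-2019. If the label should reflect the years actually present, aggregating the min and max of `year` per group, as in the Grouper version above, is an alternative.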