I have the following DataFrame:
import pandas as pd
url='https://raw.githubusercontent.com/108michael/ms_thesis/master/mpl.Bspons.merge.1'
df=pd.read_csv(url, index_col=0)
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')
df = df.set_index(['date'])
df.head(3)
state year unemployment log_diff_unemployment id.thomas party type bills id.fec years_exp session name disposition catcode naics
date
2006-05-01 AK 2006 6.6 -0.044452 1440 Republican sen s2686-109 S2AK00010 39 109 National Cable & Telecommunications Association support C4500 81
2006-05-01 AK 2006 6.6 -0.044452 1440 Republican sen s2686-109 S2AK00010 39 109 National Cable & Telecommunications Association support C4500 517
2007-03-27 AK 2007 6.3 -0.046520 1440 Republican sen s1000-110 S2AK00010 40 110 National Treasury Employees Union support L1100 NaN
I want to count the number of bills in each group defined by catcode > disposition > id.fec. I am using the following code:
df['billsum'] = df.groupby([pd.Grouper(level='date', freq='A'), 'catcode', \
'disposition', 'id.fec']).bills.transform('sum')
which returns:
df.head(3)
state year unemployment log_diff_unemployment id.thomas party type bills id.fec years_exp session name disposition catcode naics billsum
date
2006-05-01 AK 2006 6.6 -0.044452 1440 Republican sen s2686-109 S2AK00010 39 109 National Cable & Telecommunications Association support C4500 81 s2686-109s2686-109
2006-05-01 AK 2006 6.6 -0.044452 1440 Republican sen s2686-109 S2AK00010 39 109 National Cable & Telecommunications Association support C4500 517 s2686-109s2686-109
2007-03-27 AK 2007 6.3 -0.046520 1440 Republican sen s1000-110 S2AK00010 40 110 National Treasury Employees Union support L1100 NaN s1000-110
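This code does not return the number of bills in each group; instead it returns all of the bills in each group concatenated into one string. I only want the count of bills in each group. Does anyone have an idea of how to do this?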
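The bills column holds strings, so 'sum' concatenates them. Use transform('size') instead, which returns the number of rows in each group:
df['billsum'] = df.groupby([pd.Grouper(level='date', freq='A'), 'catcode', \
'disposition', 'id.fec']).bills.transform('size')
print(df.head(3))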
state year unemployment log_diff_unemployment id.thomas \
date
2006-05-01 AK 2006.0 6.6 -0.044452 1440
2006-05-01 AK 2006.0 6.6 -0.044452 1440
2007-03-27 AK 2007.0 6.3 -0.046520 1440
party type bills id.fec years_exp session \
date
2006-05-01 Republican sen s2686-109 S2AK00010 39 109
2006-05-01 Republican sen s2686-109 S2AK00010 39 109
2007-03-27 Republican sen s1000-110 S2AK00010 40 110
name disposition \
date
2006-05-01 National Cable & Telecommunications Association support
2006-05-01 National Cable & Telecommunications Association support
2007-03-27 National Treasury Employees Union support
catcode naics billsum
date
2006-05-01 C4500 81 2
2006-05-01 C4500 517 2
2007-03-27 L1100 NaN 1
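To make the counting behaviour easier to check without downloading the CSV, here is a minimal sketch on a hypothetical three-row frame that only mimics the relevant columns of the question's data. The names toy, keys, and unique_bills are assumptions introduced for illustration, and grouping on the index year stands in for the yearly pd.Grouper used above. It also shows 'nunique', which may be closer to what is wanted if the same bill appears on several rows (as it does here because of the differing naics values).

```python
import pandas as pd

# Hypothetical toy data shaped like the question's frame (not the real dataset).
toy = pd.DataFrame(
    {
        "catcode": ["C4500", "C4500", "L1100"],
        "disposition": ["support", "support", "support"],
        "id.fec": ["S2AK00010", "S2AK00010", "S2AK00010"],
        "bills": ["s2686-109", "s2686-109", "s1000-110"],
    },
    index=pd.DatetimeIndex(["2006-05-01", "2006-05-01", "2007-03-27"], name="date"),
)

# Group by the calendar year of the index plus the three key columns.
keys = [toy.index.year, "catcode", "disposition", "id.fec"]
grouped = toy.groupby(keys)["bills"]

# 'size' counts rows per group and broadcasts the count back to every row.
toy["billsum"] = grouped.transform("size")

# 'nunique' counts distinct bill identifiers per group instead of rows.
toy["unique_bills"] = grouped.transform("nunique")

print(toy[["bills", "billsum", "unique_bills"]])
```

In this toy frame the first group has two rows that refer to the same bill, so the row count and the distinct count differ. Which one is appropriate depends on whether duplicated bill rows should be counted once or once per row.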