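I recently started learning pandas. I did try to find a solution to this, but couldn't. Here is the problem.
I have a DataFrame of simple football data. For each team, I want to know how many goals they scored in their previous two matches, regardless of whether they played at home or away. So for each team I have to sum a specific number of values taken from two different columns.
Sample data: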
import pandas as pd
data = [['2018-02-03', 'manutd', 'chelsea', 3, 1], ['2018-02-08', 'arsenal', 'liverpool', 1, 1],
['2018-01-12', 'chelsea', 'westham', 2, 0], ['2018-01-12', 'liverpool', 'manutd', 0, 2],
['2018-03-15', 'arsenal', 'chelsea', 2, 2], ['2018-02-20', 'manutd', 'brighton', 0, 0],
['2018-04-01', 'westham', 'fulham', 1, 0], ['2018-03-15', 'manutd', 'westham', 2, 1]]
df = pd.DataFrame(data, columns = ['event_time', 'home_team', 'away_team', 'home_goals', 'away_goals'])
df['event_time'] = pd.to_datetime(df['event_time'])
df.sort_values(['event_time'],inplace=True, ascending=False)
print(df)
  event_time  home_team  away_team  home_goals  away_goals
6 2018-04-01    westham     fulham           1           0
4 2018-03-15    arsenal    chelsea           2           2
7 2018-03-15     manutd    westham           2           1
5 2018-02-20     manutd   brighton           0           0
1 2018-02-08    arsenal  liverpool           1           1
0 2018-02-03     manutd    chelsea           3           1
2 2018-01-12    chelsea    westham           2           0
3 2018-01-12  liverpool     manutd           0           2
What I want to achieve:
  event_time  home_team  away_team  home_goals  away_goals  h_goals_previous_2  a_goals_previous_2
6 2018-04-01    westham     fulham           1           0                   1                 NaN
4 2018-03-15    arsenal    chelsea           2           2                   1                   3
7 2018-03-15     manutd    westham           2           1                   3                   0
5 2018-02-20     manutd   brighton           0           0                   5                 NaN
1 2018-02-08    arsenal  liverpool           1           1                 NaN                   0
0 2018-02-03     manutd    chelsea           3           1                   2                   2
2 2018-01-12    chelsea    westham           2           0                 NaN                 NaN
3 2018-01-12  liverpool     manutd           0           2                 NaN                 NaN
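Description:
- On 2018-03-15 Arsenal played Chelsea. In their previous 2 matches, Chelsea scored 3 goals in total: 1 as the away team and 2 as the home team.
- Some of the previous-goals values are NaN because we have no data on earlier matches.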
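The Chelsea figure can be checked by hand with a small sketch like the one below. It uses only Chelsea's three rows from the sample data; the names `chelsea`, `scored`, `earlier` and `previous_two` are introduced here purely for illustration.

```python
import pandas as pd

# Chelsea's three matches, copied from the sample data.
chelsea = pd.DataFrame(
    [['2018-02-03', 'manutd', 'chelsea', 3, 1],
     ['2018-01-12', 'chelsea', 'westham', 2, 0],
     ['2018-03-15', 'arsenal', 'chelsea', 2, 2]],
    columns=['event_time', 'home_team', 'away_team', 'home_goals', 'away_goals'])
chelsea['event_time'] = pd.to_datetime(chelsea['event_time'])

# Goals Chelsea scored in each match: home_goals when at home, away_goals otherwise.
scored = chelsea['home_goals'].where(chelsea['home_team'].eq('chelsea'),
                                     chelsea['away_goals'])

# Keep matches strictly before 2018-03-15, then take the two most recent.
earlier = chelsea['event_time'] < pd.Timestamp('2018-03-15')
previous_two = (chelsea.assign(scored=scored)[earlier]
                .sort_values('event_time')
                .tail(2))
total = previous_two['scored'].sum()
print(total)
```

The two earlier matches contribute 2 goals (home against westham) and 1 goal (away at manutd).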
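I tried to do this by iterating team by team: for each team I build a sorted subset of df, and then I could sum the values. But I feel this is not the best solution and that it could be done with a neat pandas expression: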
# Collect every team that appears as either the home or the away side
teams = pd.unique(df[['home_team', 'away_team']].values.ravel('K'))
for team in teams:
    print(team)
    # All matches involving this team, newest first
    team_df = df[(df['home_team'] == team) | (df['away_team'] == team)]
    team_df = team_df.sort_values('event_time', ascending=False)
    print(team_df)
How can I do this without writing loops and ifs?
Method 1: pd.wide_to_long
# Create df2 with event_time kept alongside the original index, and rename the
# columns to the stub + suffix form (team1, team2, goals1, goals2) that
# pd.wide_to_long expects
df2=df.set_index('event_time',append=True)
df2.columns=[''.join(name[::-1]) for name in df2.columns.str.split('_')]
df2.columns=df2.columns.str.replace('home','1').str.replace('away','2')
df2=df2.reset_index()
#Using pd.wide_to_long
df_long=( pd.wide_to_long(df2,['team','goals'],i='level_0',j='key')
.sort_values('event_time',ascending=False) )
print(df_long)
             event_time       team  goals
level_0 key
6       1    2018-04-01    westham      1
        2    2018-04-01     fulham      0
4       1    2018-03-15    arsenal      2
7       1    2018-03-15     manutd      2
4       2    2018-03-15    chelsea      2
7       2    2018-03-15    westham      1
5       1    2018-02-20     manutd      0
        2    2018-02-20   brighton      0
1       1    2018-02-08    arsenal      1
        2    2018-02-08  liverpool      1
0       1    2018-02-03     manutd      3
        2    2018-02-03    chelsea      1
2       1    2018-01-12    chelsea      2
3       1    2018-01-12  liverpool      0
2       2    2018-01-12    westham      0
3       2    2018-01-12     manutd      2
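# Rows are sorted newest first, so within each team shift(-1) and shift(-2)
# are the goals from the two earlier matches; the sum is NaN if either is missing
groups_goals = df_long.groupby('team')['goals']
df_long = df_long.assign(value_2_sum=groups_goals.shift(-1) + groups_goals.shift(-2))
# Pivot back to one row per match: key 1 is the home side, key 2 the away side
goals_previous = df_long.pivot_table(index='level_0', columns='key', values='value_2_sum', dropna=False)
df[['h_goals_previous_2', 'a_goals_previous_2']] = goals_previous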
print(df)
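To see in isolation what the grouped shift does, here is a minimal sketch on made-up data (the `toy` frame and its values are hypothetical, not taken from the question). Rows are ordered newest first, as in `df_long`.

```python
import pandas as pd

# Hypothetical toy data, newest match first.
toy = pd.DataFrame({
    "team":  ["a", "b", "a", "a", "b"],
    "goals": [1, 4, 2, 3, 5],
})

g = toy.groupby("team")["goals"]
# shift(-1) pulls up the next row of the same team (the previous match),
# shift(-2) the one after that; adding them gives NaN unless both exist.
toy["prev_2"] = g.shift(-1) + g.shift(-2)
print(toy)
```

Only the newest row of team `a` has two earlier matches (2 and 3 goals), so it is the only row with a non-NaN result.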
Method 2: DataFrame.melt
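cols = ['h_goals_previous_2', 'a_goals_previous_2']
# One row per (match, side): 'variable' is home_goals/away_goals, 'value' is the goal count
df2 = (df.reset_index()
         .melt(['event_time', 'home_team', 'away_team', 'index'])
         .sort_values('event_time', ascending=False))
# The team that scored 'value': the away team on away_goals rows, otherwise the home team
df2['team'] = df2['home_team'].mask(df2['variable'].eq('away_goals'), df2['away_team'])
groups_goals = df2.groupby('team')['value']
# Rows are newest first, so shift(-1) and shift(-2) are the team's two earlier matches
df2['value_2'] = groups_goals.shift(-2) + groups_goals.shift(-1)
# Back to one row per match; the descending column sort puts home_goals before away_goals
df[cols] = (df2.pivot_table(columns='variable', index='index', values='value_2', dropna=False)
               .sort_index(axis=1, ascending=False))
print(df)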
Output:
  event_time  home_team  away_team  home_goals  away_goals  \
6 2018-04-01    westham     fulham           1           0
4 2018-03-15    arsenal    chelsea           2           2
7 2018-03-15     manutd    westham           2           1
5 2018-02-20     manutd   brighton           0           0
1 2018-02-08    arsenal  liverpool           1           1
0 2018-02-03     manutd    chelsea           3           1
2 2018-01-12    chelsea    westham           2           0
3 2018-01-12  liverpool     manutd           0           2

   h_goals_previous_2  a_goals_previous_2
6                 1.0                 NaN
4                 NaN                 3.0
7                 3.0                 NaN
5                 5.0                 NaN
1                 NaN                 NaN
0                 NaN                 NaN
2                 NaN                 NaN
3                 NaN                 NaN
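Note that there are more NaN values than in your expected output, because I only used the rows shown in the DataFrame: the sum is defined only when both of a team's two earlier matches are present.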
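If a single earlier match should also count, as in the expected output in the question, one possible variation is a per-team rolling sum with `min_periods=1`. The following is only a sketch of that idea, not part of the two methods above; the names `home`, `away`, `long_df`, `wide`, `match`, `side` and `prev_2` are introduced here for illustration.

```python
import pandas as pd

data = [['2018-02-03', 'manutd', 'chelsea', 3, 1], ['2018-02-08', 'arsenal', 'liverpool', 1, 1],
        ['2018-01-12', 'chelsea', 'westham', 2, 0], ['2018-01-12', 'liverpool', 'manutd', 0, 2],
        ['2018-03-15', 'arsenal', 'chelsea', 2, 2], ['2018-02-20', 'manutd', 'brighton', 0, 0],
        ['2018-04-01', 'westham', 'fulham', 1, 0], ['2018-03-15', 'manutd', 'westham', 2, 1]]
df = pd.DataFrame(data, columns=['event_time', 'home_team', 'away_team', 'home_goals', 'away_goals'])
df['event_time'] = pd.to_datetime(df['event_time'])

# One row per (match, team): stack the home sides on top of the away sides.
home = (df[['event_time', 'home_team', 'home_goals']]
        .rename(columns={'home_team': 'team', 'home_goals': 'goals'})
        .assign(side='h'))
away = (df[['event_time', 'away_team', 'away_goals']]
        .rename(columns={'away_team': 'team', 'away_goals': 'goals'})
        .assign(side='a'))
long_df = pd.concat([home, away]).rename_axis('match').reset_index()
long_df = long_df.sort_values('event_time')  # oldest first

# shift(1) drops the current match from the window; rolling(2, min_periods=1)
# then sums the one or two earlier matches that exist (NaN if there are none).
long_df['prev_2'] = (long_df.groupby('team')['goals']
                     .transform(lambda s: s.shift(1).rolling(2, min_periods=1).sum()))

# Back to one row per match, aligned on the original index.
wide = long_df.pivot(index='match', columns='side', values='prev_2')
df['h_goals_previous_2'] = wide['h']
df['a_goals_previous_2'] = wide['a']
df = df.sort_values('event_time', ascending=False)
print(df)
```

Here the rows are sorted oldest first, so the window looks backward with `shift(1)` instead of forward with `shift(-1)`.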