我正在学习使用python和pandas,我想知道如何根据两列中的单元格值组合来自不同行的内容。在这种情况下,我想合并同一部门和同一ID内的分数。
这是我拥有的DF的简化版本:
id department name scoreA
abc sales eric 2
abc sales jack 3
abc marketing sofia 6
abc marketing anna 7
zzz sales jack 1
zzz sales eric 8
zzz marketing sofia 11
zzz marketing anna 1
这是我想要的DF:
id department totalScoreA
abc sales 5
abc marketing 13
zzz sales 9
zzz marketing 12
我还有一个后续问题。如果有两列计数,并且我想要这些计数的平均值,但是在平均那些值之前,我想将scoreB乘以2。像这样:
id department name scoreA scoreB
abc sales eric 2 10
abc sales jack 3 6
abc marketing sofia 6 8
abc marketing anna 7 10
zzz sales eric 8 10
zzz sales jack 2 10
zzz marketing sofia 11 4
zzz marketing anna 1 10
这是我想要的DF:
id department totalScoreA AverageScore((A+B*2)/2)
abc sales 5 18.5
abc marketing 13 24.5
zzz sales 10 25
zzz marketing 12 20
嘿,非常感谢您提供@jezrael的答案!第一个按预期工作!
但是对于定义第二个问题,我可能有点含糊不清。我想要的是为每个部门的每个组获取所有scoreB*2
“ s ScoreA
”的“组合”平均值。我举一个带有值的例子来阐明这一点:
由此:
id department name scoreA scoreB
zzz marketing sofia 5 4
zzz marketing anna 2
对此:
的meanAB
是(5+2+4*2)/3
(数字3来自值的计数)。那么,即使在您先前的解决方案的帮助下,我也无法做到这一点,我将如何计算它:/
id department meanA meanB meanAB
zzz marketing 3.5 4 5
首先是聚集sum
有DataFrame.groupby
:
df1 = df.groupby(['id', 'department'], as_index=False, sort=False)['scoreA'].sum()
print (df1)
id department scoreA
0 abc sales 5
1 abc marketing 13
2 zzz sales 10
3 zzz marketing 12
和第二个是第一多列scoreB
,添加scoreA
和使用GroupBy.agg
与集合函数,这里字典sum
和mean
:
df2 = (df.assign(scoreB = df['scoreB'] * 2 + df['scoreA'])
.groupby(['id', 'department'], as_index=False, sort=False)
.agg({'scoreA':'sum', 'scoreB':'mean'}))
print (df2)
id department scoreA scoreB
0 abc sales 5 18.5
1 abc marketing 13 24.5
2 zzz sales 10 25.0
3 zzz marketing 12 20.0
编辑:
print (df)
id department name scoreA scoreB
0 abc sales eric 2 10.0
1 abc sales jack 3 6.0
2 abc marketing sofia 6 8.0
3 abc marketing anna 7 10.0
4 abc marketing eric 8 10.0 <-changed data
5 zzz sales jack 2 10.0
6 zzz marketing sofia 5 4.0 <-changed data
7 zzz marketing anna 2 NaN <-changed data
Count
按DataFrame.count
功能创建新列,以获取具有排除NaN的值的数量,然后进行汇总sum
和除法mean
:
df2 = (df.assign(scoreB = df['scoreB'].mul(2).add(df['scoreA'], fill_value=0),
Count = df[['scoreA','scoreB']].count(1))
.groupby(['id', 'department'], as_index=False, sort=False)
.sum())
print (df2)
id department scoreA scoreB Count
0 abc sales 5 37.0 4
1 abc marketing 21 77.0 6
2 zzz sales 2 22.0 2
3 zzz marketing 7 15.0 3
df2['scoreB'] /= df2.pop('Count')
print (df2)
id department scoreA scoreB
0 abc sales 5 9.250000
1 abc marketing 21 12.833333
2 zzz sales 2 11.000000
3 zzz marketing 7 5.000000
详细说明:
print (df.assign(scoreB = df['scoreB'].mul(2).add(df['scoreA'], fill_value=0),
Count = df[['scoreA','scoreB']].count(1)))
id department name scoreA scoreB Count
0 abc sales eric 2 22.0 2
1 abc sales jack 3 15.0 2
2 abc marketing sofia 6 22.0 2
3 abc marketing anna 7 27.0 2
4 abc marketing eric 8 28.0 2
5 zzz sales jack 2 22.0 2
6 zzz marketing sofia 5 13.0 2
7 zzz marketing anna 2 2.0 1
本文收集自互联网,转载请注明来源。
如有侵权,请联系 [email protected] 删除。
我来说两句