Group by "Date" while calculating the mean of another column

Davide Tamburrino

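I have a DataFrame with 3 columns: ID, Date, and Data_Value. It holds temperature records (Data_Value) reported by different weather stations (ID) over a given period (Date, one reading per station per day). I need to group the rows by day and compute the mean temperature for each day, so that for example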

ID      |   Date       | Data_Value
------------------------------------
12345   |   02-05-2017 |  22
12346   |   02-05-2017 |  24
12347   |   02-05-2017 |  20
12348   |   01-05-2017 |  18
12349   |   01-05-2017 |  16

becomes:

ID      |   Date       | Data_Value
------------------------------------
.....   |   02-05-2017 | 22
.....   |   01-05-2017 | 17

Can anyone help me with this?

jezrael

I think you need groupby and the mean aggregation:

df = df.groupby('Date', as_index=False, sort=False)['Data_Value'].mean()
print (df)
         Date  Data_Value
0  02-05-2017          22
1  01-05-2017          17
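
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample table from the question and applies the same groupby. The variable name `daily_mean` is only illustrative. Note that recent pandas versions return the mean as a float, so the values may print as 22.0 and 17.0 rather than 22 and 17.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "ID": [12345, 12346, 12347, 12348, 12349],
    "Date": ["02-05-2017", "02-05-2017", "02-05-2017",
             "01-05-2017", "01-05-2017"],
    "Data_Value": [22, 24, 20, 18, 16],
})

# as_index=False keeps Date as a regular column instead of the index;
# sort=False keeps the dates in order of first appearance.
daily_mean = df.groupby("Date", as_index=False, sort=False)["Data_Value"].mean()
print(daily_mean)
```

Here the Date column holds plain strings, so rows are grouped by exact string match.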

If you also need the ID values, use agg:

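# The outer parentheses let the method chain span several lines.
df = (df.groupby('Date', as_index=False, sort=False)
        .agg({'Data_Value':'mean', 'ID':lambda x: ','.join(x.astype(str))})
        # reindex_axis was removed in pandas 1.0; reindex(columns=...) reorders the columns.
        .reindex(columns=['ID','Date','Data_Value']))
print(df)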
                  ID        Date  Data_Value
0  12345,12346,12347  02-05-2017          22
1        12348,12349  01-05-2017          17
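
The same result can also be written with named aggregation, which is available in pandas 0.25 and later. This is a sketch of an alternative spelling, not part of the original answer; the name `result` and the use of `reset_index()` in place of `as_index=False` are choices made here for illustration.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "ID": [12345, 12346, 12347, 12348, 12349],
    "Date": ["02-05-2017", "02-05-2017", "02-05-2017",
             "01-05-2017", "01-05-2017"],
    "Data_Value": [22, 24, 20, 18, 16],
})

result = (
    df.groupby("Date", sort=False)
      .agg(
          # Each keyword names an output column: (input column, aggregation).
          ID=("ID", lambda s: ",".join(s.astype(str))),
          Data_Value=("Data_Value", "mean"),
      )
      .reset_index()  # turn the Date index back into a regular column
)

# Put the columns in the order used in the question.
result = result[["ID", "Date", "Data_Value"]]
print(result)
```

Named aggregation makes the output column names explicit, which can be easier to read than a dict when several columns are aggregated differently.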

Or, if you only need the first ID of each group, aggregate it with first:

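# The outer parentheses let the method chain span several lines.
df = (df.groupby('Date', as_index=False, sort=False)
        .agg({'Data_Value':'mean', 'ID':'first'})
        # reindex_axis was removed in pandas 1.0; reindex(columns=...) reorders the columns.
        .reindex(columns=['ID','Date','Data_Value']))
print(df)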

      ID        Date  Data_Value
0  12345  02-05-2017          22
1  12348  01-05-2017          17
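
A self-contained version of the first variant, for checking on current pandas. It is a sketch that assumes the sample data from the question; `first_id` is an illustrative name, and `reindex(columns=...)` is used because `reindex_axis` is no longer available in pandas 1.0 and later. The 'first' aggregation takes the first non-null value in row order, so which station ID is kept depends on how the rows are ordered.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "ID": [12345, 12346, 12347, 12348, 12349],
    "Date": ["02-05-2017", "02-05-2017", "02-05-2017",
             "01-05-2017", "01-05-2017"],
    "Data_Value": [22, 24, 20, 18, 16],
})

first_id = (
    df.groupby("Date", as_index=False, sort=False)
      # Mean temperature per day, plus the first ID seen for that day.
      .agg({"Data_Value": "mean", "ID": "first"})
      # Reorder the columns to match the layout in the question.
      .reindex(columns=["ID", "Date", "Data_Value"])
)
print(first_id)
```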
