Pandas: normalization of groupby.sum

Francesco Di Lauro

I have a pandas DataFrame that looks like this:

        I     SI     weight
        1     3      0.3
        2     4      0.2
        1     3      0.5
        1     5      0.5

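I need to do the following: given a value of I, consider each value of SI and add up its total weight. In the end, for each realization, I should have something like this: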

             I = 1     SI = 3      weight = 0.8
                       SI = 5      weight = 0.5

             I = 2     SI = 4      weight = 0.2

This is easily achieved by calling groupby and sum:

       import pandas as pd

       name = ['I', 'SI', 'weight']
       Location = 'Simulationsdata/prova.csv'
       df = pd.read_csv(Location, names=name, sep='\t', encoding='latin1')

       # The sample above has no 'real' column, so group on I and SI only
       results = df.groupby(['I', 'SI']).weight.sum()
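To reproduce the grouped sum without the CSV file, here is a minimal sketch that builds the sample table in memory. It groups on I and SI only, since the sample table has no `real` column. The hard-coded DataFrame is an assumption standing in for `prova.csv`.

```python
import pandas as pd

# Sample rows copied from the table in the question
# (stand-in for the CSV file, which is not available here)
df = pd.DataFrame({
    "I": [1, 2, 1, 1],
    "SI": [3, 4, 3, 5],
    "weight": [0.3, 0.2, 0.5, 0.5],
})

# Total weight for every (I, SI) pair; the result is a Series
# indexed by a two-level MultiIndex (I, SI)
results = df.groupby(["I", "SI"])["weight"].sum()
print(results)
```

The two rows with I = 1 and SI = 3 collapse into one entry, which is where the 0.8 in the expected output comes from.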

Now I would like to normalize the weights, so that they look like this:

             I = 1     SI = 3      weight = 0.615
                       SI = 5      weight = 0.385

             I = 2     SI = 4      weight = 1

I tried this:

        for idx2, j in enumerate(results.index.get_level_values(1).unique()):
            norm = [float(i)/sum(results.loc[j]) for i in results.loc[j]]

But when I try to plot the distribution of SI for each I, I find that SI has been normalized as well, which I do not want.

P.S. This question is related to this one, but since it concerns a different aspect of the problem, I thought it would be best to ask it separately.

Peter Leimbigler

You should be able to divide the weight column by its own sum:

# example data
df
   I  SI   weight
0  1   3      0.3
1  2   4      0.2
2  1   3      0.5
3  1   5      0.5

# two-level groupby, with the result as a DataFrame instead of Series:
# df['col'] gives a Series, df[['col']] gives a DF
res = df.groupby(['I', 'SI'])[['weight']].sum()
res
       weight
I SI         
1 3       0.8
  5       0.5
2 4       0.2

# Get the sum of weights for each value of I,
# which will serve as denominators in normalization
denom = res.groupby('I')['weight'].sum()
denom
I
1    1.3
2    0.2
Name: weight, dtype: float64

# Divide each result value by its index-matched
# denominator value
res.weight = res.weight / denom
res
        weight
I SI          
1 3   0.615385
  5   0.384615
2 4   1.000000
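As a possible alternative to building a separate denominator Series, the per-I totals can be computed with `transform`, which returns a result aligned row-for-row with the grouped frame. This is only a sketch under the assumption that the data matches the example above. The name `totals` is introduced here for illustration.

```python
import pandas as pd

# Same example data as above
df = pd.DataFrame({
    "I": [1, 2, 1, 1],
    "SI": [3, 4, 3, 5],
    "weight": [0.3, 0.2, 0.5, 0.5],
})

# Sum the weights for every (I, SI) pair, keeping a DataFrame
res = df.groupby(["I", "SI"])[["weight"]].sum()

# transform("sum") repeats each I-level total on every row of that group,
# so the result has the same (I, SI) index as res
totals = res.groupby(level="I")["weight"].transform("sum")

# Only the weight column is rescaled; the SI index level is left untouched
res["weight"] = res["weight"] / totals
print(res)
```

After this step the weights within each value of I sum to 1, and SI keeps its original values because it lives in the index rather than in the column being divided.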

Edited on 2020-11-26
