How do I extract the attribute name and the maximum co-occurrence count in a Pandas DataFrame?

气候

Given a DataFrame df of the form

    A   B   C   D   E   F   G   H   I   J ...
0   0   1   0   0   0   1   0   0   0   0 ...
1   1   1   0   0   1   1   0   0   0   0 ...
2   0   0   1   0   0   0   0   0   0   0 ...
.   .   .   .   .   .   .   .   .   .   .
.   .   .   .   .   .   .   .   .   .   .
.   .   .   .   .   .   .   .   .   .   .

I would like to end up with a result DataFrame of the form

   corr  count
A   B     270
B   F      15
C   J     100
.   .       .
.   .       .
.   .       .

where, for each row, corr is the column that co-occurs with it most often, and count is that number of co-occurrences.

My current code looks like this:

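import numpy as np
import pandas as pd
cooccurring_df = df.T.dot(df)  # pairwise co-occurrence counts between columns
np.fill_diagonal(cooccurring_df.values, 0)  # ignore a column's co-occurrence with itself
idxmax_df = pd.DataFrame(cooccurring_df.idxmax(axis=0), columns=['corr'])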

This gives:

   corr 
A   B 
B   F   
C   J  
.   .
.   .
.   .

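But for the life of me, I can't figure out how to carry the matching counts from cooccurring_df over into idxmax_df. I'm sure I'm missing something obvious, and I'm sure there is a better way to get where I want to go.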

广晃

IIUC, what you are looking for is lookup:

idxmax_df['count'] = cooccurring_df.lookup(idxmax_df.index, idxmax_df['corr'])
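
Note that DataFrame.lookup was deprecated in pandas 1.2 and removed in pandas 2.0, so the line above only runs on older versions. As a rough sketch of the same label-based lookup on newer versions, one option is to translate the labels to integer positions and index the underlying NumPy array. The names values, row_pos and col_pos are my own, the sample df is the test data from this answer, and the matrix is built on a writable copy on the assumption that in-place edits through .values may not be allowed everywhere:

```python
import numpy as np
import pandas as pd

# Sample data (same as the test data in this answer)
df = pd.DataFrame(
    [[0, 1, 0, 0, 0, 1, 0, 0, 0, 0],
     [1, 1, 0, 0, 1, 1, 0, 0, 0, 0],
     [0, 0, 1, 0, 0, 0, 0, 0, 0, 1]],
    columns=list("ABCDEFGHIJ"),
)

# Co-occurrence matrix on a writable copy, with the diagonal zeroed
values = df.T.dot(df).to_numpy(copy=True)
np.fill_diagonal(values, 0)
cooccurring_df = pd.DataFrame(values, index=df.columns, columns=df.columns)

# For each column, the label it co-occurs with most often
idxmax_df = cooccurring_df.idxmax(axis=0).to_frame("corr")

# Translate row/column labels to integer positions, then pick one cell per row
row_pos = cooccurring_df.index.get_indexer(idxmax_df.index)
col_pos = cooccurring_df.columns.get_indexer(idxmax_df["corr"])
idxmax_df["count"] = cooccurring_df.to_numpy()[row_pos, col_pos]

print(idxmax_df)
```

get_indexer assumes the labels are unique, which holds here because they are the column names of df.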

Test data:

    A   B   C   D   E   F   G   H   I   J
0   0   1   0   0   0   1   0   0   0   0
1   1   1   0   0   1   1   0   0   0   0
2   0   0   1   0   0   0   0   0   0   1

Output (for the given data):

  corr  count
A    B      1
B    F      2
C    J      1
D    A      0
E    A      1
F    B      2
G    A      0
H    A      0
I    A      0
J    C      1
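
As for a shorter route: since idxmax returns the position of each column's maximum, the count is simply that column's maximum, so no lookup is strictly needed. This is a sketch of that idea rather than the approach used in the answer above; the names values and result are my own, and the data is the same test data:

```python
import numpy as np
import pandas as pd

# Same test data as above
df = pd.DataFrame(
    [[0, 1, 0, 0, 0, 1, 0, 0, 0, 0],
     [1, 1, 0, 0, 1, 1, 0, 0, 0, 0],
     [0, 0, 1, 0, 0, 0, 0, 0, 0, 1]],
    columns=list("ABCDEFGHIJ"),
)

# Co-occurrence matrix with the diagonal zeroed (on a writable copy)
values = df.T.dot(df).to_numpy(copy=True)
np.fill_diagonal(values, 0)
cooccurring_df = pd.DataFrame(values, index=df.columns, columns=df.columns)

# idxmax gives the partner column, max gives the matching count
result = pd.DataFrame({
    "corr": cooccurring_df.idxmax(axis=0),
    "count": cooccurring_df.max(axis=0),
})

print(result)
```

Columns that never co-occur with anything (D, G, H, I here) get a count of 0 and the first label as corr, exactly as in the output above, so you may want to filter on count > 0 afterwards.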
