Pandas: split a column into a matrix of counts

Tyler NG

I have this column in a df:

Column A
--------
x-y: 1
x-y: 2
x-y: 2
x-x: 1
y-x: 2
y-y: 3
y-y: 3

Is it possible to break them down into a matrix like this?

     1     2     3      *based on the range of numbers in Column A
     --------------
x-x  1     0     0      because there's 1 'x-x: 1'
x-y  1     2     0      because there's 1 'x-y: 1' and 2 'x-y: 2'
y-x  0     1     0      because there's 1 'y-x: 2'
y-y  0     0     2      because there's 2 'y-y: 3'

Thanks!

jezrael

You can use reset_index with groupby, then count with size and reshape with unstack:

print (df)
     Column A
x-y         1
x-y         2
x-y         2
x-x         1
y-x         2
y-y         3
y-y         3

print (df.reset_index())
  index  Column A
0   x-y         1
1   x-y         2
2   x-y         2
3   x-x         1
4   y-x         2
5   y-y         3
6   y-y         3

df = df.reset_index().groupby(['index','Column A']).size().unstack(fill_value=0)
print (df)
Column A  1  2  3
index            
x-x       1  0  0
x-y       1  2  0
y-x       0  1  0
y-y       0  0  2
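To try this without the printed frames, here is a minimal self-contained sketch. It assumes, as in the printed df above, that the pair labels live in the index and the numbers in a numeric Column A; the names pairs, values, counts and counts2 are only for illustration. It also shows a variant (not part of the original answer) that passes the index straight to groupby next to the column name, which skips the reset_index step.

```python
import pandas as pd

# Rebuild the question's data: pair labels in the index, numbers in 'Column A'
pairs = ['x-y', 'x-y', 'x-y', 'x-x', 'y-x', 'y-y', 'y-y']
values = [1, 2, 2, 1, 2, 3, 3]
df = pd.DataFrame({'Column A': values}, index=pairs)

# Answer's approach: reset_index moves the unnamed index into a column called 'index',
# size() counts each (pair, number) combination, unstack turns numbers into columns
counts = df.reset_index().groupby(['index', 'Column A']).size().unstack(fill_value=0)

# Variant: group by the index itself plus the column; no reset_index needed
counts2 = df.groupby([df.index, 'Column A']).size().unstack(fill_value=0)

print(counts)
print(counts.to_numpy().tolist() == counts2.to_numpy().tolist())
```

The two results differ only in the name of the row axis ('index' versus none).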

Another solution with crosstab:

df = pd.crosstab(df.index, df['Column A'])
print (df)
Column A  1  2  3
row_0            
x-x       1  0  0
x-y       1  2  0
y-x       0  1  0
y-y       0  0  2
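Because df.index has no name, crosstab labels the row axis row_0. A small sketch, assuming the same index-plus-Column A layout: the rownames and colnames arguments of pd.crosstab set the axis labels explicitly. The labels 'pair' and 'value' are arbitrary choices for illustration.

```python
import pandas as pd

pairs = ['x-y', 'x-y', 'x-y', 'x-x', 'y-x', 'y-y', 'y-y']
df = pd.DataFrame({'Column A': [1, 2, 2, 1, 2, 3, 3]}, index=pairs)

# rownames/colnames replace the default 'row_0' and 'Column A' axis labels
# ('pair' and 'value' are hypothetical names, pick whatever suits your data)
table = pd.crosstab(df.index, df['Column A'], rownames=['pair'], colnames=['value'])
print(table)
```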

If the values first need to be split:

print (df)
  Column A
0   x-y: 1
1   x-y: 2
2   x-y: 2
3   x-x: 1
4   y-x: 2
5   y-y: 3
6   y-y: 3

df[['a','b']] = df['Column A'].str.split(r':\s+', expand=True)
print (df)

  Column A    a  b
0   x-y: 1  x-y  1
1   x-y: 2  x-y  2
2   x-y: 2  x-y  2
3   x-x: 1  x-x  1
4   y-x: 2  y-x  2
5   y-y: 3  y-y  3
6   y-y: 3  y-y  3
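As an alternative to str.split (a swapped-in technique, not the answer's), str.extract with named regex groups produces the a and b columns in one step. This sketch assumes every value has the shape "<pair>: <integer>". It also converts b to int: the split version leaves b as strings, so with numbers of 10 or more the matrix columns would sort as text ('10' before '2').

```python
import pandas as pd

df = pd.DataFrame({'Column A': ['x-y: 1', 'x-y: 2', 'x-y: 2', 'x-x: 1',
                                'y-x: 2', 'y-y: 3', 'y-y: 3']})

# Named groups become the column names; pattern assumes "<pair>: <integer>"
parts = df['Column A'].str.extract(r'(?P<a>[^:]+):\s*(?P<b>\d+)')

# extract returns strings; make b numeric so the matrix columns sort numerically
parts['b'] = parts['b'].astype(int)
print(parts)
```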

df = df.groupby(['a','b']).size().unstack(fill_value=0)
print (df)
b    1  2  3
a           
x-x  1  0  0
x-y  1  2  0
y-x  0  1  0
y-y  0  0  2
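The question asks for columns "based on the range of numbers in Column A", but groupby/unstack only creates columns for numbers that actually occur. A minimal sketch, assuming the range starts at 1: convert b to int and reindex the columns to 1..max so missing numbers appear as zero columns. The sample data here is made up (it has no 2) just to show the effect.

```python
import pandas as pd

# Hypothetical sample where the number 2 never occurs
df = pd.DataFrame({'Column A': ['x-y: 1', 'x-y: 3', 'x-x: 1', 'y-y: 3']})

# Split "pair: number"; make the number an int so columns sort numerically
parts = df['Column A'].str.split(r':\s+', expand=True)
df['a'] = parts[0]
df['b'] = parts[1].astype(int)

counts = df.groupby(['a', 'b']).size().unstack(fill_value=0)

# Assumption: numbers start at 1; add any missing numbers as all-zero columns
full_range = range(1, df['b'].max() + 1)
counts = counts.reindex(columns=full_range, fill_value=0)
print(counts)
```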
