I would like to know how to get per-column frequency counts for a pandas DataFrame, as in the example below:
import numpy as np
import pandas as pd
df = pd.DataFrame({'A': [1, 1, 2, 3, 5, 2],
                   'B': [10, 10, 10, 300, 400, 500],
                   'C': ['p', 'p', 'q', 'q', 'q', 'q']})
print(df)
A B C
0 1 10 p
1 1 10 p
2 2 10 q
3 3 300 q
4 5 400 q
5 2 500 q
Desired output, with each cell holding a (value, count) pair:
A B C
(1,2) (10,3) ('p', 2)
(2,2) (300,1) ('q', 4)
(3,1) (400,1)
(5,1) (500,1)
You can build the Counter items for each column, collect them in a list, and reconstruct the DataFrame:
from collections import Counter
c = [Counter(j for j in i).items() for i in df.values.T]
pd.DataFrame.from_records(c, index=df.columns).T
A B C
0 (1, 2) (10, 3) (p, 2)
1 (2, 2) (300, 1) (q, 4)
2 (3, 1) (400, 1) None
3 (5, 1) (500, 1) None
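As a variation on the same idea, here is a minimal sketch that builds one Counter per column and wraps each column's pairs in a `pd.Series` before assembling the frame. This is my own rearrangement, not the code above; the names `counts` and `result` are just illustrative. It relies on the DataFrame constructor aligning Series on their index, so shorter columns are padded with NaN rather than None.

```python
from collections import Counter

import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 3, 5, 2],
                   'B': [10, 10, 10, 300, 400, 500],
                   'C': ['p', 'p', 'q', 'q', 'q', 'q']})

# One Counter per column; a Counter keeps keys in first-seen order.
counts = {col: Counter(df[col]) for col in df.columns}

# Wrap each column's (value, count) pairs in a Series so that columns
# with fewer distinct values are padded with NaN when combined.
result = pd.DataFrame({col: pd.Series(list(c.items()))
                       for col, c in counts.items()})
print(result)
```

Iterating over `df[col]` instead of `df.values.T` also avoids converting the whole frame to a single object array first.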
To sort each column's pairs by count, in descending order:
from operator import itemgetter
c = [sorted(Counter(j for j in i).items(),
            key=itemgetter(1),
            reverse=True)
     for i in df.values.T]
pd.DataFrame.from_records(c, index=df.columns).T
A B C
0 (1, 2) (10, 3) (q, 4)
1 (2, 2) (300, 1) (p, 2)
2 (3, 1) (400, 1) None
3 (5, 1) (500, 1) None
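If you would rather stay within pandas, a possible alternative (a sketch, not the approach shown above) is `Series.value_counts()`, which already returns counts sorted in descending order. The variable names `pairs` and `result` are only for illustration, and the order of values with equal counts may differ from the Counter version.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 3, 5, 2],
                   'B': [10, 10, 10, 300, 400, 500],
                   'C': ['p', 'p', 'q', 'q', 'q', 'q']})

# value_counts() sorts by count (descending) by default; items() yields
# (value, count) pairs, which are stored as tuples in an object Series.
pairs = {col: pd.Series(list(df[col].value_counts().items()))
         for col in df.columns}

# Series of different lengths are aligned on their integer index,
# so shorter columns end up padded with NaN.
result = pd.DataFrame(pairs)
print(result)
```

Here each cell still holds a (value, count) tuple, so the layout matches the desired output apart from NaN in place of the empty cells.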