Goal: get the percentage of missing values for each column of df, broken down by client.
My df of created tickets:
        id      type  ... priority   Client
0   56 113  Incident  ...      Low  client1
1   56 267   Demande  ...     High  client1
2   56 294  Incident  ...      Nan      NaN
3   56 197   Demande  ...      Low  client3
4   56 143   Demande  ...      Nan  client4
First attempt:
df.notna().sum() / len(df) * 100
Out[29]:
id 97.053453
type 76.415869
priority 82.626625
client 84.596443
This is very useful, but I would like to add more detail to the output by putting a "Client" dimension in the columns, like this:
Desired output:
                   Client1     Client2     Client3         NaN
id              100.000000  100.000000  100.000000   66.990424
type             76.415869   66.990424   76.415869   43.761970
status          100.000000  100.000000   66.990424   76.415869
category         66.990424   43.761970   76.415869   43.761970
entity           43.761970  100.000000   76.415869   76.415869
source_demande   84.596443  100.000000   76.415869   43.761970
I tried using groupby, but I could not get the desired output:
id type ... priority Client
client ...
True 97.053453 76.415869 ... 29.98632 29.98632
Any suggestions would be appreciated. Thank you for your attention!
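You can drop the Client column so that it is not itself tested for missing values, test the remaining columns with DataFrame.isna, group by Client with its NaN values replaced by a string label so those rows are not lost, aggregate with mean, and finally transpose with DataFrame.T. The mean of the boolean mask is the fraction of missing values in each group (multiply by 100 for a percentage):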
print(df)
       id      type priority   Client
0     NaN  Incident      Low  client1
1     NaN       NaN     High  client1
2  56 294  Incident      Nan      NaN
3  56 197       NaN      Low  client3
4     NaN   Demande      NaN  client4
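df = (df.drop(columns='Client')                  # do not test Client itself
        .isna()                                  # True where a value is missing
        .groupby(df['Client'].fillna('NaN'))     # keep rows whose Client is missing
        .mean()                                  # fraction of missing values per client
        .rename_axis(None)
        .T)                                      # clients as columns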
print(df)
          NaN  client1  client3  client4
id        0.0      1.0      0.0      1.0
type      0.0      0.5      1.0      0.0
priority  0.0      0.0      0.0      1.0
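The result above is a fraction between 0 and 1, whereas the question asks for percentages. Here is a minimal, self-contained sketch of the same approach scaled to percent. The sample frame is rebuilt by hand from the printed data, so its exact values and dtypes are assumptions (in particular, "Nan" in priority is treated as an ordinary string, not a missing value). The names clients, missing_pct and present_pct are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data mirroring the frame printed above
df = pd.DataFrame({
    "id": [np.nan, np.nan, "56 294", "56 197", np.nan],
    "type": ["Incident", np.nan, "Incident", np.nan, "Demande"],
    "priority": ["Low", "High", "Nan", "Low", np.nan],
    "Client": ["client1", "client1", np.nan, "client3", "client4"],
})

# Label missing clients explicitly so groupby does not drop those rows
clients = df["Client"].fillna("NaN")

# Percentage of missing values per column (rows) and per client (columns)
missing_pct = (
    df.drop(columns="Client")
      .isna()
      .groupby(clients)
      .mean()
      .mul(100)
      .rename_axis(None)
      .T
)

# Percentage of non-missing values, matching the notna() first attempt
present_pct = 100 - missing_pct

print(missing_pct)
print(present_pct)
```

Since the first attempt in the question used notna(), present_pct is probably the closer match to the desired output. missing_pct matches the goal as it is literally worded.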