if-else by column dtype in pandas

KevOMalley743

Formatting the output from pandas

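I'm trying to get output from pandas automatically, in a format that can be used with as little fiddling in a word processor as possible. I'm using descriptive statistics as a practice case, so I'm working with the output of df[variable].describe(). My problem is that .describe() responds differently depending on the dtype of the column (if I'm understanding it correctly).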

For a numeric column, describe() produces the following output:

count    306.000000
mean      36.823529
std        6.308587
min       10.000000
25%       33.000000
50%       37.000000
75%       41.000000
max       50.000000
Name: gses_tot, dtype: float64

For a categorical column, however, it produces:

count        306
unique         3
top       Female
freq         166
Name: gender, dtype: object

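Because of this difference, I need different code to capture the information I want, but I can't seem to get my code to work on the categorical variables.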
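To make the difference concrete, here is a minimal sketch that lists the labels describe() returns for each kind of column. The `toy` frame and its values are made up for illustration and are not the real dataset; only the column names are borrowed from the output above.

```python
import pandas as pd

# Hypothetical toy data standing in for the real dataset
toy = pd.DataFrame({
    "gses_tot": [30.0, 37.0, 41.0],
    "gender": ["Female", "Male", "Female"],
})

# describe() returns a Series; its index holds the names of the statistics
numeric_fields = list(toy["gses_tot"].describe().index)
object_fields = list(toy["gender"].describe().index)

print(numeric_fields)  # eight labels for a numeric column
print(object_fields)   # four labels for a text column
```

The numeric column yields eight labels and the text column four, which is why unpacking into a fixed number of variables only works when the branch matches the column type.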

What I've tried

I've tried a few different versions of:

for v in df.columns:
    if df[v].dtype.name == 'category': #i've also tried 'object' here
        c, u, t, f, = df[v].describe()
        print(f'******{str(v)}******')
        print(f'Largest category = {t}')
        print(f'Percentage = {(f/c)*100}%')        
    else:
        c, m, std, mi, tf, f, sf, ma, = df[v].describe()
        print(f'******{str(v)}******')
        print(f'M = {m}')
        print(f'SD = {std}')
        print(f'Range = {float(ma) - float(mi)}')
        print(f'\n')

The code in the else block works fine, but when it reaches a categorical column I get the following error:

******age****** #this is the output I want for a numerical column
M = 34.21568627450981
SD = 11.983015946197659
Range = 53.0


---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-24-f077cc105185> in <module>
      6         print(f'Percentage = {(f/c)*100}')
      7     else:
----> 8         c, m, std, mi, tf, f, sf, ma, = df[v].describe()
      9         print(f'******{str(v)}******')
     10         print(f'M = {m}')

ValueError: not enough values to unpack (expected 8, got 4)

What I'd like to happen is:

******age****** #this is the output I want for a numerical column
M = 34.21568627450981
SD = 11.983015946197659
Range = 53.0


******gender******
Largest category = female
Percentage = 52.2%


I believe that the issue is how I'm setting up the if statement with the dtype, and I've rooted around to try to find out how to access the dtype properly, but I can't seem to make it work.

Advice would be much appreciated.
斯蒂夫

You can check which fields are present in the describe output and print the corresponding section:

import pandas as pd

df = pd.DataFrame({'categorical': pd.Categorical(['d','e','f']), 'numeric': [1, 2, 3], 'object': ['a', 'b', 'c']})

for v in df.columns:
    desc = df[v].describe()
    print(f'******{str(v)}******')
    if 'top' in desc:
        print(f'Largest category = {desc["top"]}')
        print(f'Percentage = {(desc["freq"]/desc["count"])*100:.1f}%')        
    else:
        print(f'M = {desc["mean"]}')
        print(f'SD = {desc["std"]}')
        print(f'Range = {float(desc["max"]) - float(desc["min"])}')
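If you would rather branch on the dtype itself, as the question originally attempted, one possible sketch uses the helpers in pandas.api.types instead of comparing dtype names. The `summarize` function and its return format are hypothetical, introduced here only for illustration. Boolean columns are excluded from the numeric branch on the assumption that describe() reports count/unique/top/freq for them rather than a mean.

```python
import pandas as pd
from pandas.api.types import is_bool_dtype, is_numeric_dtype


def summarize(df):
    """Return the summary lines for every column, branching on dtype.

    Hypothetical helper: collects the lines in a list instead of printing,
    so the result can be written wherever it is needed.
    """
    lines = []
    for col in df.columns:
        series = df[col]
        desc = series.describe()
        lines.append(f"******{col}******")
        # Treat ints and floats as numeric; bools are summarized like categories
        if is_numeric_dtype(series) and not is_bool_dtype(series):
            lines.append(f"M = {desc['mean']}")
            lines.append(f"SD = {desc['std']}")
            lines.append(f"Range = {float(desc['max']) - float(desc['min'])}")
        else:
            # Covers object, string and category columns alike
            lines.append(f"Largest category = {desc['top']}")
            lines.append(f"Percentage = {desc['freq'] / desc['count'] * 100:.1f}%")
    return lines


example = pd.DataFrame({"age": [1, 2, 3], "gender": ["Female", "Male", "Female"]})
print("\n".join(summarize(example)))
```

This avoids having to know whether a text column is stored as 'object' or 'category', since anything non-numeric falls into the same branch. Checking for 'top' in the describe output, as above, remains the shorter option.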
