我正在尝试以一种可以自动使用的格式自动从大熊猫中获取输出,并且尽量避免在文字处理器中造成混乱。我将描述性统计信息作为练习案例,因此尝试使用的输出df[variable].describe()
。我的问题是,.describe()
根据dtype
列的内容,响应会有所不同(如果我正确理解的话)。
对于数字列,将describe()
产生以下输出:
count 306.000000
mean 36.823529
std 6.308587
min 10.000000
25% 33.000000
50% 37.000000
75% 41.000000
max 50.000000
Name: gses_tot, dtype: float64
但是,对于分类列,它将产生:
count 306
unique 3
top Female
freq 166
Name: gender, dtype: object
由于存在这种差异,我需要使用不同的代码来捕获所需的信息,但是,我似乎无法使我的代码在分类变量上工作。
我尝试了几种不同的版本:
for v in df.columns:
if df[v].dtype.name == 'category': #i've also tried 'object' here
c, u, t, f, = df[v].describe()
print(f'******{str(v)}******')
print(f'Largest category = {t}')
print(f'Percentage = {(f/c)*100}%')
else:
c, m, std, mi, tf, f, sf, ma, = df[v].describe()
print(f'******{str(v)}******')
print(f'M = {m}')
print(f'SD = {std}')
print(f'Range = {float(ma) - float(mi)}')
print(f'\n')
else
块中的代码工作正常,但是当我进入分类列时,出现以下错误
******age****** #this is the output I want to a numberical column
M = 34.21568627450981
SD = 11.983015946197659
Range = 53.0
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-24-f077cc105185> in <module>
6 print(f'Percentage = {(f/c)*100}')
7 else:
----> 8 c, m, std, mi, tf, f, sf, ma, = df[v].describe()
9 print(f'******{str(v)}******')
10 print(f'M = {m}')
ValueError: not enough values to unpack (expected 8, got 4)
我想发生的是
******age****** #this is the output I want to a numberical column
M = 34.21568627450981
SD = 11.983015946197659
Range = 53.0
******gender******
Largest category = female
Percentage = 52.2%
I believe that the issue is how I'm setting up the if statement with the dtype
and I've rooted around to try to find out how to access the dtype properly but I can't seem to make it work.
Advice would be much appreciated.
您可以检查describe输出中包含哪些字段并打印相应的部分:
import pandas as pd
df = pd.DataFrame({'categorical': pd.Categorical(['d','e','f']), 'numeric': [1, 2, 3], 'object': ['a', 'b', 'c']})
for v in df.columns:
desc = df[v].describe()
print(f'******{str(v)}******')
if 'top' in desc:
print(f'Largest category = {desc["top"]}')
print(f'Percentage = {(desc["freq"]/desc["count"])*100:.1f}%')
else:
print(f'M = {desc["mean"]}')
print(f'SD = {desc["std"]}')
print(f'Range = {float(desc["max"]) - float(desc["min"])}')
本文收集自互联网,转载请注明来源。
如有侵权,请联系 [email protected] 删除。
我来说两句