I am having trouble selecting values within the 1st and 2nd levels of a column MultiIndex.
I create the MultiIndex by setting header=[0,1]:
In[1]: df = pd.read_csv('Data.txt', sep='\t', header=[0,1], skipinitialspace=True)
In[2]: print(df.columns)
Out[2]: MultiIndex(
levels=[['20052065', '20052066', '20052082', '20052087', '20052089'],
['CTF1', 'CTF2', 'CTF3', 'CTF_M', 'CTM1', 'CTM2', 'CTM3', 'CTM_M']],
labels=[[...]],
names=[...])
If I select one element from the 1st level and then pick values from the 2nd level, I get the following output:
In[3]: print(df['20052065'][['CTF1','CTF_M']])
Out[3]: TIME[s] CTF1 CTF_M
0.000 -14.386 14.963
60.000 -26.937 34.729
120.000 -29.986 58.265
... ... ...
Now I try to produce the same output for 2 elements and do the following:
In[4]: print(df[['20052065','20052066']][['CTF1','CTF_M']])
Out[4]: KeyError: "['CTF1' 'CTF_M'] not in index"
Somehow this does not work. Do you know what is going wrong?
Thanks for your help.
EDIT: the output of print(df) looks like:
Out[1]: ELEMENT 20052065 20052066 20052082 20052087 20052089 20052090 \
TIME[s] TEMP[C] CTF1 CTF1 CTF1 CTF1 CTF1 CTF1
0.000 24.000 -4.234 -6.728 -14.386 -4.356 -6.926 -10.205
60.000 36.137 -29.308 -24.795 -26.937 -30.134 -24.735 -23.474
... ... ... ... ... ... ... ...
* The .txt file looks like this:
You can use df.loc:
import numpy as np
import pandas as pd
columns = pd.MultiIndex.from_product([['A','B','C'],['X','Y','Z']])
df = pd.DataFrame(np.random.randint(10, size=(3,len(columns))), columns=columns)
# A B C
# X Y Z X Y Z X Y Z
# 0 2 7 5 1 6 0 5 0 0
# 1 8 4 7 2 0 8 7 3 9
# 2 0 6 8 8 1 1 8 0 2
# In some cases `sort_index` may be needed to avoid UnsortedIndexError
df = df.sort_index(axis=1)
print(df.loc[:, (['A','B'],['X','Y'])])
yields (something like):
A B
X Y X Y
0 2 7 1 6
1 8 4 2 0
2 0 6 8 1
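Applied to column labels like those in the question, the same pattern might look like the sketch below. The DataFrame here is a hypothetical stand-in filled with random values (it is not the asker's Data.txt, and it leaves out the TIME[s] and TEMP[C] columns); only the element IDs and channel names are taken from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's data: element IDs on the first
# column level, channel names on the second. The values are random.
elements = ['20052065', '20052066', '20052082']
channels = ['CTF1', 'CTF2', 'CTF_M']
columns = pd.MultiIndex.from_product([elements, channels])
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(3, len(columns))), columns=columns)

# Sort the columns first, as in the answer's example.
df = df.sort_index(axis=1)

# One .loc call: every listed element combined with every listed channel.
subset = df.loc[:, (['20052065', '20052066'], ['CTF1', 'CTF_M'])]
print(subset)
```

The tuple of two lists selects the cross product of the listed labels, so the result has four columns.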
If you only want to select the ('A','Y') and ('B','X') columns, note that you can specify MultiIndexed columns as tuples:
In [37]: df.loc[:, [('A','Y'),('B','X')]]
Out[37]:
A B
Y X
0 7 1
1 4 2
2 6 8
Or even just df[[('A','Y'),('B','X')]] (which yields the same result).
In general it is better to use a single indexer such as df.loc[...] instead of double indexing (e.g. df[...][...]). It can be quicker (because it makes fewer calls to __getitem__ and generates fewer temporary sub-DataFrames), and df.loc[...] = value is the correct way to make assignments to sub-slices of a DataFrame that modify df itself.
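As a small illustration of the assignment point, here is a sketch that writes to a sub-slice with a single df.loc call. The DataFrame uses np.arange instead of random values so the result is predictable; that choice is only for illustration.

```python
import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_product([['A', 'B', 'C'], ['X', 'Y', 'Z']])
df = pd.DataFrame(np.arange(27).reshape(3, 9), columns=columns)

# A single .loc call writes straight into df: the X and Y columns
# under A and B are overwritten, everything else is left alone.
df.loc[:, (['A', 'B'], ['X', 'Y'])] = -1

print(df)
```

After the assignment the ('A','Z'), ('B','Z') and all 'C' columns still hold their original values.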
The reason why df[['A','B']][['X','Y']] would not work is because df[['A','B']] returns a DataFrame with a MultiIndex:
In [36]: df[['A','B']]
Out[36]:
A B
X Y Z X Y Z
0 2 7 5 1 6 0
1 8 4 7 2 0 8
2 0 6 8 8 1 1
So indexing this DataFrame with ['X','Y'] fails because there are no top-level column labels named 'X' or 'Y'.
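A short sketch of that failure, again using np.arange data for illustration: the intermediate DataFrame only has 'A' and 'B' on its top column level, so the second lookup raises a KeyError. The exact error message differs between pandas versions, so it is printed rather than matched.

```python
import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_product([['A', 'B', 'C'], ['X', 'Y', 'Z']])
df = pd.DataFrame(np.arange(27).reshape(3, 9), columns=columns)

# The first [] keeps both column levels.
intermediate = df[['A', 'B']]
top_level = list(intermediate.columns.get_level_values(0).unique())
print(top_level)

# The second [] looks for 'X' and 'Y' on the top level and fails.
try:
    intermediate[['X', 'Y']]
    raised = False
except KeyError as exc:
    raised = True
    print('KeyError:', exc)
```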
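Sometimes, depending on how the DataFrame was constructed (or as a result of operations performed on it), the MultiIndex needs to be lexsorted before it can be sliced. There is a warning box in the documentation that mentions this issue. To lexsort the column index, use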
df = df.sort_index(axis=1)
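One way to check whether sorting is needed is to look at the columns' is_monotonic_increasing property before and after the call. The sketch below builds a deliberately unsorted column index for illustration; the labels and values are made up.

```python
import numpy as np
import pandas as pd

# Columns deliberately listed out of lexical order.
columns = pd.MultiIndex.from_tuples(
    [('B', 'Y'), ('A', 'X'), ('B', 'X'), ('A', 'Y')]
)
df = pd.DataFrame(np.arange(8).reshape(2, 4), columns=columns)

sorted_before = df.columns.is_monotonic_increasing
df = df.sort_index(axis=1)
sorted_after = df.columns.is_monotonic_increasing
print(sorted_before, sorted_after)

# With sorted columns, the selection from the answer works as expected.
subset = df.loc[:, (['A', 'B'], ['X'])]
print(subset)
```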