熊猫MultiIndex，按1和2级选择值

尼尔斯·科尔迈

通过在1.和2.级别内选择值，我遇到了一些问题。

我通过设置 header = [0,1]

In[1]:  df = pd.read_csv('Data.txt', sep='\t', header=[0,1], skipinitialspace=True)

In[2]:  print(df.columns)

Out[2]: MultiIndex(
        levels=[['20052065', '20052066', '20052082', '20052087', '20052089'], 
                ['CTF1', 'CTF2', 'CTF3', 'CTF_M', 'CTM1', 'CTM2', 'CTM3', 'CTM_M']],
        labels=[[...]],
        names=[...])

如果尝试获取2.级别值的数据和从1.级别中选择的元素，则会得到以下输出：

In[3]:  print(df['20052065'][['CTF1','CTF_M']])

Out[3]: TIME[s]     CTF1    CTF_M
        0.000    -14.386   14.963
        60.000   -26.937   34.729
        120.000  -29.986   58.265
            ...      ...      ...

现在，我尝试为2个元素生成输出，并执行以下操作：

In[4]:  print(df[['20052065','20052066']][['CTF1','CTF_M']])

Out[4]: KeyError: "['CTF1' 'CTF_M'] not in index"

不知何故，这行不通。也许您知道发生了什么可怕的事情？

感谢帮助。

编辑： In[1]: print(df)看起来像：

Out[1]:          ELEMENT 20052065 20052066 20052082 20052087 20052089 20052090  \
       TIME[s]   TEMP[C]     CTF1     CTF1     CTF1     CTF1     CTF1     CTF1   
       0.000      24.000   -4.234   -6.728  -14.386   -4.356   -6.926  -10.205   
       60.000     36.137  -29.308  -24.795  -26.937  -30.134  -24.735  -23.474 
          ...        ...      ...      ...      ...      ...      ...      ...

* .txt文件如下所示：

算了吧

您可以使用df.loc：

import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_product([['A','B','C'],['X','Y','Z']])
df = pd.DataFrame(np.random.randint(10, size=(3,len(columns))), columns=columns)
#    A        B        C      
#    X  Y  Z  X  Y  Z  X  Y  Z
# 0  2  7  5  1  6  0  5  0  0
# 1  8  4  7  2  0  8  7  3  9
# 2  0  6  8  8  1  1  8  0  2

# In some cases `sort_index` may be needed to avoid UnsortedIndexError
df = df.sort_index(axis=1)
print(df.loc[:, (['A','B'],['X','Y'])])

产量（类似）：

   A     B   
   X  Y  X  Y
0  2  7  1  6
1  8  4  2  0
2  0  6  8  1

如果只想选择('A','Y')和('B','X')列，那么请注意，您可以将MultiIndexed列指定为元组：

In [37]: df.loc[:, [('A','Y'),('B','X')]]
Out[37]: 
   A  B
   Y  X
0  7  1
1  4  2
2  6  8

or even just df[[('A','Y'),('B','X')]] (which yields the same result).

And in general it is better to use a single indexer such as df.loc[...] instead of double indexing (e.g. df[...][...]). It can be quicker (because it makes fewer calls to __getitem__, and generates fewer temporary sub-DataFrames) and df.loc[...] = value it is the correct way to make assignments to sub-slices of a DataFrame which modify df itself.

The reason why df[['A','B']][['X','Y']] would not work is because df[['A','B']] returns a DataFrame with a MultiIndex:

In [36]: df[['A','B']]
Out[36]: 
   A        B      
   X  Y  Z  X  Y  Z
0  2  7  5  1  6  0
1  8  4  7  2  0  8
2  0  6  8  8  1  1

So indexing this DataFrame with ['X','Y'] fails because there are no top-level column labels named 'X' or 'Y'.

有时，根据DataFrame的构造方式（或由于对DataFrame进行的操作），在对MultiIndex进行切片之前，需要对其进行按顺序排序。在文档中有一个警告框，其中提到了此问题。对列索引进行词法排序

df = df.sort_index(axis=1)

本文收集自互联网，转载请注明来源。

如有侵权，请联系 [email protected] 删除。

编辑于 2020-11-25

我来说两句

0 条评论

登录后参与评论

上一篇：python迭代字典值（如果一个键具有一个或多个值）

TOP 榜单

文章

熊猫MultiIndex，按1和2级选择值

熊猫MultiIndex，按1和2级选择值

Linux的官方Adobe Flash存储库是否已过时？

用日期数据透视表和日期顺序查询

应用发明者仅从列表中选择一个随机项一次

Java Eclipse中的错误13，如何解决？

在Windows 7中无法删除文件（2）

在 Python 2.7 中。如何从文件中读取特定文本并分配给变量

套接字无法检测到断开连接

带有错误“ where”条件的查询如何返回结果？

有什么解决方案可以将android设备用作Cast Receiver？

Mac OS X更新后的GRUB 2问题

ggplot：对齐多个分面图-所有大小不同的分面

验证REST API参数

如何从视图一次更新多行（ASP.NET - Core）

尝试反复更改屏幕上按钮的位置 - kotlin android studio

计算数据帧中每行的NA

检索角度选择div的当前值

离子动态工具栏背景色

UITableView的项目向下滚动后更改颜色，然后快速备份

VB.net将2条特定行导出到DataGridView

蓝屏死机没有修复解决方案

通过 Git 在运行 Jenkins 作业时获取 ClassNotFoundException