Selecting columns by both index and name

Boyang Li
FEATURES = ['col_0', 'col_1', 'col_2', 'col_3', 'col_4', 'col_5', 'col_6', 'col_7']
DATA_TYPE = [True, True, False, True, False, False, True, True, False, True]

Here are my example masks.

train_data.iloc[:, DATA_TYPE].loc[:, FEATURES]

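I first take all columns for which DATA_TYPE[col_number] is True, and then take all columns whose col_name is in FEATURES.

But then I get a warning, and the result contains columns that are entirely NaN: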

FutureWarning: 
Passing list-likes to .loc or [] with any missing label will raise
KeyError in the future, you can use .reindex() as an alternative.

      col_0     col_1  col_2     col_3  col_4  col_5     col_6     col_7
0  0.791166  0.009661    NaN  0.148213    NaN    NaN  0.573262  0.875242
1  0.131313  0.211741    NaN  0.701692    NaN    NaN  0.981332  0.854273
2  0.382859  0.489186    NaN  0.461275    NaN    NaN  0.290135  0.421597
3  0.871551  0.585270    NaN  0.135620    NaN    NaN  0.894486  0.977827
4  0.524309  0.935508    NaN  0.108710    NaN    NaN  0.947512  0.226602
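The NaN columns seem to come from labels in FEATURES that no longer exist once the positional mask has been applied. The sketch below reproduces that under stated assumptions: `train_data` is a random stand-in with columns `col_0` to `col_9` (the real data is not shown), and `.reindex()` is used because it is the alternative named in the warning. Newer pandas releases are expected to raise the KeyError that the warning announces instead of returning NaN columns.

```python
import numpy as np
import pandas as pd

FEATURES = ['col_0', 'col_1', 'col_2', 'col_3', 'col_4', 'col_5', 'col_6', 'col_7']
DATA_TYPE = [True, True, False, True, False, False, True, True, False, True]

# Assumption: a random 5x10 frame stands in for the real train_data.
train_data = pd.DataFrame(np.random.rand(5, 10)).add_prefix('col_')

# Step 1: the positional mask drops col_2, col_4, col_5 and col_8.
masked = train_data.iloc[:, DATA_TYPE]

# Labels requested in FEATURES that are gone after masking.
missing = pd.Index(FEATURES).difference(masked.columns)
print(list(missing))

# reindex makes the effect explicit: each missing label becomes an all-NaN column.
reindexed = masked.reindex(columns=FEATURES)
print(reindexed.isna().all())
```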

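What is the correct way to do this? Thanks!

Edit: the DataFrame should first be masked by DATA_TYPE, and then only the columns whose names are in FEATURES should be selected.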

jezrael

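First filter the column labels by position with DATA_TYPE, then use intersection with FEATURES to keep only the filtered labels that also appear in FEATURES: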

import numpy as np
import pandas as pd

np.random.seed(456)

train_data = pd.DataFrame(np.random.rand(5, 10)).add_prefix('col_')
print(train_data)
      col_0     col_1     col_2     col_3     col_4     col_5     col_6  \
0  0.248756  0.163067  0.783643  0.808523  0.625628  0.604114  0.885702   
1  0.435679  0.385273  0.575710  0.146091  0.686593  0.468804  0.569999   
2  0.180917  0.118158  0.242734  0.008183  0.360068  0.146042  0.542723   
3  0.213594  0.973156  0.858330  0.533785  0.434459  0.187193  0.288276   
4  0.556988  0.942390  0.153546  0.896226  0.178035  0.594263  0.042630   

      col_7     col_8     col_9  
0  0.759117  0.181105  0.150169  
1  0.645701  0.723341  0.680671  
2  0.857103  0.200212  0.134633  
3  0.627167  0.355706  0.729455  
4  0.653391  0.366720  0.795570  

FEATURES = ['col_0', 'col_1', 'col_2', 'col_3', 'col_4', 'col_5', 'col_6', 'col_7']
DATA_TYPE = [True, True, False, True, False, False, True, True, False, True]

cols = train_data.columns[DATA_TYPE].intersection(FEATURES)
print(cols)
Index(['col_0', 'col_1', 'col_3', 'col_6', 'col_7'], dtype='object')

df = train_data[cols]
print(df)
      col_0     col_1     col_3     col_6     col_7
0  0.248756  0.163067  0.808523  0.885702  0.759117
1  0.435679  0.385273  0.146091  0.569999  0.645701
2  0.180917  0.118158  0.008183  0.542723  0.857103
3  0.213594  0.973156  0.533785  0.288276  0.627167
4  0.556988  0.942390  0.896226  0.042630  0.653391
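As a possible alternative to intersection (not part of the answer above), the two conditions can be combined into a single boolean mask: position from DATA_TYPE and name membership from `Index.isin`. This is a minimal sketch assuming DATA_TYPE has exactly one entry per column; the names `mask` and `df_alt` are introduced here for illustration.

```python
import numpy as np
import pandas as pd

np.random.seed(456)
train_data = pd.DataFrame(np.random.rand(5, 10)).add_prefix('col_')

FEATURES = ['col_0', 'col_1', 'col_2', 'col_3', 'col_4', 'col_5', 'col_6', 'col_7']
DATA_TYPE = [True, True, False, True, False, False, True, True, False, True]

# Element-wise AND of the positional mask and the name-membership mask.
mask = np.array(DATA_TYPE) & train_data.columns.isin(FEATURES)

# Boolean column selection keeps the original column order.
df_alt = train_data.loc[:, mask]
print(df_alt.columns.tolist())
```

Because the mask is aligned with the existing columns, no label that is absent from the frame is ever requested, so the missing-label warning should not be triggered.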
