Pandas - Create a column (col D) with values from (col C) if a value in column (col A) is observed in (col B)

乔布森

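As shown below, I have a dataframe with 2524 rows and the columns listed. For context, this is a genomics study in which a status of 1 or 2 means control or diseased, respectively. The status value belongs to the sample named in the iid column: for example, sample_1 (index 0) is diseased and sample_5 (index 4) is a control.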

          fid          iid       father       mother  sex  status
0        fam_7     sample_1            0            0    1       2
1     sample_2     sample_2            0            0    2       2
2     sample_3     sample_3            0            0    1       2
3     sample_4     sample_4            0            0    2       1
4       fam_34     sample_5            0            0    1       1

...        ...          ...          ...          ...  ...     ...
2519    fam_96  sample_2520            0  sample_1132    1       1
2520    fam_97  sample_2521   sample_760            0    1       2
2521    fam_98  sample_2522  sample_1452            0    2       2
2522       ...          ...          ...          ...  ...     ...
2523   fam_100  sample_2524  sample_2002            0    1       2

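Note that the father and mother columns contain the value 0. Here that means the sample has no parents in the data set and is itself a parent.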

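I want to create 2 new columns, ['father status'] and ['mother status']. For each row, I want to check whether the values in the father and mother columns appear in the iid column and, if so, take the status of that sample. As you can see, at index 2519 the mother column reads sample_1132. I want to put that mother's status into ['mother status'], so that I can determine whether a child needs both parents to be diseased in order to be diseased itself.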

To make this easier to see, I made a separate dataframe containing only the "children":

       fid          iid       father       mother     sex  status
2426   fam_3  sample_2427  sample_1015  sample_1776    1       1
2427   fam_4  sample_2428  sample_1263  sample_1985    2       1
2428   fam_5  sample_2429   sample_517  sample_1508    1       1
2429   fam_6  sample_2430  sample_1753   sample_490    2       1
2430   fam_7  sample_2431     sample_1   sample_312    2       1
2432   fam_9  sample_2433  sample_1845  sample_1627    1       1
2434  fam_11  sample_2435   sample_574  sample_1682    2       1
2435  fam_12  sample_2436   sample_275   sample_947    2       1

2424   fam_1  sample_2425  sample_2397  sample_2351    1       2
2425   fam_2  sample_2426  sample_2063   sample_818    2       2
2431   fam_8  sample_2432   sample_239  sample_1151    2       2
2433  fam_10  sample_2434   sample_171   sample_747    2       2
2440  fam_17  sample_2441  sample_2042  sample_1540    2       2
2441  fam_18  sample_2442  sample_2182   sample_252    2       2
2444  fam_21  sample_2445  sample_1730  sample_1190    2       2
2448  fam_25  sample_2449  sample_1315   sample_762    1       2

My expected output would look like this:

       fid          iid       father       mother     sex  status  f_st  m_st 
2434  fam_11  sample_2435   sample_574  sample_1682    2       1     1     2
2435  fam_12  sample_2436   sample_275   sample_947    2       1     1     1  
2424   fam_1  sample_2425  sample_2397  sample_2351    1       2     2     2
2425   fam_2  sample_2426  sample_2063   sample_818    2       2     2     1

泽维尔·布特

I suggest obtaining the mother_status column with a merge:

# Look up each mother's status by matching "mother" against the "iid" column
df = df.merge(df[["iid", "status"]], left_on="mother", right_on="iid", how="left", suffixes=("", "_y"))
# Drop the duplicated iid column coming from the merge
df = df.drop(columns="iid_y")
# Rename the merged status column to the desired name
df = df.rename(columns={"status_y": "mother_status"})
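
The same lookup can also be sketched with Series.map, which handles both parents with one lookup table and, unlike a merge, keeps the original row labels. This is only an illustration under two assumptions: the toy rows and the short ids (s1 to s4) are made up, and every iid is unique (map raises an error if the lookup index has duplicates). The column names f_st and m_st follow the expected output in the question.

```python
import pandas as pd

# Toy data (hypothetical): three founders and one child, with a
# non-default index to show that row labels are preserved.
df = pd.DataFrame(
    {
        "iid":    ["s1", "s2", "s3", "s4"],
        "father": ["0",  "0",  "0",  "s1"],
        "mother": ["0",  "0",  "0",  "s2"],
        "status": [2, 1, 1, 2],
    },
    index=[0, 1, 2, 2430],
)

# Lookup table: iid -> status (assumes iid values are unique).
status_by_iid = df.set_index("iid")["status"]

# map() yields NaN wherever the parent id ("0") is not a known iid.
df["f_st"] = df["father"].map(status_by_iid)
df["m_st"] = df["mother"].map(status_by_iid)

print(df)
```

Because missing parents produce NaN, the two new columns come out as floats; the founders' rows are NaN in both.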

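I'll leave handling the NaN values in the mother_status column to you. They appear in every row whose mother is 0, since 0 matches no iid.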
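
Which NaN treatment is right depends on the analysis, so the following is only a sketch of two common options, not a recommendation. The toy values are hypothetical, and the columns f_st and m_st are assumed to already hold the looked-up parent statuses. The final cross-tabulation is one possible way to approach the question's goal of comparing a child's status with that of its parents.

```python
import pandas as pd

nan = float("nan")

# Toy result (hypothetical values) after the parent lookup:
# founders s1 and s2 have no known parents, so their entries are NaN.
df = pd.DataFrame({
    "iid":    ["s1", "s2", "s3", "s4"],
    "status": [2, 1, 2, 1],
    "f_st":   [nan, nan, 2.0, 1.0],
    "m_st":   [nan, nan, 2.0, 2.0],
})

# Option 1: keep the missing values but use pandas' nullable integer
# dtype, so known statuses are stored as 1/2 rather than 1.0/2.0.
df_nullable = df.astype({"f_st": "Int64", "m_st": "Int64"})

# Option 2: keep only the rows where both parents were found.
children = df.dropna(subset=["f_st", "m_st"])

# Count children by their own status and by whether both parents are
# diseased (status 2).
both_sick = ((children["f_st"] == 2) & (children["m_st"] == 2)).rename("both_parents_sick")
table = pd.crosstab(children["status"], both_sick)

print(df_nullable)
print(table)
```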


Edited on 2021-08-17
