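I have a long-format DataFrame that contains multiple samples and timepoints for each subject. The number of samples and timepoints can vary, and so can the number of days between timepoints: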
import pandas as pd

test_df = pd.DataFrame({"subject_id": [1, 1, 1, 2, 2, 3],
                        "sample": ["A", "B", "C", "D", "E", "F"],
                        "timepoint": [19, 11, 8, 6, 2, 12],
                        "time_order": [3, 2, 1, 2, 1, 1]
                        })
   subject_id sample  timepoint  time_order
0           1      A         19           3
1           1      B         11           2
2           1      C          8           1
3           2      D          6           2
4           2      E          2           1
5           3      F         12           1
I need a way to group this DataFrame by subject_id and put all of a subject's samples and timepoints on a single row, in time order.
Desired output:
   subject_id sample1  timepoint1 sample2  timepoint2 sample3  timepoint3
0           1       C           8       B          11       A          19
1           2       E           2       D           6    null        null
5           3       F          12    null        null    null        null
Pivot gets me close, but I'm stuck on how to proceed from there:
test_df = test_df.pivot(index=['subject_id', 'sample'],
columns='time_order', values='timepoint')
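Use DataFrame.set_index together with DataFrame.unstack to pivot, sort the resulting column MultiIndex by time_order and then by field name, flatten it, and finally convert subject_id back into a column: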
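# Pivot: one row per subject_id, one column per (field, time_order) pair
df = (test_df.set_index(['subject_id', 'time_order'])
             .unstack()
             .sort_index(level=[1, 0], axis=1))
# Flatten the column MultiIndex, e.g. ('sample', 1) -> 'sample1'
df.columns = df.columns.map(lambda x: f'{x[0]}{x[1]}')
# Turn subject_id back into a regular column
df = df.reset_index()
print(df)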
   subject_id sample1  timepoint1 sample2  timepoint2 sample3  timepoint3
0           1       C         8.0       B        11.0       A        19.0
1           2       E         2.0       D         6.0     NaN         NaN
2           3       F        12.0     NaN         NaN     NaN         NaN
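Two things you may run into with the approach above: the timepoint columns come back as floats (8.0 rather than 8) because NaN cannot be stored in a plain integer column, and the pivot relies on a time_order column already being present. Here is a minimal, self-contained sketch that handles both. It assumes time_order is not given and derives it from timepoint within each subject. The names wide and timepoint_cols are my own, not from the question, and the conversion to pandas' nullable Int64 dtype is optional.

```python
import pandas as pd

test_df = pd.DataFrame({
    "subject_id": [1, 1, 1, 2, 2, 3],
    "sample": ["A", "B", "C", "D", "E", "F"],
    "timepoint": [19, 11, 8, 6, 2, 12],
})

# Assumption: time_order is missing, so rank the timepoints within each subject.
test_df = test_df.sort_values(["subject_id", "timepoint"])
test_df["time_order"] = test_df.groupby("subject_id").cumcount() + 1

# Same pivot as in the answer: unstack time_order, then sort the columns
# by time_order first and field name second.
wide = (test_df.set_index(["subject_id", "time_order"])
               .unstack()
               .sort_index(level=[1, 0], axis=1))

# Flatten ('sample', 1) -> 'sample1', ('timepoint', 1) -> 'timepoint1', ...
wide.columns = [f"{name}{order}" for name, order in wide.columns]
wide = wide.reset_index()

# Optional: nullable integers keep whole-number timepoints next to missing values.
timepoint_cols = [c for c in wide.columns if c.startswith("timepoint")]
wide = wide.astype({c: "Int64" for c in timepoint_cols})

print(wide)
```

If you would rather keep the default float columns, just drop the last astype step; the rest is unchanged from the answer.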