Pandas groupby and widen a DataFrame with ordered columns

SummerEla

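I have a long-format DataFrame with multiple samples and timepoints per subject. The number of samples and timepoints can vary, as can the number of days between timepoints: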

import pandas as pd

test_df = pd.DataFrame({"subject_id":[1,1,1,2,2,3],
                    "sample":["A", "B", "C", "D", "E", "F"],
                    "timepoint":[19,11,8,6,2,12],
                    "time_order":[3,2,1,2,1,1]
 })

   subject_id   sample  timepoint   time_order
0    1            A        19           3
1    1            B        11           2
2    1            C         8           1
3    2            D         6           2
4    2            E         2           1
5    3            F        12           1

I need a way to group this DataFrame by subject_id and put all samples and timepoints on the same row, in time order.

Desired output:

    subject_id  sample1 timepoint1  sample2   timepoint2  sample3 timepoint3
0    1            C         8         B        11        A      19                              
1    2            E         2         D         6       null   null         
5    3            F        12        null      null     null   null

Pivot gets me close, but I'm stuck on how to proceed from there:

test_df = test_df.pivot(index=['subject_id', 'sample'],
columns='time_order', values='timepoint')

jezrael

Use DataFrame.set_index together with DataFrame.unstack to pivot, sort the column MultiIndex by level, flatten it, and finally convert subject_id back to a column:

df = (test_df.set_index(['subject_id', 'time_order'])
             .unstack()
             .sort_index(level=[1,0], axis=1))
df.columns = df.columns.map(lambda x: f'{x[0]}{x[1]}')
df = df.reset_index()
print(df)
   subject_id sample1  timepoint1 sample2  timepoint2 sample3  timepoint3
0           1       C         8.0       B        11.0       A        19.0
1           2       E         2.0       D         6.0     NaN         NaN
2           3       F        12.0     NaN         NaN     NaN         NaN
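
As a variation on the above, here is a minimal sketch for the case where time_order is not already in the data and has to be derived from timepoint. The helper name widen_by_time and its key/time parameters are made up for illustration, and it assumes there are no missing timepoints. It also casts timepoint to the nullable Int64 dtype before unstacking, so the values stay integers instead of turning into floats such as 8.0 next to the missing entries.

```python
import pandas as pd

test_df = pd.DataFrame({
    "subject_id": [1, 1, 1, 2, 2, 3],
    "sample": ["A", "B", "C", "D", "E", "F"],
    "timepoint": [19, 11, 8, 6, 2, 12],
})

def widen_by_time(df, key="subject_id", time="timepoint"):
    """Hypothetical helper: one row per `key`, columns ordered by `time`."""
    out = df.copy()
    # Assumption: rebuild time_order by ranking within each subject
    # (1 = earliest). method="first" gives tied timepoints distinct ranks.
    out["time_order"] = out.groupby(key)[time].rank(method="first").astype(int)
    # Nullable integers can sit next to missing values without becoming floats.
    out[time] = out[time].astype("Int64")
    wide = (out.set_index([key, "time_order"])
               .unstack()
               .sort_index(level=[1, 0], axis=1))
    # Flatten ("sample", 1) -> "sample1", ("timepoint", 1) -> "timepoint1", ...
    wide.columns = [f"{name}{pos}" for name, pos in wide.columns]
    return wide.reset_index()

wide_df = widen_by_time(test_df)
print(wide_df)
```

Because the ranks are unique within each subject, the (subject_id, time_order) pairs cannot repeat. If you use your own time_order column instead, unstack raises a ValueError when a pair is duplicated.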
