I have the following (sample) DataFrame:
In [1]:
import numpy as np
import pandas as pd
# Build the frame column by column; string values must be quoted
df = pd.DataFrame({
    'ID': ['a', 'a', 'a', 'a', 'a', 'a', 'b', 'b', 'c', 'd'],
    'exercise_title': ['Ankle Circles', 'Ankle Pumps', 'Static Glutes', 'Static Quads', 'Static Quads',
                       'Breathing Exercises', 'Heel Slides', 'Standing Hip', 'Ankle Circles', 'Ankle Pumps'],
    'exercise_duration': [0, 10, 0, 0, 0, 10, 20, 30, 10, 0],
})
In [2]: df
Out[2]:
ID exercise_title exercise_duration
a Ankle Circles 0.0
a Ankle Pumps 10.0
a Static Glutes 0.0
a Static Quads 0.0
a Static Quads 0.0
a Breathing Exercise 10.0
b Heel Slides 20.0
b Standing Hip 30.0
c Ankle Circles 10.0
d Ankle Pumps 0.0
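The above is a simplified version of the dataset. There are 90 different exercise titles, and I want to create a new DataFrame grouped by ID, with two columns for each exercise title:
1 - the total time spent on that exercise
2 - a Yes/No answer indicating whether the patient did that exercise at all.
So I want it to look like this, only much wider, since there are actually 90 different exercise titles: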
In [3]:
Out[3]:
ID Ankle_Circles_duration Ankle_Circles Ankle_Pumps_duration Ankle_Pumps Static_Glutes_duration Static_Glutes Static_quads_duration Static_quads Breathing_Exercises_duration Breathing_Exercises Heel_Slides_duration Heel_Slides Standing_Hip_duration Standing_Hip
a 0.0 No 10.0 Yes 0.0 No 0.0 No 0.0 No 0.0 No 0.0 No
b 0.0 No 0.0 No 0.0 No 0.0 No 10.0 Yes 20.0 Yes 0.0 No
c 10.0 Yes 0.0 No 0.0 No 0.0 No 0.0 No 0.0 No 0.0 No
d 0.0 No 0.0 No 0.0 No 0.0 No 0.0 No 0.0 No 0.0 No
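I tried the following code, but it only covers the first two columns, and I can't write this out by hand for all 90 exercise titles because it would take far too long. Is there a more efficient, quicker way to do this?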
ankle_circles_duration = df[df['exercise_title'] == 'Ankle circles'].groupby('ID').sum()['exercise_duration']
exercise_new['ankle_circles_duration'] = exercise_new['ankle_circles_duration'].fillna(0)
exercise_new.loc[exercise_new['ankle_circles_duration'] >0, 'ankle_circles'] = 'Yes'
exercise_new.loc[exercise_new['ankle_circles_duration'] == 0, 'ankle_circles'] = 'No'
Thanks.
You can try something like this, using pivot and then np.where:
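# pivot raises on repeated (ID, exercise_title) pairs, so drop exact duplicate rows first
df = df.drop_duplicates()
# one row per ID, one column per exercise title; missing combinations become 0
df = df.pivot(index='ID', columns='exercise_title', values='exercise_duration').fillna(0)
newdf = pd.DataFrame(index=df.index)
for col in df.columns:
    name = col.replace(' ', '_')  # use underscores so the names match the output shown
    newdf[name + '_duration'] = df[col]
    newdf[name] = np.where(df[col].eq(0), 'No', 'Yes')
print(newdf)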
Output:
df with pivot:
exercise_title Ankle Circles Ankle Pumps Breathing Exercise Heel Slides Standing Hip Static Glutes Static Quads
ID
a 0.0 10.0 10.0 0.0 0.0 0.0 0.0
b 0.0 0.0 0.0 20.0 30.0 0.0 0.0
c 10.0 0.0 0.0 0.0 0.0 0.0 0.0
d 0.0 0.0 0.0 0.0 0.0 0.0 0.0
newdf:
Ankle_Circles_duration Ankle_Circles Ankle_Pumps_duration Ankle_Pumps ... Static_Glutes_duration Static_Glutes Static_Quads_duration Static_Quads
ID ...
a 0.0 No 10.0 Yes ... 0.0 No 0.0 No
b 0.0 No 0.0 No ... 0.0 No 0.0 No
c 10.0 Yes 0.0 No ... 0.0 No 0.0 No
d 0.0 No 0.0 No ... 0.0 No 0.0 No
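One thing to watch with the pivot approach above: drop_duplicates only removes rows that are identical in every column, so if the same ID has the same exercise twice with different durations, pivot will raise an error instead of adding them up. Since the question asks for the total time per exercise, a variant using pivot_table with aggfunc='sum' may fit better. This is only a sketch on the sample data from the question; the names durations, flags and wide are made up for illustration, and it assumes "Yes" should mean a summed duration greater than 0.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "ID": ["a", "a", "a", "a", "a", "a", "b", "b", "c", "d"],
    "exercise_title": [
        "Ankle Circles", "Ankle Pumps", "Static Glutes", "Static Quads",
        "Static Quads", "Breathing Exercises", "Heel Slides", "Standing Hip",
        "Ankle Circles", "Ankle Pumps",
    ],
    "exercise_duration": [0, 10, 0, 0, 0, 10, 20, 30, 10, 0],
})

# Sum durations per (ID, exercise); combinations that never occur become 0.
durations = df.pivot_table(
    index="ID",
    columns="exercise_title",
    values="exercise_duration",
    aggfunc="sum",
    fill_value=0,
)
durations.columns = [c.replace(" ", "_") for c in durations.columns]

# Yes/No flags derived from the summed durations (assumption: > 0 means "Yes").
flags = pd.DataFrame(
    np.where(durations.gt(0), "Yes", "No"),
    index=durations.index,
    columns=durations.columns,
)

# Interleave so each duration column sits next to its Yes/No column.
wide = pd.concat([durations.add_suffix("_duration"), flags], axis=1)
order = [name for c in durations.columns for name in (f"{c}_duration", c)]
wide = wide[order]
print(wide)
```

This avoids the Python loop over columns, which should not matter much for 90 titles but keeps the code short. If an exercise that was done for 0 minutes should still count as "Yes", the flags would have to be built from a count of rows rather than from the summed duration.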