i am in python i have a data frame like this contain sub_id refer to patient_id, hour_measure from 1 to 22 and other patient's measurement
subject_id | hour_measure heart rate | urinecolor | blood pressure
--------------------------------------------------------
3 | 1 40 | red | high
3 | 2 60 | red | high
3 | .. .. | .. | ..
3 | 22 90 | red | high
4 | 3 60 | yellow | low
4 | 3 60 | yellow | low
4 | 22 90 | red | high
i want to group sub_id measurement by max min skew,etc for numeric features and first and last value for categorical
i write the follwing code
df= pd.read_csv(path)
df1 = (df.groupby(['subject_id','hour_measure'])
.agg([ 'sum','min','max', 'median','var','skew']))
f = lambda x: next(iter(x.mode()), None)
cols = df.select_dtypes(object).columns
df2 = df.groupby(['subject_id','hour_measure'])[cols].agg(f)
df2.columns = pd.MultiIndex.from_product([df2.columns, ['mode']])
print (df2)
df3 = pd.concat([df1, df2], axis=1).unstack().reorder_levels([0,2,1],axis= 1)
print (df3)
df3.to_csv("newfile.csv")
it give me the grouping for every hour
i try to make it group only with subject id only
df1 = (df.groupby(['subject_id'])
.agg([ 'sum','min','max', 'median','var','skew']))
it also give me the same output , and calculate the statistics for every hour as follows
subject_id | heart rate_1 | heartrate_2 ....
--------------------------------------------------------
| min max mean | min max mean ....
3
4
i want the out put to be as the following
subject_id | heart rate | repiratotry rate |urine color
--------------------------------------------------------
| min | max | mean | min | max | mean ..|. first | last
3 50 60 55 40 65 20 | yellow | red
any one can tell how can i edit the code to give the wanted output any help will appreciated
let me know if this gets you close to what you're looking for. I did not run into your issue with grouping by every hour so I'm not sure if I understood your question completely.
# sample dataframe
df = pd.DataFrame(
{
"subject_id": [1, 1, 1, 2, 2, 2, 3, 3, 3],
"hour_measure": [1, 22, 12, 5, 18, 21, 8, 18, 4],
"blood_pressure": [
"high",
"high",
"high",
"high",
"low",
"low",
"low",
"low",
"high",
],
}
)
# sort out numeric columns before aggregating them
numeric_result = (
df.select_dtypes(include="number")
.groupby(["subject_id"])
.agg(["min", "max", "mean"])
)
# sort out categorical columns before aggregating them
categorical_result = (
df.set_index(["subject_id"])
.select_dtypes(include="object")
.groupby(["subject_id"])
.agg(["first", "last"])
)
# combine numeric and categorical results
result = numeric_result.join(categorical_result)
hour_measure blood_pressure
min max mean first last
subject_id
1 1 22 11.666667 high high
2 5 21 14.666667 high low
3 4 18 10.000000 low high
Collected from the Internet
Please contact [email protected] to delete if infringement.
Comments