stack data based on column

mayaaa

i am in python i have a data frame like this contain sub_id refer to patient_id, hour_measure from 1 to 22 and other patient's measurement

  subject_id  |   hour_measure     heart rate     |  urinecolor |  blood pressure  
     --------------------------------------------------------                
        3        |  1                   40        |  red        |  high
        3        |  2                   60        |  red        |  high
        3        |  ..                  ..        |  ..         |  ..
        3        |  22                  90        |  red        |  high

        4        |  3                   60        |  yellow     |  low
        4        |  3                   60        |  yellow     |  low  
        4        |  22                  90        |  red        |  high

i want to group sub_id measurement by max min skew,etc for numeric features and first and last value for categorical

i write the follwing code

df= pd.read_csv(path)
df1 = (df.groupby(['subject_id','hour_measure'])
        .agg([ 'sum','min','max', 'median','var','skew']))
f = lambda x: next(iter(x.mode()), None)
cols = df.select_dtypes(object).columns
df2 = df.groupby(['subject_id','hour_measure'])[cols].agg(f)
df2.columns = pd.MultiIndex.from_product([df2.columns, ['mode']])
print (df2) 
df3 = pd.concat([df1, df2], axis=1).unstack().reorder_levels([0,2,1],axis= 1)
print (df3)          
df3.to_csv("newfile.csv")

it give me the grouping for every hour

i try to make it group only with subject id only

df1 = (df.groupby(['subject_id'])
        .agg([ 'sum','min','max', 'median','var','skew']))

it also give me the same output , and calculate the statistics for every hour as follows

     subject_id  |     heart rate_1     |  heartrate_2 .... 
     --------------------------------------------------------                
                |  min    max     mean  | min   max   mean ....               
        3
        4

i want the out put to be as the following

     subject_id  |     heart rate        |  repiratotry rate  |urine color
     --------------------------------------------------------                
                 |  min  |  max   | mean  | min |  max |  mean ..|. first |  last 
        3            50     60      55     40     65      20     | yellow |  red

any one can tell how can i edit the code to give the wanted output any help will appreciated

eva-vw

let me know if this gets you close to what you're looking for. I did not run into your issue with grouping by every hour so I'm not sure if I understood your question completely.

# sample dataframe
df = pd.DataFrame(
    {
        "subject_id": [1, 1, 1, 2, 2, 2, 3, 3, 3],
        "hour_measure": [1, 22, 12, 5, 18, 21, 8, 18, 4],
        "blood_pressure": [
            "high",
            "high",
            "high",
            "high",
            "low",
            "low",
            "low",
            "low",
            "high",
        ],
    }
)
# sort out numeric columns before aggregating them
numeric_result = (
    df.select_dtypes(include="number")
    .groupby(["subject_id"])
    .agg(["min", "max", "mean"])
)

# sort out categorical columns before aggregating them
categorical_result = (
    df.set_index(["subject_id"])
    .select_dtypes(include="object")
    .groupby(["subject_id"])
    .agg(["first", "last"])
)

# combine numeric and categorical results
result = numeric_result.join(categorical_result)
                 hour_measure                blood_pressure
                    min max       mean          first  last
subject_id
1                     1  22  11.666667           high  high
2                     5  21  14.666667           high   low
3                     4  18  10.000000            low  high

Collected from the Internet

Please contact [email protected] to delete if infringement.

edited at
0

Comments

0 comments
Login to comment

Related

Stack data based on substring in column name

Stack Based Column Sum in a data frame using R

stack rows as columns based on another rank column

Partitioning based on column data?

Running total based on Data column

Sorting data based on column date

Grouping the data into categories based on a column

select data based on a date column

filtering data in dataframe based on a column

Sorting Data Based On the Last column

How to add a column with data based on another column

Modify column data based on other column in pandas

Create new column based on data in existing column

Split data in column based on value of in another column

Adding column data based on column values?

How to position a Column inside the stack based on the fractional height of screen in flutter?

Python: Stack DataFrame based on similar string in column name

Adding column based on data in other data frame

add new column to data based on other data

Generate data with python based on column data

How to view the column data based on another column data in the same table?

Filling missing column data based on other column data in R

Segfault and misbehaving data member in array-based integer stack

How to stack 2 columns of data with spaces into a single column with no spaces

Aggregate data frame based on column value in R

rbind data based on matching values in a column

How to Pivot the data Based on another column value

Adding a column to a data frame based on certain criteria?

Adding identity values to existing column based on data