stack data based on column

mayaaa Published at Dev

mayaaa

I am working in Python with pandas. I have a data frame like the one below: subject_id refers to the patient ID, hour_measure runs from 1 to 22, and the remaining columns hold the patient's measurements.

  subject_id  |  hour_measure  |  heart rate  |  urinecolor  |  blood pressure
  ----------------------------------------------------------------------------
      3       |       1        |      40      |  red         |  high
      3       |       2        |      60      |  red         |  high
      3       |       ..       |      ..      |  ..          |  ..
      3       |       22       |      90      |  red         |  high

      4       |       3        |      60      |  yellow      |  low
      4       |       3        |      60      |  yellow      |  low
      4       |       22       |      90      |  red         |  high

I want to group the measurements by subject_id, computing max, min, skew, etc. for the numeric features and the first and last value for the categorical ones.

I wrote the following code:

import pandas as pd

df = pd.read_csv(path)

# statistics for the numeric columns, per subject and hour
df1 = (df.groupby(['subject_id', 'hour_measure'])
         .agg(['sum', 'min', 'max', 'median', 'var', 'skew']))

# most frequent value (mode) for the object columns, per subject and hour
f = lambda x: next(iter(x.mode()), None)
cols = df.select_dtypes(object).columns
df2 = df.groupby(['subject_id', 'hour_measure'])[cols].agg(f)
df2.columns = pd.MultiIndex.from_product([df2.columns, ['mode']])
print(df2)

# unstack moves hour_measure into the columns
df3 = pd.concat([df1, df2], axis=1).unstack().reorder_levels([0, 2, 1], axis=1)
print(df3)
df3.to_csv("newfile.csv")

It gives me the grouping for every hour.

I tried to make it group by subject_id only:

df1 = (df.groupby(['subject_id'])
        .agg([ 'sum','min','max', 'median','var','skew']))

It also gives me the same output and calculates the statistics for every hour, as follows:

     subject_id  |     heart rate_1     |  heart rate_2 ....
     --------------------------------------------------------
                 |  min    max     mean  | min   max   mean ....
        3
        4

I want the output to look like the following:

     subject_id  |      heart rate       |   respiratory rate   |  urine color
     --------------------------------------------------------------------------
                 |  min  |  max  |  mean |  min  |  max  | mean |  first  |  last
        3        |  50   |  60   |  55   |  40   |  65   |  20  |  yellow |  red

Can anyone tell me how to edit the code to produce this output? Any help will be appreciated.

eva-vw

Let me know if this gets you close to what you're looking for. I did not run into your issue with grouping by every hour, so I'm not sure I understood your question completely.

import pandas as pd

# sample dataframe
df = pd.DataFrame(
    {
        "subject_id": [1, 1, 1, 2, 2, 2, 3, 3, 3],
        "hour_measure": [1, 22, 12, 5, 18, 21, 8, 18, 4],
        "blood_pressure": [
            "high",
            "high",
            "high",
            "high",
            "low",
            "low",
            "low",
            "low",
            "high",
        ],
    }
)
# sort out numeric columns before aggregating them
numeric_result = (
    df.select_dtypes(include="number")
    .groupby(["subject_id"])
    .agg(["min", "max", "mean"])
)

# sort out categorical columns before aggregating them
categorical_result = (
    df.set_index(["subject_id"])
    .select_dtypes(include="object")
    .groupby(["subject_id"])
    .agg(["first", "last"])
)

# combine numeric and categorical results
result = numeric_result.join(categorical_result)

                 hour_measure                blood_pressure
                    min max       mean          first  last
subject_id
1                     1  22  11.666667           high  high
2                     5  21  14.666667           high   low
3                     4  18  10.000000            low  high
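
The same pattern can be extended to the statistics listed in the question. The following is only a sketch under a few assumptions: the column names (heart_rate, urine_color, blood_pressure) and the sample values are made up for illustration, hour_measure is treated as an ordering column rather than a measurement, and rows are sorted by hour so that "first" and "last" mean the earliest and latest reading.

```python
import pandas as pd

# Hypothetical sample data; column names and values are illustrative only.
df = pd.DataFrame(
    {
        "subject_id": [3, 3, 3, 4, 4, 4],
        "hour_measure": [1, 2, 22, 3, 4, 22],
        "heart_rate": [40, 60, 90, 60, 60, 90],
        "urine_color": ["red", "red", "red", "yellow", "yellow", "red"],
        "blood_pressure": ["high", "high", "high", "low", "low", "high"],
    }
)

key = "subject_id"

# Sort by hour so "first"/"last" follow the time order within each subject.
df = df.sort_values([key, "hour_measure"])

# Numeric measurements, excluding the grouping key and the hour column.
numeric_cols = df.select_dtypes(include="number").columns.drop([key, "hour_measure"])
# Everything non-numeric is treated as categorical here.
categorical_cols = df.select_dtypes(exclude="number").columns

grouped = df.groupby(key)
numeric_result = grouped[list(numeric_cols)].agg(
    ["sum", "min", "max", "median", "var", "skew"]
)
categorical_result = grouped[list(categorical_cols)].agg(["first", "last"])

# One row per subject, with (column, statistic) as a two-level column header.
result = pd.concat([numeric_result, categorical_result], axis=1)

# Optional: flatten the header to names like "heart_rate_min" before writing a CSV.
flat = result.copy()
flat.columns = ["_".join(col) for col in flat.columns]

print(result)
```

Grouping by subject_id alone and skipping the unstack() step keeps hour_measure out of the column header, which appears to be what produced the per-hour columns in the question.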

Edited at 2021-05-23
