calculate average and maximum value for subset of rows in pandas dataframe

user5421875

I have a dataframe that looks like this:

date        session  time     x1        x2   x3  x4   x5   x6
2015-05-22  1        morning  Tom       129  1   129  45   67
2015-05-22  1        morning  Kate      0    1   670  89   34
2015-05-22  1        noon     GroupeId  0    1   45   56   13
2015-05-26  2        noon     Hence     129  1   167  7    13
2015-05-26  2        evening  Kate      0        987  876  478
2015-05-26  3        night    Julie     0    1   567       8

So I need to calculate a different aggregate per column for each session: the average of x2 for each session (sessions 1, 2 and 3 in this example, but the real dataframe has many more rows and sessions), the maximum of x4, and the sum of x3. I found a lot of examples that compute the average of several columns, but as you can see, that is not exactly what I'm looking for. I tried some methods, such as multi_df.groupby(level=1).sum().to_csv('output.csv', sep='\t') on a multi-level dataframe that I created with multi_df = df.set_index(['session','index'], inplace=False), but it doesn't give me a result that makes sense.

So any advice, or an example of a transformation like the one I'm looking for, would be appreciated.
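In the multi-level attempt, set_index(['session', 'index']) makes session level 0 of the index, so groupby(level=1) groups by the second key rather than by session. A minimal sketch of grouping by the session level instead, assuming the sample rows above with blank cells read as NaN. The second index level here is time, a hypothetical stand-in because the question's 'index' column is not shown:

```python
import numpy as np
import pandas as pd

# Sample rows from the question; blank cells are represented as NaN
df = pd.DataFrame({
    "date": ["2015-05-22"] * 3 + ["2015-05-26"] * 3,
    "session": [1, 1, 1, 2, 2, 3],
    "time": ["morning", "morning", "noon", "noon", "evening", "night"],
    "x1": ["Tom", "Kate", "GroupeId", "Hence", "Kate", "Julie"],
    "x2": [129, 0, 0, 129, 0, 0],
    "x3": [1, 1, 1, 1, np.nan, 1],
    "x4": [129, 670, 45, 167, 987, 567],
    "x5": [45, 89, 56, 7, 876, np.nan],
    "x6": [67, 34, 13, 13, 478, 8],
})

# "time" is a hypothetical stand-in for the question's second index level
multi_df = df.set_index(["session", "time"])

# "session" is level 0 here, so group by its name (or level=0), not level=1
result = multi_df.groupby(level="session").agg(
    {"x2": "mean", "x3": "sum", "x4": "max"}
)
print(result)
```

Referring to the level by name rather than by position avoids this mix-up. The sum, mean and max reductions skip NaN by default, so the blank cells need no special handling.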

hilberts_drinking_problem

Are you looking for something like this, i.e. a way to aggregate with a specific function per column?

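import pandas as pd

# Read the tab-separated data shown in the question
df = pd.read_csv('temp.txt', sep='\t')

# One aggregation per column, computed separately for each session
df_agg = df.groupby('session').agg({
    'x2': 'mean',  # average of x2
    'x3': 'sum',   # sum of x3
    'x4': 'max',   # maximum of x4
    })

# you can apply more than one function to a column like so
# (the result then has two-level column labels such as ('x2', 'mean')):
df_agg_multifunc = df.groupby('session').agg({
    'x2': ['mean', 'std'],
    'x3': ['sum', 'std'],
    'x4': ['max', 'std'],
    })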
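Applying several functions per column produces two-level column labels such as ('x2', 'mean'). A sketch of two ways to end up with flat column names, on a subset of the question's sample rows: joining the label tuples after aggregating, or (pandas 0.25 and later) named aggregation, where each keyword maps an output name to an (input column, function) pair. The underscore-joined names are only a convention chosen for this example:

```python
import pandas as pd

# Only the columns needed here, taken from the question's sample rows
df = pd.DataFrame({
    "session": [1, 1, 1, 2, 2, 3],
    "x2": [129, 0, 0, 129, 0, 0],
    "x4": [129, 670, 45, 167, 987, 567],
})

# Option 1: aggregate with lists, then join the two-level labels
agg = df.groupby("session").agg({"x2": ["mean", "std"], "x4": ["max", "std"]})
agg.columns = ["_".join(col) for col in agg.columns]  # ('x2', 'mean') -> 'x2_mean'

# Option 2 (pandas >= 0.25): named aggregation yields flat names directly
named = df.groupby("session").agg(
    x2_mean=("x2", "mean"),
    x2_std=("x2", "std"),
    x4_max=("x4", "max"),
    x4_std=("x4", "std"),
)

print(agg)
print(named)
```

Note that std is the sample standard deviation, so it is NaN for a session with a single row (session 3 here).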

Related

calculate average value from pandas dataframe

Pandas Dataframe rows average based on value interval

Pandas average every ith row of dataframe subset based on column value

Add new rows to calculate the sum and average from exiting pandas dataframe

Calculate average and standard deviation per 5 rows in a pandas dataframe

How to calculate moving average for each subsets of rows in pandas dataframe?

Calculate weighted average with pandas dataframe

Efficient way to update column value for subset of rows on Pandas DataFrame?

pandas dataframe: how to aggregate a subset of rows based on value of a column

Average a subset of rows across multiple pandas columns

Pandas: Set values in a column equal to the maximum value of a subset of that column, for each subset in the dataframe

Modifying a subset of rows in a pandas dataframe

Average of rows with ranges in Pandas dataframe

How to calculate average value of different pairs of rows and delete N-1 rows from dataframe?

Drop rows after maximum value in a grouped Pandas dataframe

Calculate weighted average of dataframe rows with missing values

How to calculate an average value across database rows?

Find id of maximum value and average pyspark dataframe

Cumulative sum of a pandas column until a maximum value is met, and average adjacent rows

Calculate weighted average using a pandas/dataframe

Calculate the average consumption from data in Pandas Dataframe

reorder subset of rows in pandas dataframe (reindex)

Modifying multiple columns in a subset of rows in pandas DataFrame

Randomly assign values to subset of rows in pandas dataframe

How to compare subset of rows in pandas dataframe

Sorting only specific subset of rows in pandas dataframe

Setting subset of a pandas DataFrame by a DataFrame if a value matches

calculate average if all the columns have value in Pandas

How calculate an average value of the most recent events across groups in pandas dataframe?