I have a dataframe that looks like this:
date session time x1 x2 x3 x4 x5 x6
2015-05-22 1 morning Tom 129 1 129 45 67
2015-05-22 1 morning Kate 0 1 670 89 34
2015-05-22 1 noon GroupeId 0 1 45 56 13
2015-05-26 2 noon Hence 129 1 167 7 13
2015-05-26 2 evening Kate 0 987 876 478
2015-05-26 3 night Julie 0 1 567 8
So I need to calculate a different statistic per column for each session: the average of the x2 values, the maximum of the x4 values, and the sum of the x3 values (the example shows sessions 1, 2 and 3, but the real dataframe has many more rows and sessions). I found a lot of examples that compute the average of several columns, but that is not exactly what I'm looking for, as you can see. I tried some methods like:
multi_df.groupby(level=1).sum().to_csv('output.csv', sep='\t')
on a multilevel dataframe that I tried to create with:
multi_df = df.set_index(['session', 'index'], inplace=False)
but it doesn't give me a result that makes sense.
Any advice, or an example of the kind of transformation I'm looking for, is appreciated.
Are you looking for something like this, i.e. a way to aggregate with a specific function per column?
import pandas as pd

# Read the tab-separated data file
df = pd.read_csv('temp.txt', sep='\t')

# One aggregation function per column, computed per session
df_agg = df.groupby('session').agg({
    'x2': 'mean',  # average of x2
    'x3': 'sum',   # sum of x3
    'x4': 'max',   # maximum of x4, as asked in the question
})

# You can apply more than one function to a column like so:
df_agg_multifunc = df.groupby('session').agg({
    'x2': ['mean', 'std'],
    'x3': ['sum', 'std'],
    'x4': ['max', 'std'],
})
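
If you want to try this without a data file, here is a minimal self-contained sketch. The values are made up for illustration (only the column names follow the question), and the output names x2_mean, x3_sum and x4_max are my own choice. It uses named aggregation, which assumes pandas 0.25 or later, so the result has flat, readable column names.

```python
import pandas as pd

# Illustrative data only: column names follow the question, values are made up
df = pd.DataFrame({
    "session": [1, 1, 1, 2, 2, 3],
    "x2": [129, 0, 0, 129, 0, 0],
    "x3": [1, 1, 1, 1, 987, 1],
    "x4": [129, 670, 45, 167, 876, 567],
})

# Named aggregation: output_name=(input_column, function)
summary = df.groupby("session").agg(
    x2_mean=("x2", "mean"),  # average of x2 per session
    x3_sum=("x3", "sum"),    # sum of x3 per session
    x4_max=("x4", "max"),    # maximum of x4 per session
)

print(summary)
```

The result has one row per session and one column per statistic, so it can be written out directly with summary.to_csv('output.csv', sep='\t').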