Pandas Dataframe - Group by column value and lookup values from other columns

Crovish

I have this dataframe :

label | time_0 | value_0 | time_1 | value_1 | time_2 | value_2

value_0 column corresponds to time_0, value_1 to time_1...

time_i is unique, which means for all rows time_0 has the same value, same for time_1, time_2..

I want to achieve 2 things :

1- first group_by label, then time_i

The result should look like :

Label0  time_0(0)  value_0(0)
                value_0(1)
                ...
        time_1(0)  value_1(0)
                value_1(1)
                ..
         ...

Label1  time_0(0)  value_0(0)
                value_0(1)
                ...
        time_1(0)  value_1(0)
                value_1(1)
                ..
         ...   
Label2  time_0(0)  value_0(0)
                value_0(1)
                ...
        time_1(0)  value_1(0)
                value_1(1)
                ..
         ...   

2- first group_by label, then time_i and sum all values_i

The result should look like

Label1  time_0(0)  sum(value_0)
        time_1(1)  sum(value_1)
        time_2(2)  sum(value_2)
       
Label2  time_0(0)  sum(value_0)
        time_1(1)  sum(value_1)
        time_2(2)  sum(value_2)
         ...   

I tried different combinations with pd.merge, group_by, but without a success

here is an example with values enter image description here

jezrael

Use wide_to_long with aggregate sum:

df = (pd.wide_to_long(df.reset_index(), 
                     stubnames=['value','time'],
                     i=['index','Label'], j='tmp', sep='_')
        .groupby(['Label','time'])['value']
        .sum()
        .reset_index())
print (df)
     Label        time      value
0  EUR/CHF  2020-12-04 -51.260248
1  EUR/CHF  2020-12-10  98.202053
2  USD/CHF  2020-12-04   0.134488
3  USD/CHF  2020-12-10   4.510396
4  USD/NOK  2020-12-04   0.395785
5  USD/NOK  2020-12-10  -0.801768

EDIT:

Use:

print (df)
      Label veg__0_time  veg__0_value veg__1_time  veg__1_value
0   USD/CHF  2020-12-04      0.000000  2020-12-10      0.000000
1   USD/CHF  2020-12-04     -0.439058  2020-12-10      1.392752
2   USD/CHF  2020-12-04     -0.012020  2020-12-10      0.043742
3   USD/CHF  2020-12-04      0.000000  2020-12-10      0.000000
4   USD/CHF  2020-12-04      0.000000  2020-12-10      0.000000
5   USD/CHF  2020-12-04     -0.525791  2020-12-10      1.273146
6   USD/CHF  2020-12-04      1.306578  2020-12-10      1.115313
7   USD/CHF  2020-12-04     -0.195221  2020-12-10      0.685444
8   USD/NOK  2020-12-04      0.395785  2020-12-10     -0.801768
9   EUR/CHF  2020-12-04    -29.385792  2020-12-10     45.951600
10  EUR/CHF  2020-12-04    -21.874456  2020-12-10     52.250453

df = df.set_index('Label')
df.columns = df.columns.str.split('_', expand=True).droplevel([0,1])
print (df)
                  0                      1           
               time      value        time      value
Label                                                
USD/CHF  2020-12-04   0.000000  2020-12-10   0.000000
USD/CHF  2020-12-04  -0.439058  2020-12-10   1.392752
USD/CHF  2020-12-04  -0.012020  2020-12-10   0.043742
USD/CHF  2020-12-04   0.000000  2020-12-10   0.000000
USD/CHF  2020-12-04   0.000000  2020-12-10   0.000000
USD/CHF  2020-12-04  -0.525791  2020-12-10   1.273146
USD/CHF  2020-12-04   1.306578  2020-12-10   1.115313
USD/CHF  2020-12-04  -0.195221  2020-12-10   0.685444
USD/NOK  2020-12-04   0.395785  2020-12-10  -0.801768
EUR/CHF  2020-12-04 -29.385792  2020-12-10  45.951600
EUR/CHF  2020-12-04 -21.874456  2020-12-10  52.250453

df = df.stack(0).groupby(['Label','time'])['value'].sum().reset_index()
print (df)
     Label        time      value
0  EUR/CHF  2020-12-04 -51.260248
1  EUR/CHF  2020-12-10  98.202053
2  USD/CHF  2020-12-04   0.134488
3  USD/CHF  2020-12-10   4.510396
4  USD/NOK  2020-12-04   0.395785
5  USD/NOK  2020-12-10  -0.801768

EDIT1: Solution with rename:

def f(x):
    
    splitted = x.split('_')
    if len(splitted) > 1:
        return f'{splitted[-1]}_{splitted[-2]}'
    else:
        return x

df = df.rename(columns=f)
print (df)
      Label      time_0    value_0      time_1    value_1
0   USD/CHF  2020-12-04   0.000000  2020-12-10   0.000000
1   USD/CHF  2020-12-04  -0.439058  2020-12-10   1.392752
2   USD/CHF  2020-12-04  -0.012020  2020-12-10   0.043742
3   USD/CHF  2020-12-04   0.000000  2020-12-10   0.000000
4   USD/CHF  2020-12-04   0.000000  2020-12-10   0.000000
5   USD/CHF  2020-12-04  -0.525791  2020-12-10   1.273146
6   USD/CHF  2020-12-04   1.306578  2020-12-10   1.115313
7   USD/CHF  2020-12-04  -0.195221  2020-12-10   0.685444
8   USD/NOK  2020-12-04   0.395785  2020-12-10  -0.801768
9   EUR/CHF  2020-12-04 -29.385792  2020-12-10  45.951600
10  EUR/CHF  2020-12-04 -21.874456  2020-12-10  52.250453

df = (pd.wide_to_long(df.reset_index(), 
                     stubnames=['value','time'],
                     i=['index','Label'], j='tmp', sep='_')
        .groupby(['Label','time'])['value']
        .sum()
        .reset_index())
print (df)
     Label        time      value
0  EUR/CHF  2020-12-04 -51.260248
1  EUR/CHF  2020-12-10  98.202053
2  USD/CHF  2020-12-04   0.134488
3  USD/CHF  2020-12-10   4.510396
4  USD/NOK  2020-12-04   0.395785
5  USD/NOK  2020-12-10  -0.801768

Collected from the Internet

Please contact [email protected] to delete if infringement.

edited at
0

Comments

0 comments
Login to comment

Related

Setting value of a column based on values of other columns in Pandas dataframe

Group by multiple columns and pivot and count values from other column in pandas

take value from lookup column conditional on other lookup column that is matching column names from other dataframe

Pandas: How to find minimal value in a column based on pair of values from 2 other columns, keeping the order of rows as in the initial dataframe?

group pandas DataFrame by one column and then get lists of values which occur in those categories from other column

Set values in a Pandas dataframe column with the value of another dataframe column where match between other two columns values (one with duplicates)

Group values and check if the values ​from another column cancel each other out in python pandas dataframe

How to populate values inside a new column based values from other columns in a dataframe in Pandas

From a Pandas Dataframe, return specific column values based on grouping and largest values of other columns

Fill new column in one dataframe with values from another, based on values in two other columns? (Python/Pandas)

Fill NA in PySpark DataFrame by group with values from Pandas lookup table

Change pandas dataframe column values based on other columns in dataframe

Set values of a column in pandas dataframe based on values in other columns

Change pandas dataframe column value depending on other columns values having three options

Pandas create new column with values from other columns, selected based on column value

Creating columns in a pandas dataframe based on a column value in other dataframe

Lookup values in other DataFrame based on two columns

How to replace a pandas DataFrame column with lookup values from a dictionary?

scala/spark - group dataframe and select values from other column as dataframe

How to create a new dataframe column using values and groupings from other rows and columns in pandas?

How to create a new column based on values from other columns in a Pandas DataFrame

Pandas Dataframe Create Column of Next Future Date from Unique values of two other columns, with Groupby

Create column of dictionaries with keys and values from other two columns in Pandas DataFrame

Creating new column based on conditions and values from other columns in a pandas dataframe

Changing a column type from a Pandas dataframe to datetime changes the values on the other columns

Create a pandas column based on a lookup value from another dataframe

I want to replace part of a column value in a dataframe using values from other columns

Changing values of one column based on the other three columns in pandas dataframe

Adding column to pandas DataFrame containing list of other columns' values