pandas DataFrame reshape by multiple column values

endangeredoxen

I'm trying to free myself of JMP for data analysis but cannot determine the pandas equivalent of JMP's Split Columns function. I'm starting with the following DataFrame:

In [1]: import pandas as pd
In [2]: df = pd.DataFrame({'Level0': [0,0,0,0,0,0,1,1,1,1,1,1], 'Level1': [0,1,0,1,0,1,0,1,0,1,0,1], 'Vals': [1,3,2,4,1,6,7,5,3,3,2,8]})
In [3]: df
Out[3]:
    Level0  Level1  Vals
0        0       0     1
1        0       1     3
2        0       0     2
3        0       1     4
4        0       0     1
5        0       1     6
6        1       0     7
7        1       1     5
8        1       0     3
9        1       1     3
10       1       0     2
11       1       1     8

I can handle some of the output scenarios of JMP's function using the pivot_table function, but I'm stumped on the case where the Vals column is split by unique combinations of Level0 and Level1 to give the following output:

Level0   0       1
Level1   0   1   0   1
0        1   3   7   5
1        2   4   3   3
2        1   6   2   8

I tried pd.pivot_table(df, values='Vals', columns=['Level0', 'Level1']) but this gives mean values for the different combinations:

Level0  Level1
0       0         1.333333
        1         4.333333
1       0         4.000000
        1         5.333333

I also tried pd.pivot_table(df, values='Vals', index=df.index, columns=['Level0', 'Level1']), which gets me the column headers I want. It doesn't work, though, because it forces the output to have the same number of rows as the original, so the output has a lot of NaN values:

Level0   0       1
Level1   0   1   0   1
0        1 NaN NaN NaN
1      NaN   3 NaN NaN
2        2 NaN NaN NaN
3      NaN   4 NaN NaN
4        1 NaN NaN NaN
5      NaN   6 NaN NaN
6      NaN NaN   7 NaN
7      NaN NaN NaN   5
8      NaN NaN   3 NaN
9      NaN NaN NaN   3
10     NaN NaN   2 NaN
11     NaN NaN NaN   8

Any suggestions?

ayhan

It's a bit of a workaround, but you can do:

df.pivot_table(index=df.groupby(['Level0', 'Level1']).cumcount(), 
               columns=['Level0', 'Level1'], values='Vals', aggfunc='first')
Out: 
Level0  0     1   
Level1  0  1  0  1
0       1  3  7  5
1       2  4  3  3
2       1  6  2  8

The idea here is that the index of the output (the position of each row within its Level0/Level1 group) is not readily available in the original DataFrame. You can get it with the following:

df.groupby(['Level0', 'Level1']).cumcount()
Out: 
0     0
1     0
2     1
3     1
4     2
5     2
6     0
7     0
8     1
9     1
10    2
11    2
dtype: int64

If you pass this as the index of pivot_table, any of the usual aggfuncs (mean, min, max, first or last) should work for you, since each index-column pair has only one entry.
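To check that claim, here is a minimal self-contained sketch that rebuilds the DataFrame from the question and compares a few aggfuncs. It also shows a set_index/unstack variant that skips aggregation altogether; that variant is not from the answer above, just an alternative under the same assumption of one value per cell. The names pos, results, same and wide are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Level0': [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1],
    'Level1': [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
    'Vals':   [1, 3, 2, 4, 1, 6, 7, 5, 3, 3, 2, 8],
})

# Running position of each row within its (Level0, Level1) group.
pos = df.groupby(['Level0', 'Level1']).cumcount()

# Each (pos, Level0, Level1) cell holds a single value, so these
# aggregations should all pick that same value.
results = {
    func: df.pivot_table(index=pos, columns=['Level0', 'Level1'],
                         values='Vals', aggfunc=func)
    for func in ['first', 'min', 'max']
}
same = all(
    (results['first'].to_numpy() == r.to_numpy()).all()
    for r in results.values()
)
print(same)

# Alternative sketch without any aggregation: put the position and the
# group keys into the index, then move the group keys to the columns.
wide = (
    df.assign(pos=pos)
      .set_index(['pos', 'Level0', 'Level1'])['Vals']
      .unstack(['Level0', 'Level1'])
)
print(wide)
```

Note that unstack raises an error if a cell would receive more than one value, so it doubles as a sanity check that the cumcount index really makes every cell unique.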
