Pandas Dataframe Regrouping

henry

I have the following pandas DataFrame. For each tower, it lists the coordinates at which the tower has a step, including its z-min and z-max:


import pandas as pd

names = ['T1', 'T2', 'T2', 'T1', 'T1']
coord = [0, 0, 5, 50, 70]

# Build the frame column-wise so that Cord keeps an integer dtype
df = pd.DataFrame({'Name': names, 'Cord': coord})

Resulting in:

  Name  Cord
0   T1     0
1   T2     0
2   T2     5
3   T1    50
4   T1    70

For every unique coordinate, I would like to regroup all towers that span that coordinate, i.e. whose z-min is smaller than or equal to it and whose z-max is greater than or equal to it. Hence:

     Name      Cord
0    T1,T2      0
1    T1,T2      5
2    T1         50
3    T1         70

Is there a way to do this?

What I have done so far:

  1. group by unique coordinate

    import numpy as np
    
    g = df.groupby('Cord')['Name'].apply(lambda x: list(np.unique(x)))
    

I get:

Cord
0     [T1, T2]
5         [T2]
50        [T1]
70        [T1]

... but I am not sure how to do the rest, so any hints would be appreciated. Thanks!

EDIT: The proposed answer does not work in the special case where the two towers have different z-mins:

import pandas as pd

names = ['T1', 'T2', 'T2', 'T1', 'T1']
coord = [0, 5, 10, 50, 70]

# Build the frame column-wise so that Cord keeps an integer dtype
df = pd.DataFrame({'Name': names, 'Cord': coord})

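A minimal sketch that reproduces the problem, assuming the DataFrame is built from a dict as above: running the plain pivot-and-bfill approach on this data also lists T2 at Cord 0, below its z-min of 5, because bfill fills every level below a tower's first step as well.

```python
import pandas as pd

# Updated data from the EDIT: T2 now starts at 5 instead of 0
names = ['T1', 'T2', 'T2', 'T1', 'T1']
coord = [0, 5, 10, 50, 70]
df = pd.DataFrame({'Name': names, 'Cord': coord})

# Plain pivot + bfill, as first proposed
wide = df.pivot(index='Cord', columns='Name', values='Name')
result = wide.bfill().apply(lambda row: list(row.dropna()), axis=1)
print(result)
# Cord 0 wrongly includes T2, because bfill also fills the levels
# below T2's first step (its z-min of 5)
```

The desired result at Cord 0 is only [T1].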
Quang Hoang

Let's try pivoting:

(df.pivot(index='Cord', columns='Name', values='Name')
   .bfill().apply(lambda x: list(x.dropna()), axis=1)
)

Output:

Cord
0     [T1, T2]
5     [T1, T2]
50        [T1]
70        [T1]
dtype: object

Explanation: First we pivot the table so that each tower becomes a column, similar to the picture in your question:

df.pivot(index='Cord', columns='Name', values='Name')

which gives:

Name   T1   T2
Cord          
0      T1   T2
5     NaN   T2
50     T1  NaN
70     T1  NaN

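Each column now shows the Cord levels at which that tower has a step, with the rows ordered bottom-up (increasing Cord). Now we use bfill to fill the missing intermediate levels, e.g. 5 on T1: bfill copies the next valid value further down the column (the tower's next step at a higher Cord) into each gap. T2 stays NaN at 50 and 70 because it has no step above 5.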
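A minimal sketch of just the back-fill step on the question's original data (the DataFrame is built from a dict here, which is an assumption). It shows that T1 is filled in at 5 from its next step at 50, while T2 stays NaN at 50 and 70 because there is no later value to copy.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['T1', 'T2', 'T2', 'T1', 'T1'],
                   'Cord': [0, 0, 5, 50, 70]})
wide = df.pivot(index='Cord', columns='Name', values='Name')

# Each NaN takes the next non-NaN value further down its column,
# i.e. from the tower's next step at a higher Cord
filled = wide.bfill()
print(filled)

# Trailing NaNs have nothing below them to copy, so they stay NaN
print(filled['T2'].isna().tolist())  # [False, False, True, True]
```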

Finally, we aggregate along the rows, which means apply with axis=1. The lambda function is self-explanatory (I hope): it drops the NaNs in each row and collects the remaining tower names into a list.
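A sketch of the row-wise step in isolation, on a hand-built copy of the back-filled table for the original data. Besides the answer's `list(row.dropna())`, it shows a variant (not from the answer) that lists the column labels of the non-missing cells, which would still work if the cells held something other than the tower names.

```python
import numpy as np
import pandas as pd

# Hand-built copy of the back-filled table for the original data
filled = pd.DataFrame({'T1': ['T1', 'T1', 'T1', 'T1'],
                       'T2': ['T2', 'T2', np.nan, np.nan]},
                      index=pd.Index([0, 5, 50, 70], name='Cord'))

# As in the answer: drop the NaNs in each row, keep the remaining names
by_value = filled.apply(lambda row: list(row.dropna()), axis=1)

# Variant: take the column labels of the cells that are not NaN
by_label = filled.notna().apply(lambda row: list(row[row].index), axis=1)

print(by_value)
print(by_value.tolist() == by_label.tolist())  # True
```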


Update: for the updated data, each column must only be back-filled from its first non-NaN value (the tower's z-min) onwards, so that no tower is extended below its z-min:

(df.pivot(index='Cord', columns='Name', values='Name')
   .apply(lambda x: x[x.notna().cumsum().ne(0)].bfill())
   .apply(lambda x: list(x.dropna()), axis=1)
)

Output:

Cord
0         [T1]
5     [T1, T2]
10    [T1, T2]
50        [T1]
70        [T1]
dtype: object
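To see what the mask in `x.notna().cumsum().ne(0)` does, here is a sketch on the updated data (DataFrame built from a dict, an assumption). The running count of non-missing values stays 0 until a tower's first step, so `.ne(0)` is False only below its z-min; only the rest of the column is back-filled, and `apply` re-aligns the shorter results so the dropped levels come back as NaN. The last lines show an alternative that is not from the answer: back-fill everything, then keep a cell only if the forward fill also reaches it, i.e. if the tower has a step both at or below that level and at or above it.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['T1', 'T2', 'T2', 'T1', 'T1'],
                   'Cord': [0, 5, 10, 50, 70]})
wide = df.pivot(index='Cord', columns='Name', values='Name')

# Mask for T2: running count of non-missing values, from low Cord upwards
t2 = wide['T2']
from_zmin = t2.notna().cumsum().ne(0)
print(from_zmin.tolist())  # [False, True, True, True, True]

# The answer's per-column fill: rows below the first step stay NaN
fixed = wide.apply(lambda x: x[x.notna().cumsum().ne(0)].bfill())
print(fixed)

# Alternative (not from the answer): back-fill everything, then keep only
# the cells the forward fill also reaches (levels between z-min and z-max)
alt = wide.bfill().where(wide.ffill().notna())
print(alt.isna().equals(fixed.isna()))  # True
```

Feeding either table into the row-wise `apply(..., axis=1)` gives the output shown above.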
