I have a df as follows:
State Country Pop GDP
1 MD USA 433100 1222
2 TX USA 99934123 324234
3 B.C. MEX 324599 5343
4 OAX MEX 43322 986
5 BC CAN 3431 643
6 NS CAN 8749 3535
I want to group by Country and find the max GDP and min GDP, returning that as well as new column names so that it looks something like this:
PoorState GDP Pop RichState GDP Pop
USA MD 1222 433100 TX 324234 99934123
MEX OAX 986 43322 B.C. 5343 324599
...
I've tried this to get the min and max for each state:
df.groupby('state').agg({'GDP':min, 'GDP':max})[['GDP','GDP']].reset_index()
but just the min works while the max doesn't.
State GDP GDP
MD 1222 1222
OAX 986 986
-- I know that I can do this to set the index: newdf = df.set_index('region')
I think I'm trying to figure out if I can do this all on one line.
We can do groupby
with rank
create the category then pivot
df['new'] = df.groupby('Country')['GDP'].rank().map({1:'Poor',2:'Rich'})
out = df.pivot(index='Country',columns = 'new').sort_index(level=1,axis=1)
out.columns = out.columns.map('_'.join)
out.reset_index(inplace=True)
out
Out[348]:
Country GDP_Poor Pop_Poor State_Poor GDP_Rich Pop_Rich State_Rich
0 CAN 643 3431 BC 3535 8749 NS
1 MEX 986 43322 OAX 5343 324599 B.C.
2 USA 1222 433100 MD 324234 99934123 TX
Collected from the Internet
Please contact [email protected] to delete if infringement.
Comments