I'm trying to find a way to merge two DataFrames. Each DataFrame uses two columns (`ID1` and `ID2`) together as a unique identifier. In the master DataFrame, each `Type` is assigned to a range of values; in the category DataFrame, each row has a single value. What I'd like to do is get the `Type` from the master DataFrame for each entry in the category DataFrame, i.e. the type whose range (for the same IDs) contains that entry's value.
It's hard to explain, so here's a simple example:
```python
import pandas as pd

master = {'ID1': ['a', 'a', 'b', 'b', 'b', 'b', 'b', 'c', 'c'],
          'ID2': ['d', 'd', 'd', 'd', 'd', 'e', 'e', 'd', 'e'],
          'RangeTop': [0, 4, 0, 3, 10, 0, 5, 0, 0],
          'RangeBot': [4, 13, 3, 10, 21, 5, 11, 8, 15],
          'Type': ['z', 'y', 'x', 'w', 'v', 'u', 't', 's', 'r']
          }
category = {'ID1': ['a', 'a', 'b', 'b', 'c', 'c'],
            'ID2': ['d', 'd', 'd', 'e', 'd', 'e'],
            'Value': [3, 8, 11, 7, 6, 13]
            }
df = pd.DataFrame(master, columns=['ID1', 'ID2', 'RangeTop', 'RangeBot', 'Type'])
df2 = pd.DataFrame(category, columns=['ID1', 'ID2', 'Value'])
df['Unique'] = df['ID1'] + df['ID2']
df2['Unique'] = df2['ID1'] + df2['ID2']
print(df, '\n', df2)
```
The output looks like this:

master:
```
  ID1 ID2  RangeTop  RangeBot Type Unique
0   a   d         0         4    z     ad
1   a   d         4        13    y     ad
2   b   d         0         3    x     bd
3   b   d         3        10    w     bd
4   b   d        10        21    v     bd
5   b   e         0         5    u     be
6   b   e         5        11    t     be
7   c   d         0         8    s     cd
8   c   e         0        15    r     ce
```

category:
```
  ID1 ID2  Value Unique
0   a   d      3     ad
1   a   d      8     ad
2   b   d     11     bd
3   b   e      7     be
4   c   d      6     cd
5   c   e     13     ce
```
I created the `Unique` column because I thought I could use the `between` method or the `where` method to find, for each `Unique` identifier, the row where the value lies between `RangeTop` and `RangeBot`, but it didn't work. What I want the result to look like is:
category:
```
  ID1 ID2  Value Unique Type
0   a   d      3     ad    z
1   a   d      8     ad    y
2   b   d     11     bd    v
3   b   e      7     be    t
4   c   d      6     cd    s
5   c   e     13     ce    r
```
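For comparison, the `between` idea from the question can work if the ID matching is done with an ordinary merge first, and `between` is then applied row by row to the merged pairs. This is a minimal sketch, not the accepted approach: it assumes the ranges are half-open `[RangeTop, RangeBot)` and do not overlap within an ID pair, and the names `master`, `category`, `pairs`, `matched` and `result` are illustrative only.

```python
import pandas as pd

master = pd.DataFrame({
    'ID1': ['a', 'a', 'b', 'b', 'b', 'b', 'b', 'c', 'c'],
    'ID2': ['d', 'd', 'd', 'd', 'd', 'e', 'e', 'd', 'e'],
    'RangeTop': [0, 4, 0, 3, 10, 0, 5, 0, 0],
    'RangeBot': [4, 13, 3, 10, 21, 5, 11, 8, 15],
    'Type': ['z', 'y', 'x', 'w', 'v', 'u', 't', 's', 'r'],
})
category = pd.DataFrame({
    'ID1': ['a', 'a', 'b', 'b', 'c', 'c'],
    'ID2': ['d', 'd', 'd', 'e', 'd', 'e'],
    'Value': [3, 8, 11, 7, 6, 13],
})

# Pair every category row with every master range sharing its ID1/ID2.
# reset_index keeps the original category row labels in an 'index' column.
pairs = category.reset_index().merge(master, on=['ID1', 'ID2'], how='left')

# Keep only the pairs whose Value falls in [RangeTop, RangeBot).
in_range = pairs['Value'].between(pairs['RangeTop'], pairs['RangeBot'],
                                  inclusive='left')
matched = pairs.loc[in_range, ['index', 'Type']].set_index('index')['Type']

# Align back onto the original rows; rows with no containing range get NaN.
result = category.assign(Type=matched)
print(result)
```

This materialises every (row, range) pair for the same IDs, so memory grows with the number of ranges per ID pair; overlapping ranges would also make `assign` fail on duplicate labels. The string form of `inclusive` needs pandas 1.3 or later.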
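Because `df` has complete, non-overlapping ranges, you can do this with `pd.merge_asof`: match exactly on the IDs, then pick the row whose range contains the value. We'll need an additional `<` check after the merge to set `Type` to NaN for any `Value`s in `df2` that are above the highest bin edge in `df`. (`direction='backward'` already leaves `Type` as NaN for `Value`s below the lowest bin edge.) The `Unique` helper column isn't needed, because `by=['ID1', 'ID2']` does the exact matching; the code below assumes `df` and `df2` were built without it (otherwise the merge returns `Unique_x` and `Unique_y` columns).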
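To see both edge cases in isolation, here is a small sketch on a made-up single-ID frame. The names `bins` and `vals` and the bin edges are hypothetical, chosen only so that one value sits below the lowest edge, one sits exactly on an edge, one sits inside a bin, and one sits above the highest edge.

```python
import pandas as pd

# One ID with two contiguous bins: [2, 5) -> 'lo' and [5, 9) -> 'hi'.
bins = pd.DataFrame({'ID': ['x', 'x'], 'RangeTop': [2, 5],
                     'RangeBot': [5, 9], 'Type': ['lo', 'hi']})
# Values below the lowest edge, on an edge, inside a bin, above the top edge.
vals = pd.DataFrame({'ID': ['x'] * 4, 'Value': [1, 5, 7, 12]})

out = pd.merge_asof(vals.sort_values('Value'), bins.sort_values('RangeTop'),
                    by='ID', left_on='Value', right_on='RangeTop',
                    direction='backward')
# Value 1 has no RangeTop <= 1, so merge_asof already leaves Type as NaN.
# Value 5 matches RangeTop 5 exactly (allow_exact_matches defaults to True).
# Value 12 still matches the last bin (RangeTop 5) although 12 >= RangeBot 9,
# so the extra < check is what turns it into NaN.
out['Type'] = out['Type'].where(out['Value'].lt(out['RangeBot']))
print(out[['Value', 'Type']])  # Type: NaN for 1 and 12, 'hi' for 5 and 7
```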
```python
import pandas as pd

df2 = pd.merge_asof(df2.sort_values('Value'),    # Requires sorting
                    df.sort_values('RangeTop'),  # Requires sorting
                    by=['ID1', 'ID2'],           # Exact matching on ID1 and ID2
                    left_on='Value',
                    right_on='RangeTop',
                    direction='backward',        # Requires Value >= RangeTop
                    allow_exact_matches=True     # [RangeTop, RangeBot) closure
                    )
# This enforces Value < RangeBot.
df2['Type'] = df2['Type'].where(df2['Value'].lt(df2['RangeBot']))
# No longer need these cols
df2 = df2.drop(columns=['RangeTop', 'RangeBot'])
print(df2)
```
```
# Note: row order has changed due to sorting
  ID1 ID2  Value Type
0   a   d      3    z
1   c   d      6    s
2   b   e      7    t
3   a   d      8    y
4   b   d     11    v
5   c   e     13    r
```
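If the original row order matters, one option (a sketch, not the only way) is to carry `df2`'s original index through the merge as a column and sort on it afterwards. This assumes `df` and `df2` are rebuilt from the question's dicts without the `Unique` column; `left` and `out` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'ID1': ['a', 'a', 'b', 'b', 'b', 'b', 'b', 'c', 'c'],
    'ID2': ['d', 'd', 'd', 'd', 'd', 'e', 'e', 'd', 'e'],
    'RangeTop': [0, 4, 0, 3, 10, 0, 5, 0, 0],
    'RangeBot': [4, 13, 3, 10, 21, 5, 11, 8, 15],
    'Type': ['z', 'y', 'x', 'w', 'v', 'u', 't', 's', 'r'],
})
df2 = pd.DataFrame({
    'ID1': ['a', 'a', 'b', 'b', 'c', 'c'],
    'ID2': ['d', 'd', 'd', 'e', 'd', 'e'],
    'Value': [3, 8, 11, 7, 6, 13],
})

# Remember each row's original label (as an 'index' column) before sorting.
left = df2.reset_index().sort_values('Value')
out = pd.merge_asof(left, df.sort_values('RangeTop'),
                    by=['ID1', 'ID2'], left_on='Value', right_on='RangeTop',
                    direction='backward')
out['Type'] = out['Type'].where(out['Value'].lt(out['RangeBot']))

# Put rows back in their original order and restore the original index.
out = (out.sort_values('index')
          .set_index('index')
          .rename_axis(None)
          .drop(columns=['RangeTop', 'RangeBot']))
print(out)
```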