Finding a value between other values in a Pandas Dataframe for specific columns

mylesmoose

I'm trying to find a way to merge two DataFrames. Each DataFrame uses two columns (ID1 and ID2) together as an identifier. In the master DataFrame, each Type is assigned to a range of values. In the category DataFrame, each row has a single value. For each entry in the category DataFrame, I'd like to get the Type from the master DataFrame whose range contains that entry's Value.

It's hard to explain, so here's a simple example:

import pandas as pd

master = {'ID1': ['a', 'a', 'b', 'b', 'b', 'b', 'b', 'c', 'c'],
          'ID2': ['d', 'd', 'd', 'd', 'd', 'e', 'e', 'd', 'e'],
          'RangeTop': [0, 4, 0, 3, 10, 0, 5, 0, 0],
          'RangeBot': [4, 13, 3, 10, 21, 5, 11, 8, 15],
          'Type': ['z', 'y', 'x', 'w', 'v', 'u', 't', 's', 'r']
          }
category = {'ID1': ['a', 'a', 'b', 'b', 'c', 'c'],
            'ID2': ['d', 'd', 'd', 'e', 'd', 'e'],
            'Value': [3, 8, 11, 7, 6, 13]
            }
df = pd.DataFrame(master, columns=['ID1', 'ID2', 'RangeTop', 'RangeBot', 'Type'])
df2 = pd.DataFrame(category, columns=['ID1', 'ID2', 'Value'])
# Helper key combining both ID columns
df['Unique'] = df['ID1'] + df['ID2']
df2['Unique'] = df2['ID1'] + df2['ID2']
print(df, '\n', df2)

The output looks like this:

master
   ID1 ID2  RangeTop  RangeBot Type Unique
0   a   d         0         4    z     ad
1   a   d         4        13    y     ad
2   b   d         0         3    x     bd
3   b   d         3        10    w     bd
4   b   d        10        21    v     bd
5   b   e         0         5    u     be
6   b   e         5        11    t     be
7   c   d         0         8    s     cd
8   c   e         0        15    r     ce 
 category
   ID1 ID2  Value Unique
0   a   d      3     ad
1   a   d      8     ad
2   b   d     11     bd
3   b   e      7     be
4   c   d      6     cd
5   c   e     13     ce

I created the Unique column because I thought I could use the between method or the where method to find, for each Unique identifier, the row where Value falls between RangeTop and RangeBot, but it didn't work. This is the result I want:

 category
   ID1 ID2  Value Unique Type
0   a   d      3     ad   z
1   a   d      8     ad   y
2   b   d     11     bd   v
3   b   e      7     be   t
4   c   d      6     cd   s
5   c   e     13     ce   r
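The between idea from the question can work if you first pair each category row with every master range that has the same ID1 and ID2, then keep only the pair whose range contains Value. Below is a minimal sketch of that merge-then-filter approach, assuming half-open ranges [RangeTop, RangeBot) (a Value equal to RangeTop belongs to that range, one equal to RangeBot belongs to the next). The helper Unique column is left out because merging on both ID columns gives the same key; `pairs`, `in_range` and `result` are illustrative names. The two explicit comparisons are equivalent to `Series.between(..., inclusive="left")` in pandas 1.3 and later.

```python
import pandas as pd

# Question's example data (helper 'Unique' column omitted).
df = pd.DataFrame({
    "ID1": ["a", "a", "b", "b", "b", "b", "b", "c", "c"],
    "ID2": ["d", "d", "d", "d", "d", "e", "e", "d", "e"],
    "RangeTop": [0, 4, 0, 3, 10, 0, 5, 0, 0],
    "RangeBot": [4, 13, 3, 10, 21, 5, 11, 8, 15],
    "Type": ["z", "y", "x", "w", "v", "u", "t", "s", "r"],
})
df2 = pd.DataFrame({
    "ID1": ["a", "a", "b", "b", "c", "c"],
    "ID2": ["d", "d", "d", "e", "d", "e"],
    "Value": [3, 8, 11, 7, 6, 13],
})

# Pair every category row with every master range sharing the same IDs.
# reset_index() keeps the original row label in an 'index' column.
pairs = df2.reset_index().merge(df, on=["ID1", "ID2"], how="left")

# Keep only the pair whose half-open range [RangeTop, RangeBot) contains Value.
in_range = (pairs["Value"] >= pairs["RangeTop"]) & (pairs["Value"] < pairs["RangeBot"])

# Restore the original row order and keep the columns of interest.
result = (
    pairs[in_range]
    .set_index("index")
    .sort_index()[["ID1", "ID2", "Value", "Type"]]
)
result.index.name = None
print(result)
```

This needs no sorting and keeps df2's order, but it builds one row per range for each ID pair before filtering, and a Value that falls in no range is silently dropped rather than set to NaN, so it suits small range tables best.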

ALollz

Because the ranges in df are contiguous and non-overlapping within each ID pair, you can do this with pd.merge_asof: match exactly on the IDs, then match each Value to the range with the largest RangeTop that is not greater than it. You'll need an extra < check after the merge to set Type to NaN for any Value in df2 that is at or above the highest RangeBot in df. (direction='backward' already leaves any Value below the lowest RangeTop as NaN.)
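To see the edge behaviour described here in isolation, here is a minimal sketch on a single hypothetical group (the `bins` and `vals` frames are made up for illustration and are not from the question). A value below the first RangeTop gets no match at all, while a value beyond the last RangeBot is still matched to the last range until the < check masks it.

```python
import pandas as pd

# Hypothetical bins for one ID pair: [0, 4) -> "low", [4, 10) -> "high".
bins = pd.DataFrame({"RangeTop": [0, 4], "RangeBot": [4, 10], "Type": ["low", "high"]})
# Values below, inside, exactly on an edge, and above the bins (already sorted).
vals = pd.DataFrame({"Value": [-1, 3, 4, 12]})

out = pd.merge_asof(vals, bins, left_on="Value", right_on="RangeTop",
                    direction="backward")
# -1: no RangeTop <= -1, so all right-hand columns are NaN.
# 3 and 4: matched to the last RangeTop <= Value (0 and 4 respectively).
# 12: matched to RangeTop 4 even though 12 >= RangeBot, so mask it here.
out["Type"] = out["Type"].where(out["Value"].lt(out["RangeBot"]))
print(out)
```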

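# Assumes the helper 'Unique' columns were not created: by= already matches
# on ID1 and ID2 (otherwise they come back as Unique_x and Unique_y).
df2 = pd.merge_asof(df2.sort_values('Value'),      # Requires sorting
                    df.sort_values('RangeTop'),    # Requires sorting
                    by=['ID1', 'ID2'],             # Exact matching on ID1 and ID2
                    left_on='Value',
                    right_on='RangeTop',
                    direction='backward',          # Requires Value >= RangeTop
                    allow_exact_matches=True       # Value == RangeTop matches: [RangeTop, RangeBot)
                   )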

# This enforces Value < RangeBot.
df2['Type'] = df2['Type'].where(df2['Value'].lt(df2['RangeBot']))
# No longer need these cols
df2 = df2.drop(columns=['RangeTop', 'RangeBot'])

print(df2)
# Note row order has changed due to `sort`ing

  ID1 ID2  Value Type
0   a   d      3    z
1   c   d      6    s
2   b   e      7    t
3   a   d      8    y
4   b   d     11    v
5   c   e     13    r
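If the original row order of df2 matters, one option (a sketch, not part of the answer) is to carry df2's original index through merge_asof as a column and sort by it afterwards. The frames rebuild the question's data without the helper Unique column; `left`, `merged` and `result` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    "ID1": ["a", "a", "b", "b", "b", "b", "b", "c", "c"],
    "ID2": ["d", "d", "d", "d", "d", "e", "e", "d", "e"],
    "RangeTop": [0, 4, 0, 3, 10, 0, 5, 0, 0],
    "RangeBot": [4, 13, 3, 10, 21, 5, 11, 8, 15],
    "Type": ["z", "y", "x", "w", "v", "u", "t", "s", "r"],
})
df2 = pd.DataFrame({
    "ID1": ["a", "a", "b", "b", "c", "c"],
    "ID2": ["d", "d", "d", "e", "d", "e"],
    "Value": [3, 8, 11, 7, 6, 13],
})

# Keep the original row label as an 'index' column, then sort for merge_asof.
left = df2.reset_index().sort_values("Value")
merged = pd.merge_asof(left, df.sort_values("RangeTop"),
                       by=["ID1", "ID2"], left_on="Value", right_on="RangeTop",
                       direction="backward")

# Enforce Value < RangeBot, as in the answer.
merged["Type"] = merged["Type"].where(merged["Value"].lt(merged["RangeBot"]))

# Put rows back in df2's original order and drop the range columns.
result = (merged.set_index("index")
                .sort_index()
                .drop(columns=["RangeTop", "RangeBot"]))
result.index.name = None
print(result)
```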


Related

Pandas DataFrame - Fill NaNs of columns based on values of other columns

Finding miminum values by columns for a pandas DataFrame containing NaN elements

Setting value of a column based on values of other columns in Pandas dataframe

Pandas Dataframe - Group by column value and lookup values from other columns

Copy values between pandas dataframe columns

Calculates new columns based on other columns' values in python pandas dataframe

finding value in pandas dataframe

Select columns with specific values in pandas DataFrame

Map values for categories in pandas columns based on other dataframe columns

python- Finding difference between values in a dataframe in Pandas

Pandas finding occurrences of a specific value across multiple columns

Sum of columns based on range of values of other columns in a Pandas dataframe

Is there a way to filter a dataframe based on a specific value but also keep all other values for the unique identifier using pandas?

Finding if values in multiple columns greater than constant in pandas Dataframe

Finding a timedelta in pandas dataframe based upon specific values in one column

Operations on pandas dataframe between values of specific columns / rows

Count of values between other values in a pandas DataFrame

From a Pandas Dataframe, return specific column values based on grouping and largest values of other columns

Find values of a DataFrame that are between the values of two columns on other DataFrame

Finding parent child relationsip between columns in a pandas dataframe

Replace a value in the dataframe if it falls in between specific values

How to create a new row in pandas dataframe by dividing values in a specific column between two rows and keeping other columns intact?

Change pandas dataframe column value depending on other columns values having three options

Finding percentage proportion using specific columns and rows of pandas dataframe

Finding a mismatch between two correlated columns in pandas dataframe

Finding Value between two numbers in pandas dataframe

How can I query a column of a dataframe on a specific value and get the values of two other columns corresponding to that value

Find the sum of a column values based on unique cases,max value and specific value of other columns in the dataframe

Set values in a Pandas dataframe column with the value of another dataframe column where match between other two columns values (one with duplicates)