Python Pandas - filter pandas dataframe to get rows with minimum values in one column for each unique value in another column

Dexoryte

Here is a dummy example of the DF I'm working with ('ETC' represents several columns):

import pandas as pd

df = pd.DataFrame(data={'PlotCode':['A','A','A','A','B','B','B','C','C'],
                        'INVYR':[2000,2000,2000,2005,1990,2000,1990,2005,2001],
                        'ETC':['a','b','c','d','e','f','g','h','i']})

And here is what I want to end up with:

df1 = pd.DataFrame(data={'PlotCode':['A','A','A','B','B','C'],
                        'INVYR':[2000,2000,2000,1990,1990,2001],
                        'ETC':['a','b','c','e','g','i']})

NOTE: I want ALL rows with the minimum 'INVYR' value for each 'PlotCode', not just one; otherwise I assume I could do something simpler with sort and drop_duplicates.

So far, following the answer to "Appending pandas dataframes generated in a for loop", I've tried the following code:

df1 = []

for i in df['PlotCode'].unique():
    j = df[df['PlotCode']==i]
    k = j[j['INVYR']==j['INVYR'].min()]
    df1.append(k)

df1 = pd.concat(df1)

This code works but is very slow. My actual data contains some 40,000 different PlotCodes, so this isn't a feasible solution. Does anyone know a smooth way of filtering this? I feel like I'm missing something very simple.

Thank you in advance!

Sander van den Oord

Try to avoid for loops when working with pandas; they are usually much slower than the vectorized operations that pandas provides.

Solution 1:
Determine the minimum INVYR for every plotcode, using .groupby():

min_invyr_per_plotcode = df.groupby('PlotCode', as_index=False)['INVYR'].min()

Then use pd.merge() to do an inner join between your original df and the minimums you just found:

result_df = pd.merge(
    df, 
    min_invyr_per_plotcode, 
    how='inner', 
    on=['PlotCode', 'INVYR'],
)
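To check Solution 1 end to end, here is a minimal self-contained sketch that runs it on the dummy data from the question. Nothing new is introduced; the variable names are the ones used above. Note that the merge returns a fresh 0..n-1 index rather than the original row labels.

```python
import pandas as pd

# Dummy data from the question
df = pd.DataFrame(data={
    'PlotCode': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'C', 'C'],
    'INVYR': [2000, 2000, 2000, 2005, 1990, 2000, 1990, 2005, 2001],
    'ETC': ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i'],
})

# One row per PlotCode holding that group's minimum INVYR
min_invyr_per_plotcode = df.groupby('PlotCode', as_index=False)['INVYR'].min()

# The inner join keeps only rows whose (PlotCode, INVYR) pair matches a group minimum,
# so ties on the minimum are all retained
result_df = pd.merge(
    df,
    min_invyr_per_plotcode,
    how='inner',
    on=['PlotCode', 'INVYR'],
)

print(result_df)
```

On this data the result should contain the six rows of the desired df1 (ETC values a, b, c, e, g, i).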

Solution 2:

Again, determine the minimum per group, but now add it as a column to your dataframe. .groupby().transform() broadcasts each group's minimum to every row of that group:

df['min_per_group'] = (df
    .groupby('PlotCode')['INVYR']
    .transform('min')
)

Now filter your dataframe where INVYR in a row is equal to the minimum of that group:

df[df['INVYR'] == df['min_per_group']]
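If you would rather not add a helper column to df, the same idea can be sketched with the transformed Series kept in a separate variable and used directly as a boolean mask. This is a small variation on Solution 2, run here on the dummy data from the question; `min_per_group` is just a local name.

```python
import pandas as pd

# Dummy data from the question
df = pd.DataFrame(data={
    'PlotCode': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'C', 'C'],
    'INVYR': [2000, 2000, 2000, 2005, 1990, 2000, 1990, 2005, 2001],
    'ETC': ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i'],
})

# Series aligned with df's index: each row gets the minimum INVYR of its PlotCode
min_per_group = df.groupby('PlotCode')['INVYR'].transform('min')

# Keep every row whose INVYR equals its group's minimum (ties included)
result_df = df[df['INVYR'] == min_per_group]

print(result_df)
```

Unlike the merge in Solution 1, this keeps the original index labels of the selected rows and leaves df itself unchanged.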
