Here is a dummy example of the DF I'm working with ('ETC' represents several columns):
df = pd.DataFrame(data={'PlotCode':['A','A','A','A','B','B','B','C','C'],
'INVYR':[2000,2000,2000,2005,1990,2000,1990,2005,2001],
'ETC':['a','b','c','d','e','f','g','h','i']})
picture of df (sorry not enough reputation yet)
And here is what I want to end up with:
df1 = pd.DataFrame(data={'PlotCode':['A','A','A','B','B','C'],
'INVYR':[2000,2000,2000,1990,1990,2001],
'ETC':['a','b','c','e','g','i']})
NOTE: I want ALL rows with minimum 'INVYR' values for each 'PlotCode', not just one or else I'm assuming I could do something easier with drop_duplicates and sort.
So far, following the answer here Appending pandas dataframes generated in a for loop I've tried this with the following code:
df1 = []
for i in df['PlotCode'].unique():
j = df[df['PlotCode']==i]
k = j[j['INVYR']==j['INVYR'].min()]
df1.append(k)
df1 = pd.concat(df1)
This code works but is very slow, my actual data contains some 40,000 different PlotCodes so this isn't a feasible solution. Does anyone know some smooth filtering way of doing this? I feel like I'm missing something very simple.
Thank you in advance!
Try not to use for loops when using pandas, they are extremely slow in comparison to the vectorized operations that pandas has.
Solution 1:
Determine the minimum INVYR for every plotcode, using .groupby():
min_invyr_per_plotcode = df.groupby('PlotCode', as_index=False)['INVYR'].min()
And use pd.merge() to do an inner join between your orignal df with this minimum you just found:
result_df = pd.merge(
df,
min_invyr_per_plotcode,
how='inner',
on=['PlotCode', 'INVYR'],
)
Solution 2:
Again, determine the minimum per group, but now add it as a column to your dataframe. This minimum per group gets added to every row by using .groupby().transform()
df['min_per_group'] = (df
.groupby('PlotCode')['INVYR']
.transform('min')
)
Now filter your dataframe where INVYR in a row is equal to the minimum of that group:
df[df['INVYR'] == df['min_per_group']]
Collected from the Internet
Please contact [email protected] to delete if infringement.
Comments