Deleting multiple rows depending on one row value

Deke Marquardt

I am trying to write code that deletes all rows sharing the same 'SCU_KEY' if any of them has 'STATUS' == 0. For example, SCU_KEY 5 has a 0 in STATUS, so I want to delete every row whose SCU_KEY is 5. Here is a sample DataFrame and the desired output.

Dataframe:

import pandas as pd

df = pd.DataFrame({'SCU_KEY': [3, 3, 3, 5, 5, 5, 5, 5, 16, 16, 16],
                   'STATUS' : [1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1]})

Desired output:

df_2 = pd.DataFrame({'SCU_KEY': [3, 3, 3, 16, 16, 16],
                     'STATUS' : [1, 1, 1, 1, 1, 1]})
HarryPlotter

Use groupby + filter

# keep only the 'SCU_KEY' groups in which
# no 'STATUS' is 0 (drop groups with at least one 0)
df2 = df.groupby('SCU_KEY').filter(lambda g: not g['STATUS'].eq(0).any())
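A self-contained run of the groupby + filter approach on the question's sample data. It uses `not` on the scalar returned by `.any()`. pandas normally returns a NumPy bool there, where `~` happens to work, but `not` stays correct even if a plain Python bool comes back (`~True` is `-2`, which `filter` would reject).

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'SCU_KEY': [3, 3, 3, 5, 5, 5, 5, 5, 16, 16, 16],
                   'STATUS':  [1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1]})

# filter calls the lambda once per SCU_KEY group and keeps the group's rows
# only when the lambda returns True (here: the group has no STATUS == 0)
df2 = df.groupby('SCU_KEY').filter(lambda g: not g['STATUS'].eq(0).any())

# filter keeps the original row labels (0, 1, 2, 8, 9, 10);
# reset_index gives 0..5 like df_2 in the question
df2 = df2.reset_index(drop=True)
print(df2)
```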

EDIT - Performance Test

Although I find this solution somewhat more idiomatic, Corralien's solution is much faster if your DataFrame is large (roughly 4x faster in the test below).
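Corralien's answer itself is not included on this page; judging from the expression timed below, it works in two vectorized steps. A sketch of those steps on the question's sample data (the name `bad_keys` is just illustrative):

```python
import pandas as pd

df = pd.DataFrame({'SCU_KEY': [3, 3, 3, 5, 5, 5, 5, 5, 16, 16, 16],
                   'STATUS':  [1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1]})

# Step 1: the SCU_KEY of every row whose STATUS is 0
bad_keys = df.loc[df['STATUS'] == 0, 'SCU_KEY']

# Step 2: keep only rows whose SCU_KEY does not appear among those keys
df2 = df[~df['SCU_KEY'].isin(bad_keys)]
print(df2)
```

It is likely faster because both steps are column-wide vectorized operations, whereas `filter` calls a Python function once per group.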

Setup

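import numpy as np
import pandas as pd

rng = np.random.default_rng()  # rng.integers below requires a NumPy Generator
n = 500_000
max_groups = 20
df1 = pd.DataFrame({
    'SCU_KEY': rng.integers(max_groups, size=n),
    'STATUS': rng.integers(2, size=n)
})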
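The timings below use IPython's %timeit. In a plain Python script the same comparison can be sketched with the standard `timeit` module. The fixed seed, the function names and the "best of 5 single calls" scheme are assumptions for illustration, not part of the original test.

```python
import timeit

import numpy as np
import pandas as pd

# Same shape as the setup above; the fixed seed is an added assumption
rng = np.random.default_rng(0)
n = 500_000
max_groups = 20
df1 = pd.DataFrame({
    'SCU_KEY': rng.integers(max_groups, size=n),
    'STATUS': rng.integers(2, size=n),
})


def with_isin(df):
    # Corralien's approach: drop every key that has any STATUS == 0
    bad_keys = df.loc[df['STATUS'] == 0, 'SCU_KEY']
    return df[~df['SCU_KEY'].isin(bad_keys)]


def with_filter(df):
    # groupby + filter: calls the Python lambda once per group
    return df.groupby('SCU_KEY').filter(lambda g: not g['STATUS'].eq(0).any())


def with_transform(df):
    # wwnde's approach: a per-group flag broadcast back to every row
    return df[df.groupby('SCU_KEY')['STATUS'].transform(lambda x: (x != 0).all())]


for func in (with_isin, with_filter, with_transform):
    # best of 5 single calls, reported in milliseconds
    best = min(timeit.repeat(lambda: func(df1), number=1, repeat=5))
    print(f"{func.__name__}: {best * 1000:.1f} ms")
```

Absolute numbers will depend on the machine and pandas version. Note that with about 25,000 random 0/1 values per key, practically every key contains a 0, so all three approaches return an empty DataFrame on this data; a dataset where some keys survive may give a fairer picture.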

Results

Here are the results for comparison

# Corralien's 
>>> %timeit df1[~df1['SCU_KEY'].isin(df1.loc[df1['STATUS'] == 0, 'SCU_KEY'])]

15.2 ms ± 1.51 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)

# My solution
>>> %timeit df1.groupby('SCU_KEY').filter(lambda g: ~g['STATUS'].eq(0).any())

59.4 ms ± 9.84 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

# Solution suggested by wwnde (see comments)
>>> %timeit df1[df1.groupby('SCU_KEY')['STATUS'].transform(lambda x: (x!=0).all())]

210 ms ± 12.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
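wwnde's version calls a Python lambda for every group. A possible variant, not part of the original comparison and not timed here, passes the built-in 'all' aggregation to transform instead. Because STATUS only holds 0 and 1, 'all' is True exactly for the keys that have no 0. Whether it actually approaches the isin version's speed is an assumption worth measuring.

```python
import pandas as pd

df = pd.DataFrame({'SCU_KEY': [3, 3, 3, 5, 5, 5, 5, 5, 16, 16, 16],
                   'STATUS':  [1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1]})

# 'all' treats each nonzero STATUS as True, so each group's result is True
# only when the group has no 0; transform repeats that result on every row
keep = df.groupby('SCU_KEY')['STATUS'].transform('all')

df2 = df[keep]
print(df2)
```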
