Pandas - check if other columns have duplicates based on a different column

rj dj Published at Dev

I have the following dataframe:

| col1 | col2 | col3 | col4 |
|------|------|------|------|
| a    | 1    | 2    | abc  |
| b    | 1    | 2    | abc  |
| c    | 3    | 2    | def  |

I want the rows that are duplicated on col2, col3 and col4 but have different values in col1.

In this case the output would be:

| col1 | col2 | col3 | col4 |
|------|------|------|------|
| a    | 1    | 2    | abc  |
| b    | 1    | 2    | abc  |

Calling df.duplicated on a frame that excludes col1 won't work, since I need the col1 information in the result. I have millions of rows, and further analysis would be difficult without it. I can't set col1 as the index either, because another column has to be the index.

Is there a pythonic/pandaic way to achieve this?

BEN_YO

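We can use groupby with filter, keeping only the groups where col1 has more than one value and all of its values are distinct: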

df.groupby(['col2','col3','col4']).filter(lambda x : (x['col1'].nunique()==x['col1'].count())&(x['col1'].nunique()>1))
Out[65]: 
  col1  col2  col3 col4
0    a     1     2  abc
1    b     1     2  abc
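
For anyone who wants to run this end to end, here is a minimal sketch of the same filter approach. The frame is rebuilt from the table in the question; the names `key_cols`, `has_distinct_col1` and `filtered` are illustrative and not part of the original answer.

```python
import pandas as pd

# Sample frame copied from the question.
df = pd.DataFrame({
    "col1": ["a", "b", "c"],
    "col2": [1, 1, 3],
    "col3": [2, 2, 2],
    "col4": ["abc", "abc", "def"],
})

key_cols = ["col2", "col3", "col4"]  # illustrative name


def has_distinct_col1(group):
    """True when the group has 2+ col1 values and none of them repeats."""
    n_unique = group["col1"].nunique()
    return bool(n_unique > 1 and n_unique == group["col1"].count())


# filter keeps every row of each group for which the function returns True.
filtered = df.groupby(key_cols).filter(has_distinct_col1)
print(filtered)
```

Note that the function is called once per group in Python, so on millions of rows with many distinct key combinations this may be noticeably slower than a fully vectorized mask.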

Alternatively, use duplicated twice: the first mask keeps rows that share col2, col3 and col4 with at least one other row, and the second excludes rows that are exact duplicates across all four columns, col1 included.

df[df.duplicated(['col2','col3','col4'],keep=False)&~df.duplicated(['col1','col2','col3','col4'],keep=False)]
Out[70]: 
  col1  col2  col3 col4
0    a     1     2  abc
1    b     1     2  abc
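
The two-mask approach can be sketched as a self-contained script as follows. The frame comes from the question; `key_cols`, `dup_on_keys`, `dup_on_all`, `result` and the extra frame `df2` are illustrative names and data added here, not part of the original answer.

```python
import pandas as pd

# Sample frame copied from the question.
df = pd.DataFrame({
    "col1": ["a", "b", "c"],
    "col2": [1, 1, 3],
    "col3": [2, 2, 2],
    "col4": ["abc", "abc", "def"],
})

key_cols = ["col2", "col3", "col4"]  # illustrative name

# Rows whose key columns also appear in at least one other row.
dup_on_keys = df.duplicated(key_cols, keep=False)
# Rows that are repeated in full, col1 included.
dup_on_all = df.duplicated(["col1"] + key_cols, keep=False)

result = df[dup_on_keys & ~dup_on_all]
print(result)

# Hypothetical extra data: col1 value "a" appears twice within one key group.
df2 = pd.DataFrame({
    "col1": ["a", "a", "b"],
    "col2": [1, 1, 1],
    "col3": [2, 2, 2],
    "col4": ["abc", "abc", "abc"],
})
mask2 = df2.duplicated(key_cols, keep=False) & ~df2.duplicated(
    ["col1"] + key_cols, keep=False
)
print(df2[mask2])  # only the "b" row survives
```

The masks are computed without a per-group Python callback, which is likely to matter at the scale mentioned in the question. The two approaches are not strictly equivalent, though: when a col1 value repeats inside a key group, as in `df2`, the filter version drops the whole group, while this version keeps the rows whose col1 value is not repeated.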


Edited at 2020-12-5
