Pandas: How to return all rows if a column string contains at least a certain number of strings from a list?

SantoshGupta7

Say that I have a list of strings, such as

listStrings = [ 'cat', 'bat', 'hat', 'dad', 'look', 'ball', 'hero', 'up']

Is there a way to return all rows where a particular column contains 3 or more of the strings from the list?

For example

If the column contained 'My dad is a hero for saving the cat'

Then the row would be returned.

But if the column only contained 'the cat and bat teamed up to find some food'

That row wouldn't be returned.

The only way I can think of is to get every combination of 3 from the list of strings, and use AND statements. e.g. 'cat' AND 'bat' AND 'hat'.

But this seems neither computationally efficient nor Pythonic.

Is there a more efficient, compact way to do this?

Edit

Here is a pandas example

import pandas as pd 

listStrings = [ 'cat', 'bat', 'hat', 'dad', 'look', 'ball', 'hero', 'up']

df = pd.DataFrame(['test1', 'test2', 'test3'], ['My dad is a hero for saving the cat', 'the cat and bat teamed up to find some food', 'The dog found a bowl'])
df.head()


                                                 0
My dad is a hero for saving the cat          test1
the cat and bat teamed up to find some food  test2
The dog found a bowl                         test3

So using the listStrings, I would like row 1 returned, but not row 2 or row 3.

Mykola Zotko

You can use set intersection:

import pandas as pd 

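# Use a set so the intersection with each row's words is cheap
listStrings = {'A', 'B'}
df = pd.DataFrame({'text': ['A B', 'B C', 'C D']})

# Keep rows whose whitespace-separated words include at least 2 of the listed strings
df = df.loc[df.text.apply(lambda x: len(listStrings.intersection(x.split())) >= 2)]
print(df)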

Output:

  text
0  A B
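
Applied to the sentences from the question, the same idea might look like the sketch below. It assumes the sentences are stored in a column named 'text' (a hypothetical name; the question's own example puts them in the index), that a match means a whole whitespace-separated word, and that case should be ignored. The helper count_matches is introduced here for illustration only.

```python
import pandas as pd

listStrings = ['cat', 'bat', 'hat', 'dad', 'look', 'ball', 'hero', 'up']

# Assumption: the sentences live in a column called 'text'
df = pd.DataFrame({
    'text': [
        'My dad is a hero for saving the cat',
        'the cat and bat teamed up to find some food',
        'The dog found a bowl',
    ],
    'label': ['test1', 'test2', 'test3'],
})

wanted = set(listStrings)

def count_matches(sentence):
    """Count the distinct list entries that appear as whole words in the sentence."""
    return len(wanted & set(sentence.lower().split()))

# Number of distinct list words found in each row
counts = df['text'].apply(count_matches)

# Keep rows with at least 3 matches
result = df.loc[counts >= 3]

print(counts.tolist())           # [3, 3, 0]
print(result['label'].tolist())  # ['test1', 'test2']
```

With the list exactly as given, the second sentence also reaches the threshold, because it contains 'cat', 'bat' and 'up'. Splitting on whitespace does not strip punctuation, so a word followed by a comma or period would not be counted unless the text is cleaned first.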
