How can I filter rows that contain two or more words from another column?

Alex

I would like to keep only the rows that contain 2 or more words that also appear in another column.

I have a dataframe like this:

    df <- data.frame(name1 = c("Carlos Lopez Rey", "Monica Naranjo Garcia", "Antonio Perez Reverte", "Alejandro Martinez Amor", "Iñigo Muruzabal"), 
                     name2 = c("Lopez, Carlos", "Monica de Naranjo", "Garcia, Antonio", "Alejandro Martinez de Amor", "Muruzabal, Javier"))

I would like to create a condition that keeps the rows where the first column (name1) and the second column (name2) share 2 or more words. The result I would like to have is:

    name1                    name2
    Carlos Lopez Rey         Lopez, Carlos
    Monica Naranjo Garcia    Monica de Naranjo
    Alejandro Martinez Amor  Alejandro Martinez de Amor

* Note that "Antonio Perez Reverte" and "Iñigo Muruzabal" are excluded because their name1 value shares only 1 word with name2.

Ronak Shah

Split each string into words, count the common words with length(intersect(...)), and keep only the rows that have at least 2 words in common.

    # Count the words shared by name1 and name2 in each row; keep rows with 2 or more
    result <- subset(df, mapply(function(x, y) length(intersect(x, y)),
                                strsplit(name1, ',|\\s+'), strsplit(name2, ',|\\s+')) >= 2)

    result

    #                    name1                      name2
    #1        Carlos Lopez Rey              Lopez, Carlos
    #2   Monica Naranjo Garcia          Monica de Naranjo
    #4 Alejandro Martinez Amor Alejandro Martinez de Amor
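To make the one-liner easier to follow, here is a step-by-step sketch that prints the number of shared words per row before filtering. It rebuilds the example data and, as an assumption of this sketch rather than part of the answer, splits on `"[,[:space:]]+"` instead of `',|\\s+'`. The answer's pattern treats a comma and the space after it as two separators, so "Lopez, Carlos" splits into "Lopez", "", "Carlos". That empty string does not change the result for this data, but if both columns contained a comma followed by a space, it would be counted as a shared word.

```r
df <- data.frame(
  name1 = c("Carlos Lopez Rey", "Monica Naranjo Garcia", "Antonio Perez Reverte",
            "Alejandro Martinez Amor", "Iñigo Muruzabal"),
  name2 = c("Lopez, Carlos", "Monica de Naranjo", "Garcia, Antonio",
            "Alejandro Martinez de Amor", "Muruzabal, Javier"),
  stringsAsFactors = FALSE  # needed on R < 4.0, where strsplit() rejects factors
)

# Split on runs of commas and/or whitespace (assumed separator set),
# so "Lopez, Carlos" gives c("Lopez", "Carlos") with no empty token
words1 <- strsplit(df$name1, "[,[:space:]]+")
words2 <- strsplit(df$name2, "[,[:space:]]+")

# Number of distinct words shared by name1 and name2 in each row
n_common <- mapply(function(x, y) length(intersect(x, y)), words1, words2)
print(n_common)
# [1] 2 2 1 3 1

# Keep rows with at least 2 shared words (rows 1, 2 and 4)
result <- df[n_common >= 2, ]
print(result)
```

Printing `n_common` shows why rows 3 and 5 drop out: each shares a single word ("Antonio" and "Muruzabal").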


Related

Filter a list of strings which contains one or more strings from another list with Java 8 streams

How can I use a 2D array of boolean rows to filter another 2D array?

How can I fill empty table rows which references another table's column ID?

How can I filter for pandas columns or rows based on values of another column?

How can I average every 5 rows specific column and select last data from another column in Pandas

How can I filter a list in Flutter based on the words it contains?

How can I filter one column by name, then take value from another column?

How can I make Pandas convert a column which contains NaT from timedelta to datetime?

How can I find which column contains the same value as another specified column in R?

How can I filter rows if one array contains all values from another array using BigQuery?

How can i order my rows from (first) a column and then (secondly) another column which is a specific starting time?

How To Remove $ Character From All Rows (Which Contains The $) in a Column

How can I capture the number of rows inserted into a Redshift table which contains an identity column?

How can I select rows from database where column Name contains exactly 5 digits

keyword analysis: to return rows for which a description column contain one or more words that are in another column in another table

How can I filter spark Dataframe according to the value that column contains?

How can I use a LINQ statement to filter out items that matches one of the words from another List?

In Oracle SQL how can i find all values in one column for which in another column exist more than one distinct value

How do I filter only those rows which contains any of the values from a given list of tags

How can I delete rows if a column contains a certain value?

How can I filter rows which contains less than 3 whitespaces in a column? (R)

How can I select all the rows which do not share a column value with another row which is null?

Access rows with string in dataframe column, which contain 2 or more spaces between words using Pandas

How can I subtract values of one column rows from another column row which is preceding on the basis of Year period?

How can a I drop duplicate rows for a dataframe based on the filter or condition of another column?

How can I apply Conditional Formatting on a column if the cell (partially) contains text from a list (another column) in excel?

How can I create a new column in a pandas data frame by extracting words from sentences in another column?

In Excel, how can I filter, pull, and sum values from a column if their rows match data in another table and are within a specific date range?

How can I filter an rows in column of ArrayType(StringType) against items in another column in a separate dataframe using pyspark?