For each row check if value in one column exists in two other columns

Mat Published at Dev

Assume we have the following data frame:

df <- data.frame(X1 = 1:5, X2 = 6:10, X3 = c(6, 2, 3, 0, 2))

  X1 X2 X3
1  1  6  6
2  2  7  2
3  3  8  3
4  4  9  0
5  5 10  2

I would like to add a new column (X4) made of logical values. For each row: if X3 is equal to X1 or X2, then X4 should be TRUE, otherwise FALSE.

I tried:

mutate(df, X4 = X3 %in% c(X2, X1))

  X1 X2 X3    X4
1  1  6  6  TRUE # OK
2  2  7  2  TRUE # OK
3  3  8  3  TRUE # OK
4  4  9  0 FALSE # OK
5  5 10  2  TRUE # expected to be FALSE

Most importantly, my real df is very large, so I would like to avoid for loops. I would prefer the shortest (least code) and fastest solution.

akrun

We can use Reduce to combine, with `|`, the element-wise comparisons of X3 against X1 and X2:

Reduce(`|`, lapply(df[1:2], `==`, df[,3]))
#[1]  TRUE  TRUE  TRUE FALSE FALSE
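As a minimal sketch using the small example data from the question, the following contrasts the original attempt with the row-wise comparison and stores the result as X4. The name `pooled` is only illustrative. The point it shows is that `%in%` tests each X3 value against all values of X1 and X2 pooled together, which is why row 5 came out TRUE.

```r
df <- data.frame(X1 = 1:5, X2 = 6:10, X3 = c(6, 2, 3, 0, 2))

# %in% checks each X3 value against the pooled values of X1 and X2,
# so row 5 (X3 = 2) matches the 2 found in row 2 of X1.
pooled <- df$X3 %in% c(df$X2, df$X1)
pooled

# Row-wise version: compare X3 with X1 and with X2 element by element,
# then OR the two logical vectors together.
df$X4 <- Reduce(`|`, lapply(df[1:2], `==`, df[, 3]))
df
```

Assigning with `df$X4 <-` keeps everything in base R; the same logical vector could also be passed to `mutate()` if dplyr is already in use.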

Benchmarking

Benchmarking makes more sense on a bigger dataset:

library(microbenchmark)
set.seed(24)
df <- data.frame(X1= sample(1:5, 1e6, replace=TRUE), X2 = sample(1:10, 1e6, replace=TRUE),
       X3 = sample(1:10, 1e6, replace=TRUE))

f2 <- function(df) Reduce(`|`, lapply(df[1:2], `==`, df[,3]))
f3 <- function(df) with(df, X3==X1 | X3==X2)
# f1 (an apply()-based version) is not defined in this excerpt,
# so only f2 and f3 are timed here
microbenchmark(f2(df), f3(df))
#Unit: milliseconds
#   expr         min         lq       mean     median         uq      max neval
# f2(df)    8.191218   10.83333   23.28081   16.42744   22.26866  143.025   100
# f3(df)    8.154506   10.58878   19.17879   11.49179   22.41255  144.510   100

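The apply-based version (f1, whose definition and timings are not shown here) is slower, as I expected, but Reduce is not nearly as slow: it is in the same range as the direct vectorized comparison.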
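Before comparing timings, it can help to confirm that the candidates return the same result. The sketch below does that on a smaller random sample (1e4 rows, an arbitrary size chosen here to keep the check quick). `f4` is a hypothetical extra variant that is not part of the benchmark above; it is included only to illustrate one way of generalizing to more than two columns.

```r
set.seed(24)
n <- 1e4  # smaller than the benchmark data, just for a quick check
df <- data.frame(X1 = sample(1:5, n, replace = TRUE),
                 X2 = sample(1:10, n, replace = TRUE),
                 X3 = sample(1:10, n, replace = TRUE))

f2 <- function(df) Reduce(`|`, lapply(df[1:2], `==`, df[, 3]))
f3 <- function(df) with(df, X3 == X1 | X3 == X2)

# Hypothetical variant: compare every selected column with X3 at once,
# then flag rows with at least one match. unname() drops the row names
# that rowSums() carries over from the data frame.
f4 <- function(df) unname(rowSums(df[1:2] == df$X3) > 0)

same_23 <- identical(f2(df), f3(df))
same_24 <- identical(f2(df), f4(df))
c(same_23, same_24)
```

If either comparison returned FALSE, the timings would not be comparing like with like.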

Edited at 2020-10-25