Assume we have the following data frame:
df <- data.frame(X1 = 1:5, X2 = 6:10, X3 = c(6, 2, 3, 0, 2))
X1 X2 X3
1 1 6 6
2 2 7 2
3 3 8 3
4 4 9 0
5 5 10 2
I would like to add a new column (X4
) made of logical values. For each row: if X3
is equal to X1
or X2
, then X4
should be TRUE
, otherwise FALSE
.
I tried:
mutate(df, X4 = X3 %in% c(X2, X1))
X1 X2 X3 X4
1 1 6 6 TRUE # OK
2 2 7 2 TRUE # OK
3 3 8 3 TRUE # OK
4 4 9 0 FALSE # OK
5 5 10 2 TRUE # expected to be FALSE
Most importantly, my real df
is very large, so I would like to avoid using for loops. I would privilege the shortest (less amount of code) and fastest solution.
We can use Reduce
Reduce(`|`, lapply(df[1:2], `==`, df[,3]))
#[1] TRUE TRUE TRUE FALSE FALSE
On a bigger data makes more sense
library(microbenchmark)
set.seed(24)
df <- data.frame(X1= sample(1:5, 1e6, replace=TRUE), X2 = sample(1:10, 1e6, replace=TRUE),
X3 = sample(1:10, 1e6, replace=TRUE))
f2 <- function(df) Reduce(`|`, lapply(df[1:2], `==`, df[,3]))
f3 <- function(df) with(df, X3==X1 | X3==X2)
microbenchmark(f1(df), f2(df), f3(df))
#Unit: milliseconds
# expr min lq mean median uq max neval
# f2(df) 8.191218 10.83333 23.28081 16.42744 22.26866 143.025 100
# f3(df) 8.154506 10.58878 19.17879 11.49179 22.41255 144.510 100
The apply
is slower as I thought, but the Reduce
is not as slow..
Collected from the Internet
Please contact [email protected] to delete if infringement.
Comments