When max(x, na.rm = TRUE) is called with no non-NA values, it returns -Inf with a warning. However, in certain cases, the summarise function in dplyr does not emit the warning:
library(magrittr)
library(dplyr)
df1 <- data.frame(a = c("a","b"), b = c(NA,NA))
df1 %>% group_by(a) %>% summarise(x = max(b, na.rm = TRUE))
# Three warnings, as expected.
df2 <- data.frame(a = c("a","b"), b = c(1,NA))
df2 %>% group_by(a) %>% summarise(x = max(b, na.rm = TRUE))
# No warning. Unexpected.
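For reference, the base R behaviour outside dplyr can be checked directly. This is a minimal sketch using only base R; the names x, value and warned are illustrative and not part of the original example.

```r
# Base R only, no dplyr involved.
x <- c(NA_real_, NA_real_)

# The value returned once all NAs are removed (warning silenced here).
value <- suppressWarnings(max(x, na.rm = TRUE))

# TRUE if the same call signals a warning, FALSE otherwise.
warned <- tryCatch(
  {
    max(x, na.rm = TRUE)
    FALSE
  },
  warning = function(w) TRUE
)

value   # -Inf
warned  # TRUE
```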
Interestingly, if I rename the function, I get the warnings as expected:
# Pointer to same function.
stat <- max
df1 <- data.frame(a = c("a","b"), b = c(NA,NA))
df1 %>% group_by(a) %>% summarise(x = stat(b, na.rm = TRUE))
# Three warnings, as expected.
df2 <- data.frame(a = c("a","b"), b = c(1,NA))
df2 %>% group_by(a) %>% summarise(x = stat(b, na.rm = TRUE))
# Single warning, as expected.
Actually, I think it should be two warnings instead of three, because there are only two groups to summarise. But I am not sure how the internal warning system works, so perhaps three warnings is as expected.
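One way to check the number of warnings rather than reading it off the console is to count them with a calling handler. The sketch below is only an illustration: count_warnings is a hypothetical helper, not a dplyr or base R function, and it is shown on a base R per-group computation (tapply) so that it runs without dplyr.

```r
# Hypothetical helper: evaluates an expression, counts the warnings it
# signals, and muffles them so they are not printed.
count_warnings <- function(expr) {
  n <- 0L
  withCallingHandlers(
    expr,  # the promise is forced here, inside the handler scope
    warning = function(w) {
      n <<- n + 1L
      invokeRestart("muffleWarning")
    }
  )
  n
}

df1 <- data.frame(a = c("a", "b"), b = c(NA, NA))

# Base R equivalent of the grouped max: one call to max() per group.
n_base <- count_warnings(tapply(df1$b, df1$a, max, na.rm = TRUE))
n_base  # 2, one warning per group
```

Wrapping the dplyr pipeline in the same helper, for example count_warnings(df1 %>% group_by(a) %>% summarise(x = max(b, na.rm = TRUE))), should show how many warnings summarise actually raises; I have not assumed a particular result for that call here.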
My question is: Why does summarise not output the warning in specific cases, and if that is expected, why would a simple rename of the function change this behaviour?
My sessionInfo():
R version 3.3.2 (2016-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] dplyr_0.5.0.9000 magrittr_1.5
loaded via a namespace (and not attached):
[1] lazyeval_0.2.0.9000 R6_2.2.0 assertthat_0.1
[4] tools_3.3.2 DBI_0.5-1 tibble_1.2
[7] Rcpp_0.12.8
Although I am using the "dev" version of dplyr, I have also tested it on the version available in CRAN, with the same results.
For max(), a hybrid version is available that works much faster for a grouped data frame, because the entire evaluation can be carried out in C++ without an R callback for each group. In dplyr 0.5.0, the hybrid version is triggered when all of the following conditions are met:
[...] logical constant
See the hybrid vignette for more detail.
The hybrid version of max() differs in certain aspects from the R implementation:
[...] -Inf
[...] NA vector will return NA even with na.rm = TRUE
In your example, c(NA, NA) is a logical vector, so dplyr falls back to "regular" evaluation with one R callback for each group. If you need the original behavior, simply use a wrapper or an alias; the hybrid evaluator will fall back to regular evaluation:
max_ <- max
data_frame(a = NA_real_) %>% summarise(a = max_(a, na.rm = TRUE))
## # A tibble: 1 × 1
## a
## <dbl>
## 1 -Inf
## Warning message:
## In max_(a, na.rm = TRUE) : no non-missing arguments to max; returning -Inf
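The type difference between the two data frames in the question, and the wrapper variant of the workaround, can be checked in base R. This is a sketch under stated assumptions: max_wrapped is a hypothetical name, and whether dplyr evaluates it through the regular R path is assumed from the alias example above, not tested here.

```r
# Column b in df1 versus df2 from the question.
type_df1 <- typeof(c(NA, NA))  # "logical"
type_df2 <- typeof(c(1, NA))   # "double"

# Hypothetical wrapper: a plain function that forwards to base max().
max_wrapped <- function(x, ...) max(x, ...)

# Outside dplyr it behaves like max(): -Inf plus a warning when no
# non-missing values remain (the warning is silenced here).
wrapped_value <- suppressWarnings(max_wrapped(NA_real_, na.rm = TRUE))

# TRUE if the wrapper call signals a warning.
wrapped_warns <- tryCatch(
  {
    max_wrapped(NA_real_, na.rm = TRUE)
    FALSE
  },
  warning = function(w) TRUE
)
```

Inside summarise, max_wrapped(b, na.rm = TRUE) would be used in the same way as max_ above.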