summarise does not return warning from max when no non-NA values

nograpes

When max(x, na.rm = TRUE) is called with no non-NA values, it returns -Inf with a warning. However, in certain cases the summarise function in dplyr does not raise the warning:

library(magrittr)
library(dplyr)

df1 <- data.frame(a = c("a","b"), b = c(NA,NA))
df1 %>% group_by(a) %>% summarise(x = max(b, na.rm = TRUE))
# Three warnings, as expected.

df2 <- data.frame(a = c("a","b"), b = c(1,NA))
df2 %>% group_by(a) %>% summarise(x = max(b, na.rm = TRUE))
# No warning. Unexpected.

Interestingly, if I rename the function, I get the warnings as expected:

# Another name bound to the same function.
stat <- max

df1 <- data.frame(a = c("a","b"), b = c(NA,NA))
df1 %>% group_by(a) %>% summarise(x = stat(b, na.rm = TRUE))
# Three warnings, as expected.

df2 <- data.frame(a = c("a","b"), b = c(1,NA))
df2 %>% group_by(a) %>% summarise(x = stat(b, na.rm = TRUE))
# Single warning, as expected.

Actually, I think it should be two warnings instead of three, because there are only two groups to summarise. But I am not sure how the internal warning system works, so perhaps three warnings is as expected.
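To check how many warnings a per-group max() should raise without dplyr involved, one option is to split the column by group in base R and count the warnings with withCallingHandlers(). This is a sketch; the helper name count_warnings is hypothetical.

```r
# Hypothetical helper: evaluate an expression, muffle and count its warnings.
count_warnings <- function(expr) {
  n <- 0L
  value <- withCallingHandlers(
    expr,  # the promise is forced here, inside the handler context
    warning = function(w) {
      n <<- n + 1L                       # update the counter in this frame
      invokeRestart("muffleWarning")     # do not print the warning
    }
  )
  list(value = value, n_warnings = n)
}

df1 <- data.frame(a = c("a","b"), b = c(NA,NA))
df2 <- data.frame(a = c("a","b"), b = c(1,NA))

# One call to base max() per group, as in "regular" (non-hybrid) evaluation
r1 <- count_warnings(sapply(split(df1$b, df1$a), max, na.rm = TRUE))
r2 <- count_warnings(sapply(split(df2$b, df2$a), max, na.rm = TRUE))
print(r1$n_warnings)
print(r2$n_warnings)
```

Evaluated this way, each all-NA group produces exactly one warning: two for df1 and one for df2. The sketch does not explain the extra warning reported for summarise().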

My question is: Why does summarise not output the warning in specific cases, and if that is expected, why would a simple rename of the function change this behaviour?

My sessionInfo():

R version 3.3.2 (2016-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] dplyr_0.5.0.9000 magrittr_1.5

loaded via a namespace (and not attached):
[1] lazyeval_0.2.0.9000 R6_2.2.0            assertthat_0.1
[4] tools_3.3.2         DBI_0.5-1           tibble_1.2
[7] Rcpp_0.12.8

Although I am using the "dev" version of dplyr, I have also tested it on the version available on CRAN, with the same results.

krlmlr

For max(), a hybrid version is available that works much faster for a grouped data frame, because the entire evaluation can be carried out in C++ without an R callback for each group. In dplyr 0.5.0, the hybrid version is triggered when all of the following conditions are met:

  • The first argument refers to a variable that exists in the data frame
  • That variable is numeric (integer or double); a logical column falls back to regular evaluation
  • The second argument is a logical constant
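The conditions above can be mimicked in plain R to see why an alias or a logical column is not picked up. This is a hypothetical sketch, not dplyr's actual C++ code: the function name looks_hybrid_max is made up, and the numeric check reflects the remark further down that a logical column falls back to regular evaluation.

```r
# Hypothetical sketch of the hybrid max() eligibility test; NOT dplyr's code.
looks_hybrid_max <- function(expr, data) {
  # Only a call whose function is literally the symbol `max` qualifies,
  # so an alias such as stat <- max is not recognized.
  if (!is.call(expr) || !identical(expr[[1]], as.name("max"))) return(FALSE)
  args <- as.list(expr)[-1]
  if (length(args) != 2) return(FALSE)
  # First argument: a bare name of a column that exists in the data
  col <- args[[1]]
  if (!is.name(col) || !(as.character(col) %in% names(data))) return(FALSE)
  # The column must be numeric; a logical column falls back
  if (!is.numeric(data[[as.character(col)]])) return(FALSE)
  # Second argument: a single logical constant such as na.rm = TRUE
  is.logical(args[[2]]) && length(args[[2]]) == 1
}

df1 <- data.frame(a = c("a","b"), b = c(NA,NA))
df2 <- data.frame(a = c("a","b"), b = c(1,NA))
print(looks_hybrid_max(quote(max(b, na.rm = TRUE)), df2))
print(looks_hybrid_max(quote(max(b, na.rm = TRUE)), df1))
print(looks_hybrid_max(quote(stat(b, na.rm = TRUE)), df2))
```

Under these assumptions only the df2 call with the literal name max is eligible, which matches the one case in the question where the warning disappears.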

See the hybrid vignette for more detail.

The hybrid version of max() differs in certain aspects from the R implementation:

  • No warnings are raised for an empty vector, silently returning -Inf
  • An all-NA vector will return NA even with na.rm = TRUE
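For comparison, base R's max() handles the same two inputs as shown in this short sketch (variable names are only for illustration): both give -Inf, and both raise the "no non-missing arguments" warning.

```r
# 1. Empty vector: base max() returns -Inf and raises a warning.
empty_msg <- tryCatch(max(numeric(0)), warning = conditionMessage)

# 2. All-NA vector with na.rm = TRUE: also -Inf (not NA), with the same warning.
all_na_val <- suppressWarnings(max(c(NA_real_, NA_real_), na.rm = TRUE))
all_na_msg <- tryCatch(max(c(NA_real_, NA_real_), na.rm = TRUE),
                       warning = conditionMessage)

print(empty_msg)
print(all_na_val)
```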

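In your first example, b is c(NA, NA), which is a logical vector, so dplyr falls back to "regular" evaluation with one R callback for each group, and base max() raises its usual warnings. In the second example, b is numeric, so the hybrid version is used and the all-NA group silently gets NA. Renaming the function works the same way: the hybrid evaluator only recognizes max itself, so a wrapper or an alias falls back to regular evaluation. If you need the original behavior, simply use one: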

library(dplyr)

# An alias is not recognized by the hybrid evaluator
max_ <- max
data_frame(a = NA_real_) %>% summarise(a = max_(a, na.rm = TRUE))
## # A tibble: 1 × 1
##       a
##   <dbl>
## 1  -Inf
## Warning message:
## In max_(a, na.rm = TRUE) : no non-missing arguments to max; returning -Inf
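A wrapper works like the alias. A minimal sketch, assuming a hypothetical name max_warn: because the call inside summarise() is to max_warn rather than max, the hybrid evaluator should not match it, and base max() with its warning runs for each group.

```r
# Hypothetical wrapper around base max(); only the name differs.
max_warn <- function(x, na.rm = FALSE) {
  max(x, na.rm = na.rm)
}

# Outside dplyr it behaves exactly like max():
ok  <- max_warn(c(1, NA), na.rm = TRUE)
msg <- tryCatch(max_warn(NA_real_, na.rm = TRUE), warning = conditionMessage)
print(ok)
print(msg)
```

Inside summarise() it is used like the alias, e.g. summarise(x = max_warn(b, na.rm = TRUE)).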
