Writing a custom function that works inside dplyr::mutate()

jzadra

I'm struggling to write a function that works inside dplyr::mutate().

Since rowwise() %>% sum() is quite slow on large datasets, the suggested alternative is to fall back to base R. I'm hoping to streamline this process as shown below, but I'm having trouble passing the data into my function from within mutate().

require(tidyverse)
#> Loading required package: tidyverse
#I'd like to write a function that works inside mutate and replaces the rowSums(select()).
cars <- as_tibble(cars)

cars %>% 
  mutate(sum = rowSums(select(., speed, dist), na.rm = T))
#> # A tibble: 50 x 3
#>    speed  dist   sum
#>    <dbl> <dbl> <dbl>
#>  1    4.    2.    6.
#>  2    4.   10.   14.
#>  3    7.    4.   11.
#>  4    7.   22.   29.
#>  5    8.   16.   24.
#>  6    9.   10.   19.
#>  7   10.   18.   28.
#>  8   10.   26.   36.
#>  9   10.   34.   44.
#> 10   11.   17.   28.
#> # ... with 40 more rows

#Here is my first attempt.
rowwise_sum <- function(data, ..., na.rm = FALSE) {
  columns <- rlang::enquos(...)

  data %>% 
    select(!!!columns) %>% 
    rowSums(na.rm = na.rm)
}

#Doesn't work as expected:
cars %>% mutate(sum = rowwise_sum(speed, dist, na.rm = T))
#> Error in mutate_impl(.data, dots): Evaluation error: no applicable method for 'select_' applied to an object of class "c('double', 'numeric')".

#But alone it is creating a vector.
cars %>% rowwise_sum(speed, dist, na.rm = T)
#>  [1]   6  14  11  29  24  19  28  36  44  28  39  26  32  36  40  39  47
#> [18]  47  59  40  50  74  94  35  41  69  48  56  49  57  67  60  74  94
#> [35] 102  55  65  87  52  68  72  76  84  88  77  94 116 117 144 110

#Appears to not be getting the data passed.  Specifying with a dot works.
cars %>% mutate(sum = rowwise_sum(., speed, dist, na.rm = T))
#> # A tibble: 50 x 3
#>    speed  dist   sum
#>    <dbl> <dbl> <dbl>
#>  1    4.    2.    6.
#>  2    4.   10.   14.
#>  3    7.    4.   11.
#>  4    7.   22.   29.
#>  5    8.   16.   24.
#>  6    9.   10.   19.
#>  7   10.   18.   28.
#>  8   10.   26.   36.
#>  9   10.   34.   44.
#> 10   11.   17.   28.
#> # ... with 40 more rows

So the question is: how can I avoid having to include the dot every time, and instead have the data passed into the function automatically?

rowwise_sum2 <- function(data, ..., na.rm = FALSE) {
  columns <- rlang::enquos(...)

  data %>% 
    select(!!!columns) %>% 
    rowSums(., na.rm = na.rm)
}

#Same error
cars %>% mutate(sum = rowwise_sum2(speed, dist, na.rm = T))
#> Error in mutate_impl(.data, dots): Evaluation error: no applicable method for 'select_' applied to an object of class "c('double', 'numeric')".

#Same result
cars %>% rowwise_sum2(speed, dist, na.rm = T)
#>  [1]   6  14  11  29  24  19  28  36  44  28  39  26  32  36  40  39  47
#> [18]  47  59  40  50  74  94  35  41  69  48  56  49  57  67  60  74  94
#> [35] 102  55  65  87  52  68  72  76  84  88  77  94 116 117 144 110

#Same result
cars %>% mutate(sum = rowwise_sum2(., speed, dist, na.rm = T))
#> # A tibble: 50 x 3
#>    speed  dist   sum
#>    <dbl> <dbl> <dbl>
#>  1    4.    2.    6.
#>  2    4.   10.   14.
#>  3    7.    4.   11.
#>  4    7.   22.   29.
#>  5    8.   16.   24.
#>  6    9.   10.   19.
#>  7   10.   18.   28.
#>  8   10.   26.   36.
#>  9   10.   34.   44.
#> 10   11.   17.   28.
#> # ... with 40 more rows

Created on 2018-05-22 by the reprex package (v0.2.0).


Answer from akrun below (please upvote):

To paraphrase: just ditch the mutate() and do everything in the new function.

Here is my final function, an update to his that also allows naming the sum column if desired.

rowwise_sum <- function(data, ..., sum_col = "sum", na.rm = FALSE) {

  columns <- rlang::enquos(...)

  data %>%
    select(!!! columns) %>%
    transmute(!!sum_col := rowSums(., na.rm = na.rm)) %>%
    bind_cols(data, .)
}

akrun

We can place the ... at the end of the argument list:

rowwise_sum <- function(data, na.rm = FALSE, ...) {
  columns <- rlang::enquos(...)
  data %>%
     select(!!!columns) %>%
     rowSums(na.rm = na.rm)
}

cars %>% 
     mutate(sum = rowwise_sum(., na.rm = TRUE, speed, dist))
# A tibble: 50 x 3
#   speed  dist   sum
#   <dbl> <dbl> <dbl>
# 1     4     2     6
# 2     4    10    14
# 3     7     4    11
# 4     7    22    29
# 5     8    16    24
# 6     9    10    19
# 7    10    18    28
# 8    10    26    36
# 9    10    34    44
#10    11    17    28
# ... with 40 more rows

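It would also work without changing the position of ... (though moving it to the end is generally recommended). The main issue is that the data (which is .) was not supplied in the call to rowwise_sum() inside mutate(), so speed was matched to the data argument and select() received a numeric vector instead of a data frame.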
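
If the goal is specifically to avoid typing the dot, one option (not taken from the answer above) is a helper that accepts the column vectors themselves rather than a data frame. Inside mutate(), bare names such as speed and dist already evaluate to the column vectors, so nothing else needs to be passed. A minimal sketch, where row_sum_vec is a hypothetical name:

```r
library(dplyr)

# Hypothetical helper (not from the original answer): it receives the
# column vectors directly, so no data frame argument is needed.
row_sum_vec <- function(..., na.rm = FALSE) {
  # cbind() assembles the vectors into a matrix, one column per argument;
  # rowSums() then adds across each row.
  rowSums(cbind(...), na.rm = na.rm)
}

result <- as_tibble(cars) %>%
  mutate(sum = row_sum_vec(speed, dist, na.rm = TRUE))
```

Because this helper never calls select(), tidyselect helpers such as starts_with() will not work with it; each column has to be named individually.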


It would be easier to build the whole workflow into the function instead of doing only part of it there:

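rowwise_sum2 <- function(data, na.rm = FALSE, ...) {
  # Capture the column names passed through ...
  columns <- rlang::enquos(...)
  data %>%
    select(!!!columns) %>%
    # Sum across the selected columns, honoring the na.rm argument
    transmute(sum = rowSums(., na.rm = na.rm)) %>%
    bind_cols(data, .)
}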

rowwise_sum2(cars, na.rm = TRUE, speed, dist)
# A tibble: 50 x 3
#   speed  dist   sum
#   <dbl> <dbl> <dbl>
# 1     4     2     6
# 2     4    10    14
# 3     7     4    11
# 4     7    22    29
# 5     8    16    24
# 6     9    10    19
# 7    10    18    28
# 8    10    26    36
# 9    10    34    44
#10    11    17    28
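
As a variation on the whole-flow approach, the dots can be forwarded directly to select(), which avoids enquos() and !!!. The sketch below is an assumption about how that could look, not code from the answer; add_row_sum and its sum_col argument are hypothetical names.

```r
library(dplyr)

# Hypothetical variant: forwards ... to select() and appends the row
# totals under a caller-chosen column name.
add_row_sum <- function(data, ..., sum_col = "sum", na.rm = FALSE) {
  # select() accepts forwarded dots, so no manual quoting is required
  selected <- select(data, ...)
  # Store the totals as a new column named by sum_col
  data[[sum_col]] <- rowSums(selected, na.rm = na.rm)
  data
}

out <- add_row_sum(as_tibble(cars), speed, dist,
                   sum_col = "total", na.rm = TRUE)
```

Since the columns go through select(), tidyselect helpers such as starts_with() or everything() can be used in place of bare names.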

