Passing unbound variables into R functions

jia chen
> counties %>% select(state, county, population, poverty)
> # also written as
> select(counties, state, county, population, poverty)
> state
Error: object 'state' not found

Hi everyone,

I have a question about what exactly is being passed into the select function here. state, county, population and poverty are not variables bound in the enclosing environment; they are column names of the first argument, which makes the arguments being passed to the function actually stateful.

Generally, in other languages these keys would be passed in as strings, so I'm wondering how we ought to reason about these unbound variables here, and additionally how the R interpreter / parser handles this under the hood.

MKR

This is a case of R's non-standard evaluation (NSE). It is a powerful concept in R and basically means that what you pass to select is not evaluated immediately. Instead, the function captures the argument unevaluated and evaluates it later, in the context of the data frame.

I suggest you read the Advanced R chapter on this: http://adv-r.had.co.nz/Computing-on-the-language.html
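As a rough illustration of "not evaluated directly", here is a minimal sketch in base R. The three helper functions are hypothetical names made up for this answer, not part of any package. The sketch assumes nothing beyond the fact that R passes arguments lazily, so an argument is only looked up when the function body actually uses it.

```r
# Hypothetical helpers for illustration only.

# Never uses x, so the expression behind x is never evaluated.
ignore_argument <- function(x) "x was never evaluated"

# Uses x, so R has to evaluate it in the caller's environment.
force_argument <- function(x) x

# Captures the expression behind x instead of its value.
capture_argument <- function(x) substitute(x)

# no_such_object does not exist, yet this call succeeds.
ignored <- ignore_argument(no_such_object)

# Here the lookup really happens, so we catch the resulting error message.
forced <- tryCatch(
  force_argument(no_such_object),
  error = function(e) conditionMessage(e)
)

# The unevaluated expression is an ordinary R object that can be inspected.
captured <- capture_argument(no_such_object + 1)

print(ignored)
print(forced)
print(class(captured))
#> [1] "call"
print(deparse(captured))
#> [1] "no_such_object + 1"
```

So an unbound name only becomes an error once something tries to evaluate it. Functions like select get hold of the expression first and decide for themselves where to look the names up.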

To further expand on this, look at this example which showcases the basic concept:


example_dataframe <- data.frame(
  foo = 1:5,
  bar = 10:14
)

# A global variable that shares its name with a column
foo <- c("any", "variable", "in", "global", "namespace")

print(foo)
#> [1] "any"       "variable"  "in"        "global"    "namespace"

select_column <- function(data_frame, column_name) {
  # Capture the argument as an unevaluated symbol
  column_name <- substitute(column_name)
  # Look the symbol up among the columns of the data frame
  eval(column_name, envir = data_frame)
}

select_column(example_dataframe, foo)
#> [1] 1 2 3 4 5

# The global foo is untouched
foo
#> [1] "any"       "variable"  "in"        "global"    "namespace"

Explanation

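substitute() returns the expression that was supplied for the argument without evaluating it, here the symbol foo. We can then use eval(), which evaluates a call or symbol in a context of your choosing. Strictly speaking, a data frame is a list rather than an environment or namespace, but eval() accepts a list or data frame as its envir argument and treats the columns as variables. Thus eval(column_name, envir = data_frame) evaluates column_name in the context of the data frame, and falls back to the environment eval() was called from only for names that are not columns.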
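The same two steps also work for whole expressions, not just single column names. Below is a minimal sketch under that assumption. filter_rows, scores and threshold are hypothetical names chosen for this example. The explicit enclos = parent.frame() is my choice to make the fallback visible: names that are not columns are looked up where filter_rows was called.

```r
# Hypothetical data and helper for illustration only.
scores <- data.frame(points = c(3, 8, 5))
threshold <- 4

filter_rows <- function(data_frame, condition) {
  # Capture the condition as an unevaluated call, e.g. points > threshold
  condition <- substitute(condition)
  # Columns are found in data_frame; anything else (threshold) is looked up
  # in the environment filter_rows was called from.
  keep <- eval(condition, envir = data_frame, enclos = parent.frame())
  data_frame[keep, , drop = FALSE]
}

result <- filter_rows(scores, points > threshold)
print(result)
#>   points
#> 2      8
#> 3      5
```

Here points comes from the data frame and threshold from the calling environment, which is roughly how one expression can mix column names and ordinary variables.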

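This is what goes on behind the scenes of many functions in R, including base functions such as subset() and with(). dplyr's select() builds on a more elaborate version of the same idea.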
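To connect this back to the "strings in other languages" part of the question, here is a short sketch using only base R. The measurements data frame is a made-up example. It shows two base functions that evaluate bare column names inside a data frame, next to the string-based lookup that does the same job with standard evaluation.

```r
# Hypothetical example data.
measurements <- data.frame(foo = 1:5, bar = 10:14)

# Non-standard evaluation: bare names are resolved against the columns.
sums <- with(measurements, foo + bar)
print(sums)
#> [1] 11 13 15 17 19

high_bar <- subset(measurements, foo > 3, select = bar)
print(high_bar)
#>   bar
#> 4  13
#> 5  14

# Standard evaluation: the column is named by an ordinary string, so the
# name can be stored in a variable and passed around like any other value.
wanted <- "foo"
by_name <- measurements[[wanted]]
print(by_name)
#> [1] 1 2 3 4 5
```

The bare-name form is convenient for interactive work, while the string form is usually easier to reason about when the column name is only known at run time.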

Created on 2020-08-14 by the reprex package (v0.3.0)
