Correct usage of dplyr::select in dplyr 0.7.0+: selecting columns using a character vector

Posted by RobinL in Dev

Suppose we have a character vector cols_to_select containing the names of some columns we want to select from a data frame df, e.g.

df <- tibble::tibble(a=1:3, b=1:3, c=1:3, d=1:3, e=1:3)  # tibble::data_frame() is deprecated in newer tibble releases
cols_to_select <- c("b", "d")

Suppose also that we want to use dplyr::select because it is one step in an operation that uses %>%, so using select keeps the code easy to read.

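There seem to be a number of ways to achieve this, but some are more robust than others. Could you let me know which is the "correct" version, and why? Or perhaps there is another, better way?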

dplyr::select(df, cols_to_select) #Fails if 'cols_to_select' happens to be the name of a column in df 
dplyr::select(df, !!cols_to_select) # i.e. using UQ()
dplyr::select(df, !!!cols_to_select) # i.e. using UQS()

cols_to_select_syms <- rlang::syms(c("b", "d"))  # See https://stackoverflow.com/questions/44656993/how-to-pass-a-named-vector-to-dplyrselect-using-quosures/44657171#44657171
dplyr::select(df, !!!cols_to_select_syms)

PS: I realise this can be achieved in base R simply with df[, cols_to_select]

Zeehio

There is an example of dplyr::select in https://cran.r-project.org/web/packages/rlang/vignettes/tidy-evaluation.html that uses:

dplyr::select(df, !!cols_to_select)

Why? Let's explore the options you mention:

Option 1

dplyr::select(df, cols_to_select)

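As you say, this fails if cols_to_select happens to be the name of a column in df: select then picks that column rather than the columns named in the vector. So this option is wrong.

Option 4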
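To make the ambiguity concrete, here is a minimal sketch. It assumes the tibble and dplyr packages are installed; df2, opt1 and opt2 are hypothetical names introduced only for this illustration. It builds a data frame that happens to contain a column called cols_to_select and compares option 1 with the !! form.

```r
# Hypothetical data frame that has a column literally named "cols_to_select"
df2 <- tibble::tibble(a = 1:3, b = 1:3, d = 1:3, cols_to_select = 1:3)
cols_to_select <- c("b", "d")

# Option 1: the bare name is matched against the columns of df2 first,
# so the column called "cols_to_select" is what gets selected
opt1 <- dplyr::select(df2, cols_to_select)
names(opt1)

# Option 2: !! inlines the value of the vector before select looks at it,
# so the columns named in the vector are selected
opt2 <- dplyr::select(df2, !!cols_to_select)
names(opt2)
```

Option 1 is therefore only safe when you can guarantee that no column shares the variable's name.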

cols_to_select_syms <- rlang::syms(c("b", "d"))  
dplyr::select(df, !!!cols_to_select_syms)

This looks more convoluted than the other solutions.

Options 2 and 3
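As a quick check that the extra step changes nothing in the result, the sketch below compares option 4 with the plain !! form on the question's data. It assumes tibble, rlang and dplyr are installed; via_syms and via_bang are illustrative names, not part of the original post.

```r
df <- tibble::tibble(a = 1:3, b = 1:3, c = 1:3, d = 1:3, e = 1:3)
cols_to_select <- c("b", "d")

# Option 4: convert each name to a symbol, then splice the list of symbols
cols_to_select_syms <- rlang::syms(cols_to_select)
via_syms <- dplyr::select(df, !!!cols_to_select_syms)

# Option 2: unquote the character vector directly
via_bang <- dplyr::select(df, !!cols_to_select)

# Both calls pick the same columns in the same order
identical(names(via_syms), names(via_bang))
```

The symbol conversion adds a step without changing which columns come back, which is why it reads as the more roundabout choice here.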

dplyr::select(df, !!cols_to_select)
dplyr::select(df, !!!cols_to_select)

In this case, both solutions give the same result. You can see the output of !!cols_to_select and !!!cols_to_select by doing:

dput(rlang::`!!`(cols_to_select)) # c("b", "d")
dput(rlang::`!!!`(cols_to_select)) # pairlist("b", "d")

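The !! (or UQ()) operator immediately evaluates its argument in the surrounding context, which is what you want here.

The !!! (or UQS()) operator is used to pass multiple arguments to a function at once.

For character column names like those in your example, it does not matter whether you supply them as a single vector of length 2 (using !!) or as a list of two vectors of length 1 (using !!!). For more complex use cases, you will need to pass multiple arguments as a list (using !!!):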

a <- rlang::quos(dplyr::contains("c"), dplyr::starts_with("b"))  # quos() qualified so no library() call is needed
dplyr::select(df, !!a) # does not work
dplyr::select(df, !!!a) # does work
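The snippet above can be run end to end as a minimal sketch, assuming tibble, rlang and dplyr are installed. The names spliced and single are hypothetical, and the tryCatch wrapper is added only so the script keeps running if the !! call raises an error, as the answer says it does.

```r
df <- tibble::tibble(a = 1:3, b = 1:3, c = 1:3, d = 1:3, e = 1:3)

# Two selection helper calls captured as a list of quosures
a <- rlang::quos(dplyr::contains("c"), dplyr::starts_with("b"))

# !!! splices the list, so select receives two separate arguments
spliced <- dplyr::select(df, !!!a)
names(spliced)

# !! passes the whole list as one argument; the answer reports this does not work.
# If an error is raised, its message is stored instead of stopping the script.
single <- tryCatch(
  dplyr::select(df, !!a),
  error = function(e) conditionMessage(e)
)
```

With !!!, the columns come back in the order the helpers were listed: first those containing "c", then those starting with "b".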

Edited on 2020-11-6
