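I ran into this strange behavior while trying to modify the levels of a factor variable by combining several levels into a single level. My new level does get created, but all the remaining levels seem to shift up by one. Here are my sample data, the code I used, and the output.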
library(tidyverse)
data <- structure(list(factor1 = structure(c(1L, 1L, 2L, 3L, 1L, 2L,
1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 3L, 1L, 1L, 1L, 4L), .Label = c("0", "1", "2", "3",
"4", "5", "6", "7"), class = "factor")), row.names = c(NA, -30L
), class = c("tbl_df", "tbl", "data.frame"), .Names = "factor1")
# Keep "0" and "1" as they are; lump every other level into ">1"
data_out <- data %>% mutate(factor1 = ifelse(factor1 %in% c('0', '1'),
                                             factor1, '>1'))
# dput(data_out):
structure(list(factor1 = c("1", "1", "2", ">1", "1", "2", "1",
"1", "2", "2", "2", "2", "2", "1", "2", "1", "1", "1", "1", "1",
"1", "1", "1", "1", "1", ">1", "1", "1", "1", ">1")), .Names = "factor1",
class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -30L))
Is this intended behavior? It certainly doesn't look like it to me. How can it be explained, and how do I fix it?
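I suspect the problem is related to how factors are constructed. It's not clear to me how mutate turns a factor with levels {"0", "1"} into one with levels {"1", "2", ">1"}.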
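R factors are really integer vectors of 1-based codes, with an attribute holding their levels. So your "0" level is actually stored as the integer 1, and your "1" level as the integer 2. mutate apparently saw fit to create a new factor with an additional level that prints as ">1", but it also reassigned the old "0" level to the new "1" level and the old "1" level to the new "2" level.
This behavior of mutate seems dangerous to me. I think it should have given you a new factor with levels "0", "1", ">1", or else thrown an error.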
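The integer-code representation described above is easy to inspect directly. A small sketch on a made-up factor f (not the question's data), using only base R:

```r
# A factor is an integer vector of 1-based codes plus a "levels" attribute.
f <- factor(c("0", "0", "1", "2"), levels = c("0", "1", "2", "3"))

codes <- as.integer(f)   # the underlying codes: 1 1 2 3
labs  <- levels(f)       # the labels those codes index into
print(codes)
print(labs)

# unclass() drops the "factor" class and shows the raw representation
str(unclass(f))

# Indexing the levels by the codes recovers the printed labels
stopifnot(identical(labs[codes], as.character(f)))
```

So the label "0" is code 1 and the label "1" is code 2; any function that hands back the codes instead of the labels will produce exactly the shift seen in the question.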
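The error actually comes from ifelse, although doing the assignment inside mutate muddies the picture. If you coerce data to a plain data frame and assign the result to a new column instead, you'll see: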
data <- as.data.frame(data)   # drop the tibble classes
data$factor2 <- ifelse(data$factor1 %in% c('0', '1'),
                       data$factor1, '>1')
data
#-------- same issue
   factor1 factor2
1        0       1
2        0       1
3        1       2
4        2      >1
... (remaining 26 rows omitted)
> str(data)
'data.frame':	30 obs. of  2 variables:
 $ factor1: Factor w/ 8 levels "0","1","2","3",..: 1 1 2 3 1 2 1 1 2 2 ...
 $ factor2: chr  "1" "1" "2" ">1" ...
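The same coercion can be reproduced without dplyr or a data frame. A minimal sketch on a hypothetical toy factor f: when yes is a factor, ifelse() copies its values into a plain vector, so the integer codes come through instead of the labels. The fix shown (convert to character first, then call factor() with explicit levels) is one common approach, not the only one:

```r
f <- factor(c("0", "0", "1", "2", "3"), levels = c("0", "1", "2", "3"))
keep <- f %in% c("0", "1")

# ifelse() fills a plain (non-factor) vector, so the factor `yes` contributes
# its codes 1 and 2 rather than the labels "0" and "1"; the character `no`
# value then turns the whole result into a character vector.
bad <- ifelse(keep, f, ">1")
print(bad)

# Fix: work with the labels as character, then rebuild a factor whose levels
# are in the order we want.
good <- factor(ifelse(keep, as.character(f), ">1"),
               levels = c("0", "1", ">1"))
print(good)
```

Note that bad is a character vector, not a factor, which matches the chr column reported by str() above.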
This approach keeps you within the dplyr package:
recode_factor(data$factor1, `0` = "0", `1` = "1", .default=">1")
[1] 0 0 1 >1 0 1 0 0 1 1 1 1 1 0 1 0 0 0 0 0 0 0 0 0 0 >1 0 0 0 >1
Levels: 0 1 >1
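If you'd rather not depend on dplyr, base R can also collapse levels: the levels<- replacement function merges levels that are given identical labels. A sketch on a made-up factor f with the same eight levels as the question's data:

```r
f <- factor(c("0", "0", "1", "2", "0", "1", "3"),
            levels = c("0", "1", "2", "3", "4", "5", "6", "7"))

collapsed <- f
lv <- levels(collapsed)
# Give every level other than "0" and "1" the same label ">1";
# levels<- merges levels that end up with identical labels.
lv[!lv %in% c("0", "1")] <- ">1"
levels(collapsed) <- lv

print(collapsed)
print(levels(collapsed))
```

Because this operates on the levels rather than on the values, the codes are remapped consistently and the labels "0" and "1" survive unchanged.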