Problem subsetting a data frame by a factor variable in R

奥克斯泰夫

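I am trying to split my data frame (train) into two groups, metro and non-metro, by the factor variable $area__rucc. The data frame is clean, with 34 variables and 2,811 observations.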
```
 glimpse(train$area__rucc)
```
 Factor w/ 9 levels "Metro - Counties in metro areas of 1 million population or more",..: 3 3 1 6 7 8 6 2 7 5 ...

The first three levels are metro and the last six are non-metro.

- First, I tried subsetting by the metro levels...

metro <- subset(train, area__rucc == c("Metro - Counties in metro areas of 1 million population or more", "Metro - Counties in metro areas of 250,000 to 1 million population", "Metro - Counties in metro areas of fewer than 250,000 population"))

This appeared to work as expected and returned a df with 387 observations.

- Next, I tried subsetting by the non-metro levels like this...

not_metro <- subset(train, area__rucc != c("Metro - Counties in metro areas of 1 million population or more", "Metro - Counties in metro areas of 250,000 to 1 million population", "Metro - Counties in metro areas of fewer than 250,000 population"))

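This returned 2,811 observations, but on closer inspection the df contains both metro and non-metro levels. Clearly it did not work as I intended.

- My third attempt...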

non_metro <- subset(train, area__rucc == c("Nonmetro - Completely rural or less than 2,500 urban population, adjacent to a metro area", 
                "Nonmetro - Completely rural or less than 2,500 urban population, not adjacent to a metro area", 
                "Nonmetro - Urban population of 2,500 to 19,999, adjacent to a metro area", 
                "Nonmetro - Urban population of 2,500 to 19,999, not adjacent to a metro area", 
                "Nonmetro - Urban population of 20,000 or more, adjacent to a metro area", 
                "Nonmetro - Urban population of 20,000 or more, not adjacent to a metro area"))

Here I explicitly listed the non-metro levels (4:9). This returned a df with 354 observations, all of them non-metro.

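387 (metro) + 354 (non-metro) != 2,811. There are no missing values in train$area__rucc, so the two dfs I am trying to create from train should together keep the same number of observations as the original df, right?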

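I have a feeling I am making a silly mistake that I just can't spot right now (possibly from inexperience), or I may be completely off base in what I am trying to do here. Either way, this is starting to frustrate me, and any insight would be greatly appreciated.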

Gregor Thomas

== does element-wise (row-by-row) comparison; you want %in% instead

Before we get to your code, let's work through a simple example:

x = 1:6
y = c(1, 3)
x == y
# [1]  TRUE FALSE FALSE FALSE FALSE FALSE

Notice that there is only one TRUE, even though both 1 and 3 are in 1:6. That's because the comparison happens like this:

data.frame(x, y, "x==y" = x == y, check.names = FALSE)
#   x y  x==y
# 1 1 1  TRUE   # 1 does equal 1
# 2 2 3 FALSE   # 2 does not equal 3
# 3 3 1 FALSE   # 3 does not equal 1
# 4 4 3 FALSE   # 4 does not equal 3
# 5 5 1 FALSE   # 5 does not equal 1
# 6 6 3 FALSE   # 6 does not equal 3

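x == y checks the first element of x against the first element of y, the second of x against the second of y, and so on. If x or y is shorter, it gets "recycled", as you can see in the data frame above, where y = c(1, 3) becomes 1 3 1 3 1 3.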
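To make the recycling visible, here is a minimal sketch that builds the recycled vector explicitly with base R's rep_len() and confirms that comparing against it gives the same result as x == y. The name y_recycled is introduced only for illustration.

```r
x <- 1:6
y <- c(1, 3)

# Repeat y until it is as long as x, which mirrors what == does implicitly.
y_recycled <- rep_len(y, length(x))

# The element-wise comparison against the recycled vector ...
explicit <- x == y_recycled

# ... matches the comparison against the short vector.
implicit <- x == y

identical(explicit, implicit)
```

When the longer length is not a multiple of the shorter one, R still recycles but also emits a "longer object length is not a multiple of shorter object length" warning, which is often the first hint that == is being used where %in% was meant.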

Instead, use %in%:

x %in% y
# [1]  TRUE FALSE  TRUE FALSE FALSE FALSE

x %in% y checks each element of x against all elements of y. Now we get two TRUE values, because both 1 and 3 are in c(1, 3).

Applied to your problem:
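One way to see why %in% behaves as a membership test is that it is documented as a thin wrapper around match(). The sketch below compares the two on the same toy vectors; the names in_result and match_result are illustrative only.

```r
x <- 1:6
y <- c(1, 3)

# Membership test: is each element of x found anywhere in y?
in_result <- x %in% y

# match() returns the position in y of each element of x (0 when absent),
# so "position > 0" is the same membership test.
match_result <- match(x, y, nomatch = 0L) > 0L

identical(in_result, match_result)
```

Because the result always has the length of x and never depends on the order or length of y, it is safe to use as a row filter.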

metro <- subset(train,
    area__rucc %in% c(
        "Metro - Counties in metro areas of 1 million population or more",
        "Metro - Counties in metro areas of 250,000 to 1 million population",
        "Metro - Counties in metro areas of fewer than 250,000 population"
    )
)

You can negate it with ! x %in% y, so

not_metro <- subset(train,
    !area__rucc %in% c(
        "Metro - Counties in metro areas of 1 million population or more",
        "Metro - Counties in metro areas of 250,000 to 1 million population",
        "Metro - Counties in metro areas of fewer than 250,000 population"
    )
)
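As a sanity check, the two subsets should add up to the original row count. The sketch below tries this on a small made-up stand-in for train (the data and the shortened level names are hypothetical), and takes the metro levels by position, relying on the question's statement that the first three levels are the metro ones.

```r
# Hypothetical toy data; the real train has 9 levels with much longer names.
train <- data.frame(
  id = 1:8,
  area__rucc = factor(c("Metro - A", "Nonmetro - X", "Metro - B", "Nonmetro - Y",
                        "Metro - C", "Nonmetro - X", "Metro - A", "Nonmetro - Z"))
)

# Assumption: the first three levels are the metro ones, as in the question.
metro_levels <- levels(train$area__rucc)[1:3]

metro     <- subset(train, area__rucc %in% metro_levels)
not_metro <- subset(train, !area__rucc %in% metro_levels)

# Every row lands in exactly one of the two subsets.
nrow(metro) + nrow(not_metro) == nrow(train)
```

The negation works without extra parentheses because %in% binds more tightly than unary ! in R, so !area__rucc %in% metro_levels is read as !(area__rucc %in% metro_levels).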


Edited on 2021-07-2
