Aggregating sequential and grouped data in R

Posted by Joshua in Dev

Joshua

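I have a dataset that looks like this toy example. Each row records the location a person moved to and the time, in days, since that move happened. For example, person 1 started in a rural area, moved to a city 463 days ago (row 2), then moved from that city to a town 415 days ago (row 3), and so on.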

set.seed(123)
df <- as.data.frame(sample.int(1000, 10))
colnames(df) <- "time"
df$destination <- as.factor(sample(c("city", "town", "rural"), size = 10, replace = TRUE, prob = c(.50, .25, .25)))
df$user <- sample.int(3, 10, replace = TRUE)
df[order(df[,"user"], -df[,"time"]), ]

Data:

time destination user
 526       rural    1
 463        city    1
 415        town    1
 299        city    1
 179       rural    1
 938        town    2
 229        town    2
 118        city    2
 818        city    3
 195        city    3

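I would like to aggregate this data into the following format. That is, count each type of relocation (from one location type to the next) within each user, then sum the counts over all users into a single matrix. How can I achieve this, preferably without writing a loop?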

from  to     count
city  city   1
city  town   1
city  rural  1
town  city   2
town  town   1
town  rural  0
rural city   1
rural town   0
rural rural  0

B. Christian Kamgang

One possible approach, based on the data.table package:

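library(data.table)

# All destination categories, used to build every from/to pair
cases <- unique(df$destination)

# Order each user's moves chronologically (largest `time` = earliest move),
# pair each location with the next one, then count every from/to combination.
# Pairs that never occur get count 0 because .N is 0 for unmatched rows.
setDT(df)[order(user, -time),
          .(from = destination, to = shift(destination, type = "lead")), by = user
          ][CJ(from = cases, to = cases), .(count = .N), by = .EACHI, on = c("from", "to")]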


#      from     to count
#    <char> <char> <int>
# 1:   city   city     1
# 2:   city  rural     1
# 3:   city   town     1
# 4:  rural   city     1
# 5:  rural  rural     0
# 6:  rural   town     0
# 7:   town   city     2
# 8:   town  rural     0
# 9:   town   town     1
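
As a cross-check, the same counts can also be sketched in base R without a loop. This is only an illustration, not part of the answer above: the data frame name `moves` and the helper vector `same_user` are made up for this example, and the toy data is typed in by hand (copied from the question's table) so the result does not depend on the random number generator. The idea is to line up each row with the row that follows it, keep only pairs that belong to the same user, and cross-tabulate them.

```r
# Toy data from the question, entered explicitly (hypothetical name `moves`)
moves <- data.frame(
  time = c(526, 463, 415, 299, 179, 938, 229, 118, 818, 195),
  destination = factor(
    c("rural", "city", "town", "city", "rural",
      "town", "town", "city", "city", "city"),
    levels = c("city", "town", "rural")
  ),
  user = c(1, 1, 1, 1, 1, 2, 2, 2, 3, 3)
)

# Put each user's moves in chronological order (largest time first)
moves <- moves[order(moves$user, -moves$time), ]

# A row is a valid "from" only if the next row belongs to the same user
n <- nrow(moves)
same_user <- moves$user[-n] == moves$user[-1]

# Cross-tabulate consecutive locations; the factor levels keep the
# from/to pairs that never occur, with a count of zero
transitions <- table(
  from = moves$destination[-n][same_user],
  to   = moves$destination[-1][same_user]
)

# Long format with one row per from/to pair
result <- as.data.frame(transitions, responseName = "count")
result
```

Here `transitions` is already the from/to matrix the question asks about, and `result` is its long form with the columns `from`, `to` and `count`. Because `destination` is a factor, `table()` reports unobserved combinations as zeros, which plays the same role as the `CJ()` join in the data.table version.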
