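I have a dataset that looks like this toy example. The data records the location a person moved to and the time (in days) since that move occurred. For example, person 1 started in a rural area, moved to a city 463 days ago (row 2), moved from that city to a town 415 days ago (row 3), and so on.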
set.seed(123)
df <- as.data.frame(sample.int(1000, 10))
colnames(df) <- "time"
df$destination <- as.factor(sample(c("city", "town", "rural"), size = 10, replace = TRUE, prob = c(.50, .25, .25)))
df$user <- sample.int(3, 10, replace = TRUE)
df[order(df[,"user"], -df[,"time"]), ]
Data:
time destination user
526 rural 1
463 city 1
415 town 1
299 city 1
179 rural 1
938 town 2
229 town 2
118 city 2
818 city 3
195 city 3
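I would like to summarize this data into the following format. That is, count each type of relocation (from/to pair) across all users and collect the counts in a matrix-like summary. How can I achieve this (preferably without writing a loop)?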
from to count
city city 1
city town 1
city rural 1
town city 2
town town 1
town rural 0
rural city 1
rural town 0
rural rural 0
One possible approach using the data.table package:
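library(data.table)
# The question only prints the sorted data, so sort in place first:
# by user, then oldest move first, so shift() pairs consecutive moves
setorder(setDT(df), user, -time)
cases <- unique(df$destination)
# Pair each destination with the next one per user, then count every from/to combination
df[, .(from=destination, to=shift(destination, -1)), by=user
  ][CJ(from=cases, to=cases), .(count=.N), by=.EACHI, on=c("from", "to")]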
# from to count
# <char> <char> <int>
# 1: city city 1
# 2: city rural 1
# 3: city town 1
# 4: rural city 1
# 5: rural rural 0
# 6: rural town 0
# 7: town city 2
# 8: town rural 0
# 9: town town 1
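For comparison, the same idea can be sketched in base R without data.table. This is only an illustrative alternative, not part of the answer above. It assumes the toy data is rebuilt as in the question, and the names `ord`, `same_user`, `from`, `to`, and `trans` are made up for this sketch. The approach: sort by user and descending time, pair each row with the row after it, keep only the pairs that belong to the same user, and let `table()` count every combination (including those that never occur).

```r
# Rebuild the toy data from the question (the exact numbers depend on the
# R version's sampling defaults; the logic below does not)
set.seed(123)
df <- data.frame(time = sample.int(1000, 10))
df$destination <- factor(sample(c("city", "town", "rural"), size = 10,
                                replace = TRUE, prob = c(.50, .25, .25)))
df$user <- sample.int(3, 10, replace = TRUE)

# Sort so that, within each user, the oldest move comes first
ord <- df[order(df$user, -df$time), ]
n <- nrow(ord)

# A row and its successor form a transition only if both belong to the same user
same_user <- ord$user[-n] == ord$user[-1]
from <- ord$destination[-n][same_user]
to   <- ord$destination[-1][same_user]

# table() on two factors counts every from/to combination, including zeros;
# as.data.frame() turns the contingency table into long format
trans <- as.data.frame(table(from = from, to = to), responseName = "count")
trans
```

If a matrix is preferred over the long format, `table(from = from, to = to)` on its own should already give a from-by-to contingency table.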