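I'll use a hypothetical scenario to illustrate the problem. Here is a table of musicians and the instruments they play, and a second table with the band's composition: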
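library(data.table)
# one row per (instrument, musician) pair; some musicians appear under several instruments
musicians <- data.table(
    instrument = rep(c('bass','drums','guitar'), each = 4),
    musician = c('Chas','John','Paul','Stuart','Andy','Paul','Peter','Ringo','George','John','Paul','Ringo')
)
# how many musicians the band needs per instrument
band.comp <- data.table(
    instrument = c('bass','drums','guitar'),
    n = c(2,1,2)
)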
To avoid arguments about who is best at which instrument, the band will be assembled by random draw. Here is how I do it:
musicians[band.comp, on = 'instrument'][, sample(musician, n), by = instrument]
   instrument     V1
1:       bass   Paul
2:       bass   Chas
3:      drums   Andy
4:     guitar   Paul
5:     guitar George
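The problem: since some musicians play more than one instrument, the same person can be drawn more than once.
One could write a for loop that draws musicians for each successive instrument subset and then removes them from the rest of the table. But I'd like suggestions on how to do this with data.table, mainly because the real-life problems where I need this logic involve databases with tens of thousands of rows, and also because I'm trying to better understand data.table syntax.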
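For comparison, the for-loop approach described above could be sketched roughly as follows. This is only an illustration: the helper name `draw_band_loop` and its arguments are made up here, not taken from the post. Note that drawing instrument by instrument can dead-end (for example, if John and Paul take bass and Ringo takes drums, only George is left for two guitar slots), so the sketch stops with an error in that case.

```r
library(data.table)

musicians <- data.table(
  instrument = rep(c('bass', 'drums', 'guitar'), each = 4),
  musician = c('Chas', 'John', 'Paul', 'Stuart',
               'Andy', 'Paul', 'Peter', 'Ringo',
               'George', 'John', 'Paul', 'Ringo')
)
band.comp <- data.table(
  instrument = c('bass', 'drums', 'guitar'),
  n = c(2, 1, 2)
)

# Hypothetical helper (not from the original post): draw one instrument at a
# time and remove the drawn musicians from the pool before the next draw.
draw_band_loop <- function(musicians, band.comp) {
  pool <- musicians
  band <- vector("list", nrow(band.comp))
  for (k in seq_len(nrow(band.comp))) {
    inst <- band.comp$instrument[k]
    size <- band.comp$n[k]
    candidates <- pool[instrument == inst, musician]
    if (length(candidates) < size) stop("Not enough musicians playing ", inst)
    # sample positions rather than values, then index into the candidates
    drawn <- candidates[sample.int(length(candidates), size)]
    band[[k]] <- data.table(instrument = inst, musician = drawn)
    # drop the drawn musicians from every remaining instrument
    pool <- pool[!musician %chin% drawn]
  }
  rbindlist(band)
}

# A dead-end draw raises an error; catch it here instead of aborting.
band <- tryCatch(draw_band_loop(musicians, band.comp), error = function(e) NULL)
print(band)
```

The loop re-subsets the whole table on every iteration, which is the part I'd expect to get slow on large data.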
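For reference, I tried some of the tricks from Andrew Brooks' blog, but couldn't come up with a solution.
I came across this among the related posts: "Randomly draw rows from dataframe based on unique values and column values", and eddi's answer there fits this question well: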
# keep the number of musicians per instrument in the same data.table
musicians[band.comp, n := n, on = .(instrument)]
# stores the musicians that have been sampled so far
m <- c()
musicians[, {
    # exclude already-sampled musicians before sampling
    res <- .SD[!musician %chin% m][sample(.N, n[1L])]
    # `<<-` updates m outside j so the exclusions carry over to the next group
    m <<- c(m, res$musician)
    res
}, by = .(instrument)]
Sample output:
   instrument musician n
1:       bass   Stuart 2
2:       bass     Chas 2
3:      drums     Paul 1
4:     guitar     John 2
5:     guitar    Ringo 2
Or, more concisely and with error handling:
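m <- c()
musicians[
    band.comp,
    on = .(instrument),
    j = {
        # candidates for this instrument who have not been drawn yet
        s <- setdiff(musician, m)
        # i.n is band.comp's n, even if musicians already has its own n column
        if (length(s) < i.n) stop(paste("Not enough musicians playing", .BY))
        res <- sample(s, i.n)
        m <<- c(m, res)
        res
    },
    by = .EACHI]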
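If the global `m` bothers you, the same idea can be wrapped in a function so the list of drawn musicians stays local to each call. This is just a sketch under my own assumptions: `draw_band` and the `drawn` variable are hypothetical names, and it relies on `<<-` inside `j` reaching the variable in the enclosing function.

```r
library(data.table)

musicians <- data.table(
  instrument = rep(c('bass', 'drums', 'guitar'), each = 4),
  musician = c('Chas', 'John', 'Paul', 'Stuart',
               'Andy', 'Paul', 'Peter', 'Ringo',
               'George', 'John', 'Paul', 'Ringo')
)
band.comp <- data.table(
  instrument = c('bass', 'drums', 'guitar'),
  n = c(2, 1, 2)
)

# Hypothetical wrapper (name and signature are assumptions, not from the post).
draw_band <- function(musicians, band.comp) {
  drawn <- character(0)  # musicians picked so far, local to this call
  musicians[
    band.comp,
    on = .(instrument),
    j = {
      s <- setdiff(musician, drawn)
      if (length(s) < i.n) stop(paste("Not enough musicians playing", .BY))
      # sample positions, then index, so a single candidate is handled plainly
      res <- s[sample.int(length(s), i.n)]
      drawn <<- c(drawn, res)
      .(musician = res)
    },
    by = .EACHI
  ]
}

# An unlucky draw can leave too few candidates for a later instrument,
# so catch the error rather than letting it abort the script.
band <- tryCatch(draw_band(musicians, band.comp), error = function(e) NULL)
if (!is.null(band)) stopifnot(anyDuplicated(band$musician) == 0L)
print(band)
```

Either way, a quick `anyDuplicated()` on the result is a cheap check that nobody was drawn twice.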