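Given a multi-dimensional array with columns, e.g. a zoo object z with columns a, b, c, x. Further given a function W(w=c(1,1,1), x) that, for example, weights each column individually but also depends on the value in column x of the specific row. How can row operations, e.g. calculating rowWeightedMeans, be done efficiently here?
It is well known that R::zoo is very fast and efficient at row operations if the function is very simple, for example: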
W <- function(w) { return(w); }
z[,"wmean"] <- rowWeightedMeans(z[,1:3], w=W(c(0.1,0.5,0.3)))
But what if W() depends on a value in that row? For example:
W <- function(w, x) { return(w*x); }
z[,"wmean"] <- rowWeightedMeans(z[,1:3], w=W(c(0.1,0.5,0.3), z[,4]))
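R complains here because it does not know how to handle the multi-dimensional argument in the nested function.
A solution could be a for(i in 1:nrow(z)) loop that computes the value for each row i separately. For large data sets, however, this costs a lot of extra computation and time.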
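For reference, the loop baseline described above can be sketched as follows. This is a minimal sketch under stated assumptions: a small plain matrix stands in for the zoo object, and base R's weighted.mean() stands in for rowWeightedMeans so that no extra package is needed. The vectorised call presumably fails because W(w, z[,4]) multiplies a length-3 weight vector by a whole column, which does not produce one set of weights per row.

```r
# Hypothetical stand-in data: a plain numeric matrix instead of a zoo object
z <- cbind(a = c(1, 2, 3), b = c(4, 5, 6), c = c(7, 8, 9), x = c(1, 2, 4))

# Weight function from the question: scales the weights by the row's x value
W <- function(w, x) w * x

# Explicit loop: one weighted mean per row, using that row's own x
wmean <- numeric(nrow(z))
for (i in seq_len(nrow(z))) {
  wmean[i] <- weighted.mean(z[i, 1:3], w = W(c(0.1, 0.5, 0.3), z[i, "x"]))
}
z <- cbind(z, wmean = wmean)
```

The loop is correct but evaluates an R-level call for every row, which is the cost the question wants to avoid.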
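EDIT
OK, thank you for your time and for the criticism. I tried and tested all the answers, but I must admit that the actual problem was neither solved nor understood. For example, I did not ask for my weight function or calculation to be rewritten, because what I presented is already a minimal version of a more complex calculation. The problem here goes much deeper. So I sat down, tried to boil the problem down to the root of the evil, and found a minimal working example for you without any zoo objects, weightedMeans, and so on. Here you go: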
z <- data.frame(matrix (1:20, nrow = 4))
colnames (z) <- c ("a", "b", "c", "x", "y")
z
# a b c x y
#1 1 5 9 13 17
#2 2 6 10 14 18
#3 3 7 11 15 19
#4 4 8 12 16 20
W <- function(abc, w, p) {
  ifelse (w[1] == p, return(length(p)), return(0))
  # Please do not complain! I know this is stupid, but it is an MWE
  # and my calculations contained in W() are much more complex!
}
z[,"y"] <- W(z[,1:3], c(14,7,8), z[,"x"])
# same result: z[,"y"] <- apply(z[,1:3], 1, W, c(14,7,8), z[,"x"])
z
# a b c x y
#1 1 5 9 13 4
#2 2 6 10 14 4
#3 3 7 11 15 4
#4 4 8 12 16 4
# expected outcome:
# a b c x y
#1 1 5 9 13 0
#2 2 6 10 14 4
#3 3 7 11 15 0
#4 4 8 12 16 0
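The problem I face is that R passes all rows of z[,"x"] to the function, whereas I want it to take only the row that corresponds to the row of z[,"y"] that R is currently processing internally as it runs through the function. In this example, I expect 14==14 only in row 2! So: how do I tell R to pass the arguments to the function row by row?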
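One way to make the "current row" explicit is to iterate over row indices and index every argument with the same i. The sketch below is illustrative only: W_row is a hypothetical scalar stand-in for the much more complex W() mentioned above, and it returns length(abc) instead of length(p) so that the result is easy to recognise.

```r
z <- data.frame(matrix(1:20, nrow = 4))
colnames(z) <- c("a", "b", "c", "x", "y")

# Hypothetical per-row function: abc is one row of a, b, c; p is one value of x
W_row <- function(abc, w, p) if (w[1] == p) length(abc) else 0

# Index all row-dependent arguments with the same row number i
z$y <- sapply(seq_len(nrow(z)), function(i) {
  W_row(unlist(z[i, 1:3]), c(14, 7, 8), z[i, "x"])
})
```

Only the row where x equals 14 gets a non-zero value, because each call sees a single element of x rather than the whole column.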
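SOLUTION
In addition to awarding and accepting the answer, I would like to summarise the solution here to improve clarity and give a better overview of the discussion.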
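This question was not about rewriting a specific function (such as the weights W). It is solely about the fact that a plain function call or apply() does not pass several arguments to a general function row by row. With z$y <- f(z$a, z$x), both arguments arrive as complete columns; with z$y <- apply(z[,1:3], 1, f, z$x), only the first argument is passed row by row, while the second is passed as the complete column containing all rows. This seems to be inherent behaviour of R that we need to work around.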
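For completeness, base R's mapply() does walk several vectors in parallel, which is a different technique from the wrapper approach this thread settled on. A minimal sketch, assuming a hypothetical function f and a reduced data frame:

```r
z <- data.frame(a = 1:4, x = 13:16)

# Hypothetical function: a and x are single values, w is a constant vector
f <- function(a, x, w) if (x == w[1]) a * 10 else 0

# a and x are matched element by element; MoreArgs is passed unchanged to every call
z$y <- mapply(f, z$a, z$x, MoreArgs = list(w = c(14, 7, 8)))
```

This works well when each row-wise argument is a single column; passing a whole multi-column row still requires a wrapper or an index loop.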
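To solve this, the whole row has to be passed as a single argument to a wrapper function, which then applies the specific calculation to that row. Solution for the weights problem: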
f <- function(x) weighted.mean(x[1:3], W(c(0.1,0.5,0.3), x[4]))
z[,"wmean"] <- apply(z[,1:4], 1, f)
Solution for the data frame problem:
f <- function(x) W(x[1:3], c(14,7,8), x[4])
z$y <- apply(z, 1, f)
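A runnable sketch of the wrapper pattern on a small data frame follows. One caveat worth knowing: apply() converts a data frame to a matrix first, so a non-numeric column would turn every row into character strings. The sketch therefore selects the numeric columns explicitly; the id column and the num_cols helper are hypothetical additions for illustration.

```r
# Hypothetical data with one non-numeric column
z <- data.frame(a = 1:3, b = 4:6, c = 7:9, x = c(1, 2, 4),
                id = c("r1", "r2", "r3"))

W <- function(w, x) w * x

# Wrapper: receives one complete row and splits it into values and x
f <- function(row) weighted.mean(row[1:3], W(c(0.1, 0.5, 0.3), row[4]))

# Pass only the numeric columns so each row stays numeric
num_cols <- c("a", "b", "c", "x")
z$wmean <- apply(as.matrix(z[, num_cols]), 1, f)
```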
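Brian's accepted answer also provides faster approaches using compiled code (C++ via Rcpp). Thanks to @BrianAlbertMonroe, @jaimedash and @inscaven for dealing with the unclear question and hinting at the solution.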
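I haven't really used zoo or rowWeightedMeans, but if you are simply applying weights to the row elements before averaging them, and the weights need to depend on one of the row elements: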
z <- matrix(rnorm(100),ncol=4)
W <- function(row, weights){
  weights <- weights * row[4]
  row2 <- row[1:3] * weights
  sum(row2) / sum(weights)
}
w.means <- apply(z, 1, W, weights = c(0.1, 0.5, 0.3))
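For this particular weighting (weights multiplied by the row's fourth element), the same result can also be computed without apply() by building the per-row weight matrix in one step. This is a swapped-in vectorised technique, not part of the answer above, and it only carries over to weight functions that can be expressed as whole-matrix operations; the names w, wmat, via_apply and via_matrix are illustrative.

```r
set.seed(1)
z <- matrix(rnorm(100), ncol = 4)
w <- c(0.1, 0.5, 0.3)

# Row-wise version, as in the answer
W <- function(row, weights) {
  weights <- weights * row[4]
  sum(row[1:3] * weights) / sum(weights)
}
via_apply <- apply(z, 1, W, weights = w)

# Vectorised version: wmat[i, j] = z[i, 4] * w[j], one weight set per row
wmat <- outer(z[, 4], w)
via_matrix <- rowSums(z[, 1:3] * wmat) / rowSums(wmat)

# Compare with a tolerance, since the summation order differs
all.equal(via_apply, via_matrix)
```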
If the above gives the correct answer but you are worried about speed, write the W function in Rcpp or byte-compile it with the built-in cmpfun:
N <- 10000
z <- matrix(rnorm(N),ncol=4)
# Interpreted R function (same formula as W above)
W1 <- function(row, weights){
  weights <- weights * row[4]
  row2 <- row[1:3] * weights
  sum(row2) / sum(weights)
}
# Byte-compiled R function
W2 <- compiler::cmpfun(W1)
# C++ function imported into R via Rcpp
Rcpp::cppFunction('double Wcpp(NumericVector row, NumericVector weights){
  int x = row.size() ;
  NumericVector wrow(x - 1);
  NumericVector nweights(x - 1);
  nweights = weights * row[x - 1];
  for( int i = 0; i < (x-1) ; i++){
    wrow[i] = row[i] * nweights[i];
  }
  double res = sum(wrow) / sum(nweights);
  return(res);
}')
w.means0 <- apply(z,1,W1,weights=c(0.1,0.5,0.3))
w.means1 <- apply(z,1,W2,weights=c(0.1,0.5,0.3))
w.means2 <- apply(z,1,Wcpp,weights=c(0.1,0.5,0.3))
# identical() compares only two objects, so check the results pairwise;
# all.equal() tolerates floating-point rounding differences between R and C++
isTRUE(all.equal(w.means0, w.means1)) && isTRUE(all.equal(w.means0, w.means2))
#[1] TRUE
Or:
# Write the whole thing in C++
Rcpp::cppFunction('NumericVector WM(NumericMatrix z , NumericVector weights){
  int x = z.ncol() ;
  int y = z.nrow() ;
  NumericVector res(y);
  NumericVector wrow(x - 1);
  NumericVector nweights(x - 1);
  double nwsum;
  double mult;
  for( int row = 0 ; row < y ; row++){
    mult = z(row,x-1);
    nweights = weights * mult;
    nwsum = sum(nweights);
    for( int i = 0; i < (x-1) ; i++){
      wrow[i] = z(row,i) * nweights[i] ;
    }
    res[row] = sum(wrow) / nwsum;
  }
  return(res);
}')
microbenchmark::microbenchmark(
w.means0 <- apply(z,1,W1,weights=c(0.1,0.5,0.3)),
w.means1 <- apply(z,1,W2,weights=c(0.1,0.5,0.3)),
w.means2 <- apply(z,1,Wcpp,weights=c(0.1,0.5,0.3)),
w.means3 <- WM(z = z, weights = c(0.1, 0.5, 0.3))
)
Unit: microseconds
expr min lq mean median uq max neval
w.means0 <- apply(z, 1, W1, weights = c(0.1, 0.5, 0.3)) 12114.834 12536.9330 12995.1722 12838.2805 13163.4835 15796.403 100
w.means1 <- apply(z, 1, W2, weights = c(0.1, 0.5, 0.3)) 9941.571 10286.8085 10769.7330 10410.9465 10788.6800 19526.840 100
w.means2 <- apply(z, 1, Wcpp, weights = c(0.1, 0.5, 0.3)) 10919.112 11631.5530 12849.7294 13262.9705 13707.7465 17438.524 100
w.means3 <- WM(z = z, weights = c(0.1, 0.5, 0.3)) 94.172 107.9855 146.2606 125.0075 140.2695 2089.933 100
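EDIT:
Incorporating the weighted.mean function slows the computation down considerably, and according to its help file it does not handle missing values in the weights specially, so you would still need to write code to manage them.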
> z <- matrix(rnorm(100),ncol=4)
> W <- function(row, weights){
+ weights <- weights * row[4]
+ row2 <- row[1:3] * weights
+ sum(row2) / sum(weights)
+
+ }
> W1 <- compiler::cmpfun(W)
> W2 <- function(row, weights){
+ weights <- weights * row[4]
+ weighted.mean(row[1:3],weights)
+ }
> W3 <- compiler::cmpfun(W2)
> w.means1 <- apply(z, 1, W, weights = c(0.1, 0.5, 0.3))
> w.means2 <- apply(z, 1, W2, weights = c(0.1, 0.5, 0.3))
> identical(w.means1,w.means2)
[1] TRUE
> microbenchmark(
+ w.means1 <- apply(z, 1, W, weights = c(0.1, 0.5, 0.3)),
+ w.means1 <- apply(z, 1, W1, weights = c(0.1, 0.5, 0.3)),
+ w.means2 < .... [TRUNCATED]
Unit: microseconds
expr min lq mean median uq max neval
w.means1 <- apply(z, 1, W, weights = c(0.1, 0.5, 0.3)) 145.315 167.4550 172.8163 172.9120 180.6920 194.673 100
w.means1 <- apply(z, 1, W1, weights = c(0.1, 0.5, 0.3)) 124.087 134.3365 143.6803 137.8925 148.7145 225.459 100
w.means2 <- apply(z, 1, W2, weights = c(0.1, 0.5, 0.3)) 307.311 346.6320 356.4845 354.7325 371.7620 412.110 100
w.means2 <- apply(z, 1, W3, weights = c(0.1, 0.5, 0.3)) 280.073 308.7110 323.0156 324.1230 333.7305 407.963 100
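The missing-value point can be illustrated with a small sketch. weighted.mean() with na.rm = TRUE drops missing values in x only, so an NA in the weights still yields NA. The helper wmean_na below is a hypothetical example of the extra handling one might write; dropping the affected pairs is an assumption about the desired behaviour, not the only reasonable choice.

```r
x <- c(1, 2, 3)
w <- c(1, NA, 1)

# NA in the weights propagates even with na.rm = TRUE
base_result <- weighted.mean(x, w, na.rm = TRUE)

# Hypothetical helper: drop every pair in which the value or the weight is missing
wmean_na <- function(x, w) {
  ok <- !is.na(x) & !is.na(w)
  sum(x[ok] * w[ok]) / sum(w[ok])
}
own_result <- wmean_na(x, w)
```

Here base_result is NA, while own_result averages the two complete pairs.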