How to efficiently perform complex row operations in R using nested functions?

Posted by Martin on Dev

Martin

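Given a multi-dimensional array, e.g. a zoo object z, with columns a, b, c, x. Further given a function W(w=c(1,1,1), x) that weights each column individually, but also depends on a row-specific value in column x. How can row operations be done efficiently here, e.g. calculating rowWeightedMeans?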

It is well known that R::zoo is very fast and efficient for row operations when the function is very simple, e.g.:

W <- function(w) { return(w); }
z[,"wmean"] <- rowWeightedMeans(z[,1:3], w=W(c(0.1,0.5,0.3)))

But what if W() depends on values in that row? For example:

W <- function(w, x) { return(w*x); }
z[,"wmean"] <- rowWeightedMeans(z[,1:3], w=W(c(0.1,0.5,0.3), z[,4]))

R complains here because it does not know how to handle the multiple dimensions of the arguments in the nested function.

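A solution could be a for(i in 1:nrow(z)) loop that computes the value for each row i separately. For large datasets, however, this takes a lot of extra computational effort and time.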
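To make the shape problem and the loop workaround concrete, here is a minimal sketch in base R. It assumes a plain numeric matrix in place of the zoo object and uses stats::weighted.mean instead of rowWeightedMeans; the data and the name wmean are illustrative only.

```r
# Illustrative data: 4 rows, columns a, b, c and the row-specific factor x
z <- matrix(1:16, nrow = 4, dimnames = list(NULL, c("a", "b", "c", "x")))

W <- function(w, x) w * x

# Passing the whole column recycles w against x: the result has one entry
# per row of z instead of one weight per column, so it cannot serve as w
length(suppressWarnings(W(c(0.1, 0.5, 0.3), z[, "x"])))  # 4, not 3

# Loop baseline: build the weights from the current row only
wmean <- numeric(nrow(z))
for (i in seq_len(nrow(z))) {
  wmean[i] <- weighted.mean(z[i, 1:3], W(c(0.1, 0.5, 0.3), z[i, "x"]))
}
wmean
```

The loop gives the intended per-row result, which makes it a useful reference when checking faster variants.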

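EDIT

OK, thank you for your time and your criticism. I tried and tested all the answers, but I have to admit that the actual problem has not been solved or understood. For example, I did not ask for my weighting function or calculation to be rewritten, because what I presented is a minimal version of a more complex calculation. The problem here goes much deeper. So I sat down, tried to boil the problem down to the root of the evil, and found a minimal working example for you without any zoos, weightedMeans and so on. Here you go: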

z <- data.frame(matrix (1:20, nrow = 4))
colnames (z) <- c ("a", "b", "c", "x", "y")
z
#   a b  c  x  y
#1 1 5  9 13 17
#2 2 6 10 14 18
#3 3 7 11 15 19
#4 4 8 12 16 20

W <- function(abc, w, p) { 
  ifelse (w[1] == p, return(length(p)), return(0))
  # Please do not complain! I know this is stupid, but it is an MWE
  # and my calculations contained in W() are much more complex!
}

z[,"y"] <- W(z[,1:3], c(14,7,8), z[,"x"])
# same result: z[,"y"] <- apply(z[,1:3], 1, W, c(14,7,8), z[,"x"])
z
#  a b  c  x y
#1 1 5  9 13 4
#2 2 6 10 14 4
#3 3 7 11 15 4
#4 4 8 12 16 4

# expected outcome:
#  a b  c  x y
#1 1 5  9 13 0
#2 2 6 10 14 4
#3 3 7 11 15 0
#4 4 8 12 16 0

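The problem I am facing is that R passes all rows of z[,"x"] to the function, whereas I expect it to take only the row that corresponds to the row of z[,"y"] currently being processed as R walks through the function. In this example I expect 14==14 in row 2 only! So: how do I tell R to pass arguments to the function row by row?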

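SOLUTION

In addition to the awarded and accepted answer, I would like to summarize the solution here to improve clarity and give a better overview of the discussion.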

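This question is not about rewriting a specific function such as the weighting function W. It is only about R not passing several arguments row by row to a generic function in calls like these. z$y <- f(z$a, z$x) hands f both columns in full, and z$y <- apply(z[,1:3], 1, f, z$x) passes only the first argument row by row while the second argument arrives as the complete column containing all rows. This appears to be R's default behavior, and it has to be worked around.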
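As a side note, base R does offer mapply(), which walks several vectors in parallel and hands one element of each to the function per call. The sketch below is an alternative to the wrapper approach described next, not the method from the accepted answer; the function name flag_match is hypothetical and the data mirrors the example above.

```r
z <- data.frame(matrix(1:20, nrow = 4))
colnames(z) <- c("a", "b", "c", "x", "y")

# Hypothetical per-row function: a and x are single values here, w stays fixed
flag_match <- function(a, x, w) if (w[1] == x) a else 0

# mapply() iterates over z$a and z$x in parallel; MoreArgs is passed unchanged
z$y <- mapply(flag_match, z$a, z$x, MoreArgs = list(w = c(14, 7, 8)))
z$y
```

Only row 2 has x equal to 14, so only that row receives a non-zero value.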

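To solve this, the whole row has to be passed as a single argument to a wrapper function, which then applies the specific calculation to that row. Solution for the weighting problem: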

f <- function(x) weighted.mean(x[1:3], W(c(0.1,0.5,0.3), x[4]))
z[,"wmean"] <- apply(z[,1:4], 1, f)

Solution for the data-frame-centered problem:

f <- function(x) W(x[1:3], c(14,7,8), x[4])
z$y <- apply(z, 1, f)
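A self-contained sketch of the wrapper idea on the example data follows. The function row_flag is a hypothetical stand-in for W that returns 1 on a match, chosen so the result is easy to read. Values are looked up by column name, which assumes all columns are numeric so that apply() can convert the data frame to a numeric matrix.

```r
z <- data.frame(matrix(1:20, nrow = 4))
colnames(z) <- c("a", "b", "c", "x", "y")

# Hypothetical stand-in for W(): sees one row's values at a time
row_flag <- function(abc, w, p) if (w[1] == p) 1 else 0

# Wrapper: receives the complete row and splits it into the arguments
f <- function(row) row_flag(row[c("a", "b", "c")], c(14, 7, 8), row[["x"]])

z$y <- apply(z, 1, f)
z$y
```

Because the wrapper receives one row per call, the comparison against 14 is made with a single value of x rather than the whole column.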

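In the accepted answer, Brian also provides a faster approach using compiled C++ code via Rcpp. Thanks to @BrianAlbertMonroe, @jaimedash and @inscaven for dealing with the unclear question and hinting at this solution.

Brian Albert Monroe

I haven't really worked with zoo or rowWeightedMeans, but if you are simply applying weights to the row elements before averaging them, and need the weights to depend on one of the row's elements: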

z <- matrix(rnorm(100),ncol=4)

W <- function(row, weights){
    weights <- weights * row[4]
    row2 <- row[1:3] * weights
    sum(row2) / sum(weights)

}

w.means <- apply(z, 1, W, weights = c(0.1, 0.5, 0.3))
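When the weight function itself is vectorised, the row loop hidden inside apply() can be avoided altogether in plain R. The following is a sketch under that assumption, not part of the original answer: outer() builds one row of weights per data row, and rowSums() does the reductions. The names base_w, wmat and w_means_vec are illustrative.

```r
set.seed(1)
z <- matrix(rnorm(100), ncol = 4)
base_w <- c(0.1, 0.5, 0.3)

# Row-wise reference, same W() as above
W <- function(row, weights) {
  weights <- weights * row[4]
  sum(row[1:3] * weights) / sum(weights)
}
w_means <- apply(z, 1, W, weights = base_w)

# Weight matrix: entry [i, j] is base_w[j] * z[i, 4]
wmat <- outer(z[, 4], base_w)
w_means_vec <- rowSums(z[, 1:3] * wmat) / rowSums(wmat)

all.equal(w_means, w_means_vec)
```

This only works when the per-row weights can be written as whole-matrix arithmetic; a W() with branching logic per row still needs apply() or compiled code.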

If the above gives the right answer but you are worried about speed, write the W function in Rcpp or use the built-in cmpfun:

N <- 10000

z <- matrix(rnorm(N),ncol=4)

# Interpreted R function
W1 <- function(row, weights){
    weights <- weights * row[4]
    row2 <- row[1:3] * weights
    # Divide by the weight sum so every implementation computes the same weighted mean
    sum(row2) / sum(weights)
}

# Compiled R function
W2 <- compiler::cmpfun(W1)

# C++ function imported into R via Rcpp
Rcpp::cppFunction('double Wcpp(NumericVector row, NumericVector weights){

    int x = row.size() ;

    NumericVector wrow(x - 1);
    NumericVector nweights(x - 1);

    // Scale the base weights by the last element of the row
    nweights = weights * row[x - 1];

    for( int i = 0; i < (x-1) ; i++){
        wrow[i] = row[i] * nweights[i];
    }

    double res = sum(wrow) / sum(nweights);

    return(res);

}')

w.means0 <- apply(z,1,W1,weights=c(0.1,0.5,0.3))
w.means1 <- apply(z,1,W2,weights=c(0.1,0.5,0.3))
w.means2 <- apply(z,1,Wcpp,weights=c(0.1,0.5,0.3))

# identical() compares exactly two objects, so check the results pairwise
identical( w.means0, w.means1 )

#[1] TRUE

# Compare the C++ result with a numerical tolerance
all.equal( w.means0, w.means2 )

#[1] TRUE

Or

# Write the whole thing in C++
Rcpp::cppFunction('NumericVector WM(NumericMatrix z , NumericVector weights){
                                int x = z.ncol() ;
                                int y = z.nrow() ;

                                NumericVector res(y);
                                NumericVector wrow(x - 1);

                                NumericVector nweights(x - 1);
                                double nwsum;
                                double mult;

                                for( int row = 0 ; row < y ; row++){

                                    mult = z(row,x-1);

                                    nweights = weights * mult;
                                    nwsum = sum(nweights);

                                    for( int i = 0; i < (x-1) ; i++){

                                        wrow[i] = z(row,i) * nweights[i] ;
                                    }

                                  res[row] = sum(wrow) / nwsum;

                                }

                                return(res);

}')

microbenchmark::microbenchmark(
    w.means0 <- apply(z,1,W1,weights=c(0.1,0.5,0.3)),
    w.means1 <- apply(z,1,W2,weights=c(0.1,0.5,0.3)),
    w.means2 <- apply(z,1,Wcpp,weights=c(0.1,0.5,0.3)),
    w.means3 <- WM(z = z, weights = c(0.1, 0.5, 0.3))
)

    Unit: microseconds
                                                      expr       min         lq       mean     median         uq       max neval
   w.means0 <- apply(z, 1, W1, weights = c(0.1, 0.5, 0.3)) 12114.834 12536.9330 12995.1722 12838.2805 13163.4835 15796.403   100
   w.means1 <- apply(z, 1, W2, weights = c(0.1, 0.5, 0.3))  9941.571 10286.8085 10769.7330 10410.9465 10788.6800 19526.840   100
 w.means2 <- apply(z, 1, Wcpp, weights = c(0.1, 0.5, 0.3)) 10919.112 11631.5530 12849.7294 13262.9705 13707.7465 17438.524   100
         w.means3 <- WM(z = z, weights = c(0.1, 0.5, 0.3))    94.172   107.9855   146.2606   125.0075   140.2695  2089.933   100

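EDIT:

Incorporating the weighted.mean function slows the computation down considerably, and according to its help file it does not handle missing values in the weights specially, so you would still need to write code to manage them.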

> z <- matrix(rnorm(100),ncol=4)

> W <- function(row, weights){
+     weights <- weights * row[4]
+     row2 <- row[1:3] * weights
+     sum(row2) / sum(weights)
+ 
+ }

> W1 <- compiler::cmpfun(W)

> W2 <- function(row, weights){
+     weights <- weights * row[4]
+     weighted.mean(row[1:3],weights)
+ }

> W3 <- compiler::cmpfun(W2)

> w.means1 <- apply(z, 1, W, weights = c(0.1, 0.5, 0.3))

> w.means2 <- apply(z, 1, W2, weights = c(0.1, 0.5, 0.3))

> identical(w.means1,w.means2)
[1] TRUE

> microbenchmark(
+   w.means1 <- apply(z, 1, W, weights = c(0.1, 0.5, 0.3)),
+   w.means1 <- apply(z, 1, W1, weights = c(0.1, 0.5, 0.3)),
+   w.means2 < .... [TRUNCATED] 
Unit: microseconds
                                                    expr     min       lq     mean   median       uq     max neval
  w.means1 <- apply(z, 1, W, weights = c(0.1, 0.5, 0.3)) 145.315 167.4550 172.8163 172.9120 180.6920 194.673   100
 w.means1 <- apply(z, 1, W1, weights = c(0.1, 0.5, 0.3)) 124.087 134.3365 143.6803 137.8925 148.7145 225.459   100
 w.means2 <- apply(z, 1, W2, weights = c(0.1, 0.5, 0.3)) 307.311 346.6320 356.4845 354.7325 371.7620 412.110   100
 w.means2 <- apply(z, 1, W3, weights = c(0.1, 0.5, 0.3)) 280.073 308.7110 323.0156 324.1230 333.7305 407.963   100
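The point about missing values can be checked directly. In the sketch below, which uses made-up numbers, na.rm = TRUE drops missing values in x together with their weights, while a missing weight still propagates to the result.

```r
x <- c(1, 2, NA)
w <- c(1, 1, 1)

weighted.mean(x, w)                # NA: missing value in x
weighted.mean(x, w, na.rm = TRUE)  # 1.5: the NA and its weight are dropped

# A missing weight is not handled specially, even with na.rm = TRUE
weighted.mean(c(1, 2, 3), c(1, NA, 1), na.rm = TRUE)  # NA
```

A hand-written W() that needs to tolerate missing weights therefore has to filter or impute them before computing the mean.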


Edited on 2021-04-11
