Optimizing the performance of a for loop in R

Posted by Daniel in Dev

Daniel

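I have a character vector and want to create a matrix that holds a distance metric for every pair of values in the vector (using the stringdist package). Currently, I have an implementation with nested for loops: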

library(stringdist)

strings <- c("Hello", "Helo", "Hole", "Apple", "Ape", "New", "Old", "System", "Systemic")
m <- matrix(nrow = length(strings), ncol = length(strings))
colnames(m) <- strings
rownames(m) <- strings

for (i in 1:nrow(m)) {
  for (j in 1:ncol(m)) {
    m[i,j] <- stringdist::stringdist(tolower(rownames(m)[i]), tolower(colnames(m)[j]), method = "lv")
  }
}

This results in the following matrix:

> m
         Hello Helo Hole Apple Ape New Old System Systemic
Hello        0    1    3     4   5   4   4      6        7
Helo         1    0    2     4   4   3   3      6        7
Hole         3    2    0     3   3   4   2      5        7
Apple        4    4    3     0   2   5   4      5        7
Ape          5    4    3     2   0   3   3      5        7
New          4    3    4     5   3   0   3      5        7
Old          4    3    2     4   3   3   0      6        8
System       6    6    5     5   5   5   6      0        2
Systemic     7    7    7     7   7   7   8      2        0

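However, if I have, say, a vector of length 1000 that contains many non-unique values, this matrix becomes quite large (say, 800 rows by 800 columns) and the loops are very slow. I would like to optimize the performance, for example by using the apply functions, but I don't know how to translate the code above into apply syntax. Can anyone help?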

Daniel

Thanks to the hint from @hrbrmstr, I found that the stringdist package itself provides a function called stringdistmatrix that does what I want (see here).

The function call is simple: stringdistmatrix(strings, strings)
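
To reproduce the loop's result more closely, a few details probably matter. As far as I know, stringdistmatrix defaults to method = "osa", so "lv" has to be passed explicitly, and the loop also lowercased the strings first. The sketch below assumes a recent stringdist version where useNames accepts "strings"; the names keys, m2 and m3 are only illustrative. It also shows one possible apply-style variant, since that was the original question.

```r
library(stringdist)

strings <- c("Hello", "Helo", "Hole", "Apple", "Ape", "New", "Old", "System", "Systemic")

# Lowercase once and drop duplicates, so each distinct value is compared only once.
keys <- unique(tolower(strings))

# Pass method = "lv" explicitly to get Levenshtein distance as in the loop;
# useNames = "strings" labels rows and columns with the (lowercased) strings.
m2 <- stringdistmatrix(keys, keys, method = "lv", useNames = "strings")

# apply-style variant: stringdist() is vectorized over its arguments,
# so each sapply() iteration computes one full column of the matrix.
m3 <- sapply(keys, function(s) stringdist(s, keys, method = "lv"))
rownames(m3) <- keys
```

Note that the row and column names here are the lowercased values rather than the original spellings. If the full matrix for the original vector (duplicates included) is needed, indexing by name, for example m2[tolower(strings), tolower(strings)], should expand it again without recomputing any distance.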
