Different results with randomForest() and caret's randomForest (method = "rf")

Posted by ej5607 in Dev

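I am new to caret, and I just want to make sure I fully understand what it is doing. To that end, I have been trying to replicate the results I get from a randomForest() model using caret's train() function with method = "rf". Unfortunately, I cannot get matching results, and I am wondering what I am overlooking.

I should add that, given that randomForest uses bootstrapping to generate the sample each of the ntree trees is fit to, and estimates error from the out-of-bag predictions, I am a little fuzzy on the difference between specifying "oob" and "boot" in the trainControl function call. These options generate different results, but neither matches the randomForest() model.

Although I have read the caret package website (http://topepo.github.io/caret/index.html), as well as various StackOverflow questions that seemed potentially relevant, I have not been able to figure out why the caret method = "rf" model produces different results from randomForest(). Thank you very much for any insight you might be able to offer.

Here is a reproducible example using the CO2 data set (the code loads MASS, although CO2 ships with R's built-in datasets package).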

library(MASS)
data(CO2)

library(randomForest)
set.seed(1)
rf.model <- randomForest(uptake ~ ., 
                       data = CO2,
                       ntree = 50,
                       nodesize = 5,
                       mtry=2,
                       importance=TRUE, 
                       metric="RMSE")

library(caret)
set.seed(1)
caret.oob.model <- train(uptake ~ ., 
                     data = CO2,
                     method="rf",
                     ntree=50,
                     tuneGrid=data.frame(mtry=2),
                     nodesize = 5,
                     importance=TRUE, 
                     metric="RMSE",
                     trControl = trainControl(method="oob"),
                     allowParallel=FALSE)

set.seed(1)
caret.boot.model <- train(uptake ~ ., 
                     data = CO2,
                     method="rf",
                     ntree=50,
                     tuneGrid=data.frame(mtry=2),
                     nodesize = 5,
                     importance=TRUE, 
                     metric="RMSE",
                     trControl=trainControl(method="boot", number=50),
                     allowParallel=FALSE)

print(rf.model)
print(caret.oob.model$finalModel)
print(caret.boot.model$finalModel)

This produces the following:

print(rf.model)

      Mean of squared residuals: 9.380421
                % Var explained: 91.88

print(caret.oob.model$finalModel)

      Mean of squared residuals: 38.3598
                % Var explained: 66.81

print(caret.boot.model$finalModel)

      Mean of squared residuals: 42.56646
                % Var explained: 63.16

And the code to look at variable importance:

importance(rf.model)

importance(caret.oob.model$finalModel)

importance(caret.boot.model$finalModel)

Lluís Ramon

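Using the formula interface in train() converts factors to dummy variables. To compare results from caret with randomForest, you should use the non-formula interface.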
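
The size of that conversion can be seen without fitting anything. The sketch below uses base R's model.matrix() as a stand-in for the expansion a formula implies (an assumption about equivalence in spirit, not a copy of caret's internals). The names co2, predictors and X are made up for illustration.

```r
# Illustration only: how a formula expands the factors in CO2 into dummy columns.
data(CO2)
co2 <- as.data.frame(CO2)            # drop the groupedData classes, keep the columns

predictors <- setdiff(names(co2), "uptake")
length(predictors)                   # predictors seen by the non-formula interface

# model.matrix() builds the design matrix a formula implies; drop the intercept
X <- model.matrix(uptake ~ ., data = co2)[, -1]
ncol(X)                              # columns after factor expansion
colnames(X)
```

On this data the four original predictors (Plant, Type, Treatment, conc) expand to 14 columns, most of them contrasts for the 12-level ordered factor Plant, so mtry = 2 draws its candidate variables from a very different pool.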

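In your case, you should provide a seed inside trainControl to get the same result as in randomForest.

The model training section of the caret website has some notes on reproducibility that explain how to use seeds.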
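
As a reminder of why the seed matters: each tree is grown on a random bootstrap sample, so two fits can only agree if they start from the same random-number state, and train() can consume random numbers before the final forest is grown. The base-R sketch below shows the principle with plain sample() calls. The helper draw_rows is hypothetical and only stands in for the random draws a forest makes.

```r
# Illustration only: identical seeds give identical random draws.
draw_rows <- function(seed, n = 84) {
  set.seed(seed)                           # fix the random-number state
  sample.int(n, size = n, replace = TRUE)  # one bootstrap sample of row indices
}

a <- draw_rows(1)
b <- draw_rows(1)
c_other <- draw_rows(2)

identical(a, b)        # same seed, same bootstrap sample
identical(a, c_other)  # different seed, so the sample differs
```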

library("randomForest")
set.seed(1)
rf.model <- randomForest(uptake ~ ., 
                         data = CO2,
                         ntree = 50,
                         nodesize = 5,
                         mtry = 2,
                         importance = TRUE, 
                         metric = "RMSE")

library("caret")
caret.oob.model <- train(CO2[, -5], CO2$uptake, 
                         method = "rf",
                         ntree = 50,
                         tuneGrid = data.frame(mtry = 2),
                         nodesize = 5,
                         importance = TRUE, 
                         metric = "RMSE",
                         trControl = trainControl(method = "oob", seeds = 1),
                         allowParallel = FALSE)
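
On the difference between "oob" and "boot": a bootstrap sample draws n rows with replacement, and the rows that are never drawn are out-of-bag for that sample. With trainControl(method = "oob"), caret relies on the forest's own out-of-bag error from a single fit. With method = "boot", caret refits the model on each bootstrap resample, scores it on the rows left out, and then fits the final model on the full data. The base-R sketch below only illustrates the sampling idea. It is not what either package runs internally, and the variable names are made up for illustration.

```r
# Illustration only: one bootstrap sample of the CO2 rows and its out-of-bag set.
data(CO2)
n <- nrow(CO2)

set.seed(1)
in_bag  <- sample.int(n, size = n, replace = TRUE)  # rows drawn with replacement
out_bag <- setdiff(seq_len(n), in_bag)              # rows never drawn

# every row is either drawn at least once or out-of-bag, never both
length(unique(in_bag)) + length(out_bag) == n

# share of rows left out; on average close to exp(-1), roughly 0.37
length(out_bag) / n
```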

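If you want to do resampling, you should provide seeds for each resampling iteration and an additional one for the final model. The examples in ?trainControl show how to create them.

In the following example, the last seed is for the final model, and I set it to 1.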

seeds <- as.vector(c(1:26), mode = "list")

# For the final model
seeds[[26]] <- 1

caret.boot.model <- train(CO2[, -5], CO2$uptake, 
                          method = "rf",
                          ntree = 50,
                          tuneGrid = data.frame(mtry = 2),
                          nodesize = 5,
                          importance = TRUE, 
                          metric = "RMSE",
                          trControl = trainControl(method = "boot", seeds = seeds),
                          allowParallel = FALSE)
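
The same layout generalises to other settings. Going by the description in ?trainControl, the list needs one element per resample, each an integer vector with one seed per tuning-parameter combination, plus a final single integer for the last model. The helper below, make_seeds (a hypothetical name, not part of caret), is a sketch of that layout in base R.

```r
# Sketch only: build a seeds list shaped for trainControl(seeds = ...).
# n_resamples: number of resampling iterations (25 is trainControl's default for "boot")
# n_tune:      number of rows in the tuning grid (1 for tuneGrid = data.frame(mtry = 2))
make_seeds <- function(n_resamples = 25, n_tune = 1, final_seed = 1, master_seed = 123) {
  set.seed(master_seed)
  seeds <- vector(mode = "list", length = n_resamples + 1)
  for (i in seq_len(n_resamples)) {
    seeds[[i]] <- sample.int(1000, n_tune)   # one seed per candidate model
  }
  seeds[[n_resamples + 1]] <- final_seed     # seed used when fitting the final model
  seeds
}

seeds <- make_seeds()
length(seeds)   # 26: 25 resamples plus the final model
seeds[[26]]     # 1, matching set.seed(1) before randomForest()
```

Only the last element has to equal the seed used for randomForest(); the per-resample seeds affect the resampled performance estimates, not the final forest.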

With the non-formula interface in caret and the seeds in trainControl defined properly, you will get the same results in all three models:

rf.model
caret.oob.model$finalModel
caret.boot.model$finalModel

Edited on 2020-11-2
