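I ran into a (probably minor) problem when iteratively combining the columns of a single data frame with the other data frames in a list. Some data for illustration: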
# load example data
library(vegan)
data(varechem)
data(varespec)
# generate predictor tables with overlapping rows and different amount of cols
varespec1 <- varespec[c(1:9), ]
varespec2 <- varespec[c(8:16), c(1:43)]
varespec3 <- varespec[c(14:24), c(1:41)]
# store predictor tables in list
subset_list <- list(varespec1 = varespec1,
                    varespec2 = varespec2,
                    varespec3 = varespec3)
# generate a table that holds ALL possible response variables as presence/absence
varechem_binary <- as.data.frame(apply(varechem, 2, cut,
                                       breaks = c(-Inf, 1.0, Inf),
                                       labels = c("Absent", "Present")))
row.names(varechem_binary) <- row.names(varechem)
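The code above shows how the data are prepared for a classification task. The idea is that each column of the response table (varechem_binary) should be predicted from the data.frames in the list that hold the predictor variables (varespec1, ...), but only one response column at a time. Merging the response table with each predictor table is easy: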
# merge response table with each predictor table
merge_counter <- 0
merged_list <- list()
for (table in subset_list) {
  merge_counter <- merge_counter + 1
  current_name <- names(subset_list)[merge_counter]
  tmp <- merge(table, varechem_binary, by = "row.names")
  row.names(tmp) <- tmp$Row.names
  tmp <- tmp[, -1]
  merged_list[[current_name]] <- tmp
  rm(tmp)
}
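As a side note, the merge loop above can probably be written without a manual counter. The following is only a sketch on small made-up tables: predictors, responses and merge_by_rownames are hypothetical names standing in for subset_list, varechem_binary and the loop body. It relies on lapply() keeping the names of its input list.

```r
# Toy stand-ins for the predictor tables and the response table (hypothetical data)
predictors <- list(
  set1 = data.frame(sp1 = c(1, 2, 3), sp2 = c(0, 1, 0), row.names = c("a", "b", "c")),
  set2 = data.frame(sp1 = c(4, 5), row.names = c("b", "d"))
)
responses <- data.frame(N = c("Present", "Absent", "Present", "Absent"),
                        P = c("Absent", "Absent", "Present", "Present"),
                        row.names = c("a", "b", "c", "d"))

# Merge one predictor table with the responses on row names,
# then move the generated Row.names column back into the row names.
merge_by_rownames <- function(pred, resp) {
  merged <- merge(pred, resp, by = "row.names")
  row.names(merged) <- merged$Row.names
  merged[, -1, drop = FALSE]
}

# lapply() preserves the list names, so no counter is needed
merged_toy <- lapply(predictors, merge_by_rownames, resp = responses)
```

With the real data, the equivalent call would presumably be lapply(subset_list, merge_by_rownames, resp = varechem_binary).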
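Expected output:
What I am looking for now (or earlier in the code, if that makes more sense) is a way to combine each predictor table with each, and exactly one, column of the varechem response table in the list. That is basically: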
# storing in data frames just for illustration, I would like to do this within the list
# subsets for the 3 predictor tables with the first response variable
aa <- merged_list[[1]][,-c(46:58)] # column 1:44 are the predictor variables, then the different response variables start
bb <- merged_list[[2]][,-c(45:57)] # column 1:43 are the predictor variables, then the different response variables start
cc <- merged_list[[3]][,-c(43:55)] # column 1:41 are the predictor variables, then the different response variables start
# subsets for the 3 predictor tables with the second response variable
dd <- merged_list[[1]][,-c(45, 47:58)]
ee <- merged_list[[2]][,-c(44, 46:57)]
ff <- merged_list[[3]][,-c(42, 44:55)]
# subsets for the 3 predictor tables with the third response variable
gg <- merged_list[[1]][,-c(45, 46, 48:58)]
...
# this is just to illustrate how the list could look like, I would like to keep all files in a list all the time
list_for_classification_runs <- list(aa, bb, cc, dd, ee, ff, gg, ...)
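The resulting list would be the input to a random forest classification call, in which the response variable is classified from all the other (predictor) variables from varespec: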
library(ranger)
RF_list <- list()
counter <- 0
for (current_table in list_for_classification_runs) {
  counter <- counter + 1
  # response_variable is a placeholder: it should be the one response column added to the predictor variables in each data frame
  RF_list[[counter]] <- ranger(response_variable ~ ., data = current_table)
}
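One possible way to fill in the response_variable placeholder is to build the formula from the column name. This is a sketch under the assumption that the response is always the last column of each table and has been converted to a factor; toy, response_name and rf_formula are made-up names, and the ranger call only runs if the package is installed.

```r
# Toy table: two predictors plus one response in the last column (hypothetical data)
toy <- data.frame(
  sp1 = c(1, 2, 3, 4, 5, 6),
  sp2 = c(0, 1, 0, 1, 0, 1),
  N   = factor(c("Absent", "Present", "Absent", "Present", "Absent", "Present"))
)

# Assumption: the response is always the last column of the table
response_name <- names(toy)[ncol(toy)]

# reformulate() builds the formula N ~ . from the column name
rf_formula <- reformulate(".", response = response_name)

# Fit the forest only if ranger is available
if (requireNamespace("ranger", quietly = TRUE)) {
  fit <- ranger::ranger(rf_formula, data = toy)
}
```

ranger() also accepts a dependent.variable.name argument, so ranger(dependent.variable.name = response_name, data = toy) should be an alternative to building a formula.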
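Following Gregor's comment, I came up with something like the following. Instead of merging the complete varechem_binary into every element of subset_list, I added another for loop that iterates over all columns of varechem_binary. With drop = FALSE, the row.names and the data frame structure are preserved, so the merge works: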
merge_col_counter <- 0
column_counter <- 0
merged_column_list <- list()
for (table in subset_list) {
  merge_col_counter <- merge_col_counter + 1
  for (column in names(varechem_binary)) {
    column_counter <- column_counter + 1
    current_name <- paste(names(subset_list)[merge_col_counter],
                          names(varechem_binary)[column_counter], sep = "_")
    print(current_name)
    tmp <- merge(table, varechem_binary[, column_counter, drop = FALSE], by = "row.names")
    row.names(tmp) <- tmp$Row.names
    tmp <- tmp[, -1]
    merged_column_list[[current_name]] <- tmp
    rm(tmp)
  }
  column_counter <- 0
}
There are probably several ways to make this approach cleaner or more efficient, but it works, so I can move on.
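For what it's worth, one way this could be tidied up is to loop over names instead of keeping counters, and to pick the shared rows by row name instead of calling merge(). This is only a sketch on made-up tables (predictors, responses and combined are hypothetical names), not a tested replacement for the loop above.

```r
# Toy stand-ins for subset_list and varechem_binary (hypothetical data)
predictors <- list(
  set1 = data.frame(sp1 = c(1, 2, 3), sp2 = c(0, 1, 0), row.names = c("a", "b", "c")),
  set2 = data.frame(sp1 = c(4, 5), row.names = c("b", "d"))
)
responses <- data.frame(N = c("Present", "Absent", "Present", "Absent"),
                        P = c("Absent", "Absent", "Present", "Present"),
                        row.names = c("a", "b", "c", "d"))

combined <- list()
for (pred_name in names(predictors)) {
  pred <- predictors[[pred_name]]
  # Rows present in both tables, in the predictor table's order
  shared <- intersect(row.names(pred), row.names(responses))
  for (resp_name in names(responses)) {
    # drop = FALSE keeps a one-column data frame, so the column name survives
    combined[[paste(pred_name, resp_name, sep = "_")]] <-
      cbind(pred[shared, , drop = FALSE],
            responses[shared, resp_name, drop = FALSE])
  }
}

names(combined)
```

Each element then holds the predictors plus exactly one response column as its last column, which is the layout the ranger loop expects.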