Iteratively merging one of several columns into the data frames of a list

Crazy Santa Claus

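I ran into a (probably minor) problem when iteratively combining the columns of a single data frame with the other data frames in a list. Some example data: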

# load example data
library(vegan)
data(varechem)
data(varespec)

# generate predictor tables with overlapping rows and different amount of cols
varespec1 <- varespec[c(1:9), ]
varespec2 <- varespec[c(8:16), c(1:43)]
varespec3 <- varespec[c(14:24), c(1:41)]

# store predictor tables in list
subset_list <- list(varespec1 = varespec1, 
  varespec2 = varespec2, 
  varespec3 = varespec3)

# generate a table that holds ALL possible response variables as presence/absence
varechem_binary <- as.data.frame(apply(varechem, 2, cut, 
  breaks = c(-Inf, 1.0, Inf), labels = c("Absent", "Present")))
row.names(varechem_binary) <- row.names(varechem)
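
One detail that may matter later: `apply()` builds a matrix, so as far as I can tell the `cut()` results end up as character columns (in R 4.0 and later, where `stringsAsFactors` defaults to `FALSE`), while classification functions generally expect a factor response. A minimal sketch of an alternative using `lapply()`, which keeps the factors; the `chem` table here is a small hypothetical stand-in for `varechem`:

```r
# Hypothetical toy table standing in for varechem
chem <- data.frame(
  N = c(0.5, 2.1, 1.0),
  P = c(3.2, 0.1, 5.0),
  row.names = c("s1", "s2", "s3")
)

# lapply() works column by column and returns a list of factors,
# so as.data.frame() keeps each column as a factor
chem_binary <- as.data.frame(
  lapply(chem, cut, breaks = c(-Inf, 1.0, Inf), labels = c("Absent", "Present")),
  row.names = row.names(chem)
)

str(chem_binary)
```

Passing `row.names` directly to `as.data.frame()` also removes the need for a separate `row.names<-` step.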

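The code above shows how the data is prepared for a classification task. The idea is that each column of the response table (varechem_binary) should be predicted using the data.frames in the list that hold the predictor variables (varespec1, ...), but only one response column at a time. Merging the response table with each predictor table is easy: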

# merge response table with each predictor table
merge_counter <- 0
merged_list <- list()
for(table in subset_list) {
    merge_counter <- merge_counter + 1
    current_name <- names(subset_list)[merge_counter]
    tmp <- merge(table, varechem_binary, by = "row.names")
    row.names(tmp) <- tmp$Row.names
    tmp <- tmp[, -1]
    merged_list[[current_name]] <- tmp
    rm(tmp)
}
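
The same merge can probably be written without a manual counter, because `lapply()` keeps the names of the list it iterates over. A minimal sketch, assuming small toy tables (`predictors`, `response`) in place of `subset_list` and `varechem_binary`; the helper `merge_by_rownames()` is a hypothetical name, not part of any package:

```r
# Hypothetical toy stand-ins for subset_list and varechem_binary
predictors <- list(
  p1 = data.frame(a = 1:3, b = 4:6, row.names = c("r1", "r2", "r3")),
  p2 = data.frame(a = 7:9, row.names = c("r2", "r3", "r4"))
)
response <- data.frame(
  y1 = c("Absent", "Present", "Present", "Absent"),
  y2 = c("Present", "Present", "Absent", "Absent"),
  row.names = c("r1", "r2", "r3", "r4")
)

# Merge two tables on their row names and restore the row names afterwards
merge_by_rownames <- function(x, y) {
  out <- merge(x, y, by = "row.names")
  row.names(out) <- out$Row.names
  out[, -1, drop = FALSE]  # drop the helper column "Row.names"
}

# lapply() returns a list with the same names as 'predictors'
merged <- lapply(predictors, merge_by_rownames, y = response)
```

Only rows present in both tables survive the merge, which matches the behaviour of the loop above.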

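Expected output:

What I am looking for now (or earlier in the code, if that makes more sense) is a way to combine each predictor table in the list with each column of the varechem response table, exactly one column at a time. This would basically be: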

# storing in data frames just for illustration, I would like to do this within the list
# subsets for the 3 predictor tables with the first response variable
aa <- merged_list[[1]][,-c(46:58)]  # column 1:44 are the predictor variables, then the different response variables start
bb <- merged_list[[2]][,-c(45:57)]  # column 1:43 are the predictor variables, then the different response variables start
cc <- merged_list[[3]][,-c(43:55)]  # column 1:41 are the predictor variables, then the different response variables start

# subsets for the 3 predictor tables with the second response variable
dd <- merged_list[[1]][,-c(45, 47:58)]
ee <- merged_list[[2]][,-c(44, 46:57)]
ff <- merged_list[[3]][,-c(42, 44:55)]

# subsets for the 3 predictor tables with the third response variable
gg <- merged_list[[1]][,-c(45, 46, 48:58)]
...

# this is just to illustrate how the list could look like, I would like to keep all files in a list all the time
list_for_classification_runs <- list(aa, bb, cc, dd, ee, ff, gg, ...)

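The resulting list would be the input to a random forest classification call, in which the response variable is classified by all the other (predictor) variables from varespec:

library(ranger)

counter <- 0
RF_list <- list()
for (current_table in list_for_classification_runs) {
  counter <- counter + 1 
  # response_variable is a placeholder: it should be the one variable added to the predictor variables in the data frames 
  RF_list[[counter]] <- ranger(response_variable ~ ., data = current_table)
}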

Crazy Santa Claus

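Based on Gregor's comment, I came up with something like this. Instead of merging the complete varechem_binary into every element of subset_list, I added another for loop that iterates over all columns of varechem_binary. With drop = FALSE, the row.names and the data frame structure are preserved, so the merge works: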

merge_col_counter <- 0
column_counter <- 0
merged_column_list <- list()

for(table in subset_list) {
    merge_col_counter <- merge_col_counter + 1
    for (column in names(varechem_binary)) {
      column_counter <- column_counter + 1
      current_name <- paste(names(subset_list)[merge_col_counter], names(varechem_binary)[column_counter], sep = "_")
      print(current_name)
      tmp <- merge(table, varechem_binary[, column_counter, drop = FALSE], by = "row.names")
      row.names(tmp) <- tmp$Row.names
      tmp <- tmp[, -1]
      merged_column_list[[current_name]] <- tmp
      rm(tmp)
    }
    column_counter <- 0
}

There are probably several ways to make this approach cleaner or more efficient, but it works, so I can move on.
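
One possible cleanup, as a sketch only: looping over the names of both objects removes the two counters, and because `merge()` puts the response column last, a matching formula can be built from the column name. The toy tables `predictors` and `response` are hypothetical stand-ins for `subset_list` and `varechem_binary`:

```r
# Hypothetical toy stand-ins for subset_list and varechem_binary
predictors <- list(
  p1 = data.frame(a = 1:3, b = 4:6, row.names = c("r1", "r2", "r3")),
  p2 = data.frame(a = 7:9, row.names = c("r2", "r3", "r4"))
)
response <- data.frame(
  y1 = factor(c("Absent", "Present", "Present", "Absent")),
  y2 = factor(c("Present", "Present", "Absent", "Absent")),
  row.names = c("r1", "r2", "r3", "r4")
)

combined <- list()  # one merged table per predictor/response pair
formulas <- list()  # the matching model formula for each table

for (p in names(predictors)) {
  for (r in names(response)) {
    key <- paste(p, r, sep = "_")
    # drop = FALSE keeps the single response column as a data frame with row names
    tmp <- merge(predictors[[p]], response[, r, drop = FALSE], by = "row.names")
    row.names(tmp) <- tmp$Row.names
    tmp$Row.names <- NULL
    combined[[key]] <- tmp
    formulas[[key]] <- as.formula(paste(r, "~ ."))
  }
}

names(combined)
```

Each pair `formulas[[key]]` and `combined[[key]]` could then be handed to the modelling call, for example as the formula and `data` arguments of `ranger()`.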

Edited on 2021-01-22