我正在使用 R 編程語言。我有以下代碼可以創建 100 個數據集(包含一個固定組件和一個隨機組件):
a = rnorm(300,10,5)
b = rnorm(300,3,1)
c = rnorm(300,12,1)
e = "original"
d = data.frame(a,b,c,e)
results <- list()
for (i in 1:100){
a = rnorm(100,10,10)
b = rnorm(100,10,10)
c = rnorm(100,10,10)
e = "simulated"
d_i = data.frame(a,b,c,e)
data_i = rbind(d, d_i)
data_i$iteration = i
results[[i]] <- data_i
}
results_df <- do.call(rbind.data.frame, results)
目前,這 100 個數據集已全部放在同一個文件(“results_df”)中。現在,我想將“results_df”文件分解為這 100 個數據集中的每一個(使用“迭代”列作為索引):
results_df$iteration = as.factor(results_df$iteration)
X<-split(results_df, results_df$iteration)
這個“X”文件似乎是一個“列表”,其中列出了 100 個數據集中的每一個,如下所示:
我可以通過使用調用“索引”來訪問這些文件中的每一個i
,例如
> head(X$`1`)
a b c e iteration
1 2.141495 3.984072 12.73344 original 1
2 8.769269 4.267464 11.32566 original 1
3 5.413573 2.823608 12.06713 original 1
4 11.710470 3.710500 12.20715 original 1
5 14.423155 2.944178 10.56673 original 1
6 6.886629 2.843195 12.43074 original 1
> head(X$`2`)
a b c e iteration
401 2.141495 3.984072 12.73344 original 2
402 8.769269 4.267464 11.32566 original 2
403 5.413573 2.823608 12.06713 original 2
404 11.710470 3.710500 12.20715 original 2
405 14.423155 2.944178 10.56673 original 2
406 6.886629 2.843195 12.43074 original 2
> head(X$`98`)
a b c e iteration
38801 2.141495 3.984072 12.73344 original 98
38802 8.769269 4.267464 11.32566 original 98
38803 5.413573 2.823608 12.06713 original 98
38804 11.710470 3.710500 12.20715 original 98
38805 14.423155 2.944178 10.56673 original 98
38806 6.886629 2.843195 12.43074 original 98
我的問題:我現在想編寫另一個函數,對這 100 個數據集的每一個執行線性回歸,保存回歸係數,並將它們放入單個文件中。我試圖為此編寫代碼:
results_1 <- list()
for (i in 1:100){
model_i <- lm(a ~ b +c, data = X$`i`)
coeff_i = model_i$coefficients
results_1[[i]] <- coeff_i
}
results_df_1 <- do.call(rbind.data.frame, results_1)
乍一看,這似乎奏效了 - 但這將所有回歸係數顯示為相同。這是不可能的,因為回歸模型在不同的數據集上運行了 100 次:
#for some reason, the column names have been corrupted
hist(results_df_1$c.14.5741211250235..14.5741211250235..14.5741211250235..14.5741211250235.., main = "first coeff")
hist(results_df_1$c..0.105285805666629...0.105285805666629...0.105285805666629.., main = "second coeff")
hist(results_df_1$c..0.236548691738492...0.236548691738492...0.236548691738492.., main = "third coeff")
有人可以幫我解決這個問題嗎?當您在 R 中使用“split()”函數時,這是在以後的命令中“調用”“拆分組件”的正確方法嗎?
model_i <- lm(a ~ b +c, data = X$`i`)
謝謝!
我能夠解決這個問題:
a = rnorm(300,10,5)
b = rnorm(300,3,1)
c = rnorm(300,12,1)
e = "original"
d = data.frame(a,b,c,e)
results <- list()
for (i in 1:100){
a = rnorm(100,10,10)
b = rnorm(100,10,10)
c = rnorm(100,10,10)
e = "simulated"
d_i = data.frame(a,b,c,e)
data_i = rbind(d, d_i)
data_i$iteration = i
results[[i]] <- data_i
}
results_df <- do.call(rbind.data.frame, results)
X<-split(results_df, results_df$iteration)
#####
results_1 <- list()
for (i in 1:100){
#here was the problem
model_i <- lm(a ~ b +c, data = X[[i]])
coeff_i = model_i$coefficients
results_1[[i]] <- model_i$coefficients
}
results_df_1 <- do.call(rbind.data.frame, results_1)
par(mfrow = c(1, 3))
hist(results_df_1[,1], main = "first coeff")
hist(results_df_1[,2], main = "second coeff")
hist(results_df_1[,3], main = "third coeff")
本文收集自互联网,转载请注明来源。
如有侵权,请联系 [email protected] 删除。
我来说两句