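I'm working on a project for a colleague to normalize GC data and convert it from mol% to mass%.
Edit: I'm doing a row-wise normalization, i.e. at each time the species in norm1 should sum to 100 (although each one is then multiplied by its mass, so the row no longer sums to 100). Written as a for loop, it would be the equivalent of a very onerous: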
for (i in seq_len(nrow(Nmass))) {
  # take the row total before any value in the row is overwritten
  row_total = sum(Nmass[i, norm1])
  for (species in norm1) {
    Nmass[i, species] = Fmolwt[species, 'weight'] * Nmass[i, species] / row_total
  }
}
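I've imported the CSV files, which are laid out with species names as columns and injection times as rows (I'm working with dummy data, so everything is currently zero).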
> Nmass[1:5,c("Time",norm1)]
# A tibble: 5 x 13
Time HTFeed_Methane HTFeed_Ethane HTFeed_Ethylene HTFeed_Propane HTFeed_Propylene `HTFeed_iso-butane` `HTFee~ `HTFeed~ `HTFe~ HTFee~ `HTFee~ `HTFee~
<dttm> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2019-10-06 13:02:00 0 0 0 0 0 0 0 0 0 0 0 0
2 2019-10-06 13:17:00 0 0 0 0 0 0 0 0 0 0 0 0
3 2019-10-06 13:32:00 0 0 0 0 0 0 0 0 0 0 0 0
4 2019-10-06 13:47:00 0 0 0 0 0 0 0 0 0 0 0 0
5 2019-10-06 14:02:00 0 0 0 0 0 0 0 0 0 0 0 0
I have a working routine for the normalization:
norm1 = c('HTFeed_Methane','HTFeed_Ethane','HTFeed_Ethylene','HTFeed_Propane','HTFeed_Propylene','HTFeed_iso-butane','HTFeed_n-Butane',
'HTFeed_trans-2-butene','HTFeed_1-Butene','HTFeed_Isobutylene','HTFeed_cis-2-butene','HTFeed_1,3-Butadiene')
Nmass[,norm1] = as.data.frame(apply(Nmass[,norm1], 2, function(x) x/sum(x)))
But when I try to add the mass conversion using a pre-built list of masses by species:
Fmolwt = data.frame(c(16.04,30.07,28.05,44.9,42.08,58.12,58.12,56.11,56.11,56.11,56.11,54.1))
colnames(Fmolwt)[1] = 'weight'
rownames(Fmolwt) = c('HTFeed_Methane','HTFeed_Ethane','HTFeed_Ethylene','HTFeed_Propane','HTFeed_Propylene','HTFeed_iso-butane',
'HTFeed_n-Butane','HTFeed_trans-2-butene','HTFeed_1-Butene','HTFeed_Isobutylene','HTFeed_cis-2-butene','HTFeed_1,3-Butadiene')
the routine becomes (I think):
Nmass[,norm1] = as.data.frame(apply(Nmass[,norm1], 2, function(x) x*Fmolwt[x,]/sum(x)))
I get an error about differing dimensions:
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, :
arguments imply differing number of rows: 0, 3696
In addition: Warning messages:
1: In x * Fmolwt[x, ] :
longer object length is not a multiple of shorter object length
2: In x * Fmolwt[x, ] :
longer object length is not a multiple of shorter object length
3: In x * Fmolwt[x, ] :
longer object length is not a multiple of shorter object length
4: In x * Fmolwt[x, ] :
longer object length is not a multiple of shorter object length
5: In x * Fmolwt[x, ] :
longer object length is not a multiple of shorter object length
6: In x * Fmolwt[x, ] :
longer object length is not a multiple of shorter object length
7: In x * Fmolwt[x, ] :
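I expect this is because the apply statement is trying to pull in all of the molecular weights named in norm1 at once.
Can I get this done the way I'm attempting, or do I need to write out a for loop?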
You have an error here:
Nmass[,norm1] = as.data.frame(apply(Nmass[,norm1], 2, function(x) x*Fmolwt[x,]/sum(x)))
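With apply(..,2,..), x holds the entries of one column, whereas from what I gather you need a row-wise operation. Secondly, Fmolwt[x,] fails because the weights have to be looked up by the row names of Fmolwt (the species names), while x contains the data values of the column, not names.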
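To make both points concrete, here is a minimal sketch on toy data. The names w, m, and the species labels "A", "B", "C" are made up for illustration and are not part of the question's data.

```r
# Toy stand-ins (hypothetical): three species "A", "B", "C" and their weights.
w <- data.frame(weight = c(16.04, 30.07, 28.05),
                row.names = c("A", "B", "C"))
m <- matrix(c(50, 20, 30, 30, 20, 50), nrow = 2,
            dimnames = list(NULL, c("A", "B", "C")))

# MARGIN = 2 hands the function one column (all injections of one species);
# MARGIN = 1 hands it one row (all species of one injection).
col_totals <- apply(m, 2, sum)
row_totals <- apply(m, 1, sum)

# Indexing by data values treats them as row positions. With all-zero dummy
# data, position 0 selects nothing, so the result has length 0.
by_value <- w[c(0, 0), ]

# Indexing by species name returns the weights in the requested order.
by_name <- w[c("A", "B", "C"), ]
```

The zero-length by_value is consistent with the "differing number of rows: 0, 3696" message in the question, since multiplying a column by an empty vector yields an empty result.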
I simulated some data that looks like yours, for illustration:
set.seed(1234)
norm1 = c('HTFeed_Methane','HTFeed_Ethane','HTFeed_Ethylene',
'HTFeed_Propane','HTFeed_Propylene','HTFeed_iso-butane',
'HTFeed_n-Butane','HTFeed_trans-2-butene',
'HTFeed_1-Butene','HTFeed_Isobutylene','HTFeed_cis-2-butene',
'HTFeed_1,3-Butadiene')
values <- matrix(abs(rnorm(120,1000,100)),ncol=12)
colnames(values) = norm1
ts <- seq(as.POSIXct("2017-01-01", tz = "UTC"),
as.POSIXct("2017-01-02", tz = "UTC"),
length.out = 100)
Nmass = data.frame(Time=ts,values,check.names=F)
Fmolwt = data.frame(c(16.04,30.07,28.05,44.9,42.08,58.12,58.12,
56.11,56.11,56.11,56.11,54.1))
colnames(Fmolwt)[1] = 'weight'
rownames(Fmolwt) = c('HTFeed_Methane','HTFeed_Ethane','HTFeed_Ethylene',
'HTFeed_Propane','HTFeed_Propylene',
'HTFeed_iso-butane','HTFeed_n-Butane','HTFeed_trans-2-butene',
'HTFeed_1-Butene','HTFeed_Isobutylene','HTFeed_cis-2-butene',
'HTFeed_1,3-Butadiene')
This is what the simulated data looks like:
> head(Nmass,2)
Time HTFeed_Methane HTFeed_Ethane HTFeed_Ethylene
1 2017-01-01 00:00:00 879.2934 952.2807 1013.4088
2 2017-01-01 00:14:32 1027.7429 900.1614 950.9314
HTFeed_Propane HTFeed_Propylene HTFeed_iso-butane HTFeed_n-Butane
1 1110.2298 1144.9496 819.3969 1065.659
2 952.4407 893.1357 941.7924 1254.899
HTFeed_trans-2-butene HTFeed_1-Butene HTFeed_Isobutylene HTFeed_cis-2-butene
1 1000.6893 982.2210 994.6841 1041.4524
2 954.4531 983.0006 1025.5196 952.5282
HTFeed_1,3-Butadiene
1 980.4065
2 935.0930
As a first step, take the first row as an example: normalize it by its total, then multiply by the corresponding masses. For row 1, run:
Fmolwt[norm1,]*Nmass[1,norm1]/sum(Nmass[1,norm1])
which gives you the following result:
HTFeed_Methane HTFeed_Ethane HTFeed_Ethylene HTFeed_Propane HTFeed_Propylene
1 1.176825 2.389309 2.371873 4.159423 4.020092
HTFeed_iso-butane HTFeed_n-Butane HTFeed_trans-2-butene HTFeed_1-Butene
1 3.973688 5.167942 4.685041 4.598576
HTFeed_Isobutylene HTFeed_cis-2-butene HTFeed_1,3-Butadiene
1 4.656926 4.875886 4.425653
If you want to use built-in R functions, the simplest is apply, which you have already used:
results = t(apply(Nmass[,norm1],1,function(x){
Fmolwt[norm1,]*x/sum(x)
}))
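Following the single-row example, x is one row of Nmass[,norm1], so x/sum(x) normalizes it, and we then multiply by Fmolwt[norm1,]. The values line up because both the row and the weights are selected in the order of norm1. apply returns the result for each row as a column, so we transpose to get the same dimensions as Nmass[,norm1], hence t(apply(..)).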
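As an aside, the same row-wise calculation can also be written without apply. The sketch below is an alternative not taken from the answer above; it runs on hypothetical toy data (mol, w, and the species labels are made up) and checks that both routes agree.

```r
# Toy data (hypothetical): 2 injections x 3 species.
species <- c("A", "B", "C")
mol <- data.frame(A = c(50, 20), B = c(30, 30), C = c(20, 50))
w <- data.frame(weight = c(16.04, 30.07, 28.05), row.names = species)

# Row-wise approach, same shape as the apply call in the answer.
via_apply <- t(apply(mol[, species], 1, function(x) w[species, ] * x / sum(x)))

# Vectorized alternative: divide every row by its total, then scale each
# column by the weight of its species.
frac <- as.matrix(mol[, species]) / rowSums(mol[, species])
via_sweep <- sweep(frac, 2, w[species, "weight"], `*`)

# Both routes should give the same numbers.
same <- isTRUE(all.equal(unname(via_apply), unname(via_sweep)))
```

Dividing a matrix by a vector of row totals works here because R recycles the vector down the columns, so element i of the totals meets row i in every column.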
If we look at the first row, it gives the same output as the single-row example:
> results[1,]
HTFeed_Methane HTFeed_Ethane HTFeed_Ethylene
1.176825 2.389309 2.371873
HTFeed_Propane HTFeed_Propylene HTFeed_iso-butane
4.159423 4.020092 3.973688
HTFeed_n-Butane HTFeed_trans-2-butene HTFeed_1-Butene
5.167942 4.685041 4.598576
HTFeed_Isobutylene HTFeed_cis-2-butene HTFeed_1,3-Butadiene
4.656926 4.875886 4.425653
So if you want to put the results back, do
Nmass[,norm1] = results
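The question notes that the rows no longer sum to 100 after the multiplication. If actual mass percentages are wanted, one possible extra step, which the answer above does not cover, is to divide each row by its new total. A minimal sketch on a made-up stand-in for results:

```r
# Hypothetical stand-in for `results`: one row per injection, one column per
# species, holding mole fraction times molecular weight.
results <- matrix(c(8.02, 9.021, 5.61,
                    3.208, 9.021, 14.025),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(NULL, c("A", "B", "C")))

# Divide each row by its total and scale to percent.
mass_pct <- 100 * results / rowSums(results)

# Each row of mass_pct now sums to 100.
row_check <- rowSums(mass_pct)
```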