Normalization with apply

Posted by Ben A in Dev

Ben A

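I'm working on a project for a colleague to normalize GC data and convert it from mol% to mass%.

Edit: I am doing a row-wise normalization, i.e. at each time point the species in norm1 should sum to 100 (although each is then multiplied by its mass, so they no longer sum to 100). As a for loop, it would be the equivalent of a very cumbersome: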

for (i in seq_len(nrow(Nmass))) {
   # take the row total before any value in the row is overwritten
   row_total = sum(Nmass[i, norm1])
   for (species in norm1) {
      Nmass[[i, species]] = Fmolwt[species, ] * Nmass[[i, species]] / row_total
   }
}

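I have imported the CSV files; they are arranged with species names as columns and injection times as rows (I'm working with dummy data, so everything is currently zero).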

> Nmass[1:5,c("Time",norm1)]
# A tibble: 5 x 13
  Time                HTFeed_Methane HTFeed_Ethane HTFeed_Ethylene HTFeed_Propane HTFeed_Propylene `HTFeed_iso-butane` `HTFee~ `HTFeed~ `HTFe~ HTFee~ `HTFee~ `HTFee~
  <dttm>                       <dbl>         <dbl>           <dbl>          <dbl>            <dbl>               <dbl>   <dbl>    <dbl>  <dbl>  <dbl>   <dbl>   <dbl>
1 2019-10-06 13:02:00              0             0               0              0                0                   0       0        0      0      0       0       0
2 2019-10-06 13:17:00              0             0               0              0                0                   0       0        0      0      0       0       0
3 2019-10-06 13:32:00              0             0               0              0                0                   0       0        0      0      0       0       0
4 2019-10-06 13:47:00              0             0               0              0                0                   0       0        0      0      0       0       0
5 2019-10-06 14:02:00              0             0               0              0                0                   0       0        0      0      0       0       0

I have a working routine:

norm1 = c('HTFeed_Methane','HTFeed_Ethane','HTFeed_Ethylene','HTFeed_Propane','HTFeed_Propylene','HTFeed_iso-butane','HTFeed_n-Butane',
        'HTFeed_trans-2-butene','HTFeed_1-Butene','HTFeed_Isobutylene','HTFeed_cis-2-butene','HTFeed_1,3-Butadiene')

Nmass[,norm1] = as.data.frame(apply(Nmass[,norm1], 2, function(x) x/sum(x)))

But when I try to implement the mass conversion using a pre-built table of masses by species:

Fmolwt = data.frame(c(16.04,30.07,28.05,44.9,42.08,58.12,58.12,56.11,56.11,56.11,56.11,54.1))
colnames(Fmolwt)[1] = 'weight'
rownames(Fmolwt) = c('HTFeed_Methane','HTFeed_Ethane','HTFeed_Ethylene','HTFeed_Propane','HTFeed_Propylene','HTFeed_iso-butane',
                    'HTFeed_n-Butane','HTFeed_trans-2-butene','HTFeed_1-Butene','HTFeed_Isobutylene','HTFeed_cis-2-butene','HTFeed_1,3-Butadiene')

the routine becomes (I think):

Nmass[,norm1] = as.data.frame(apply(Nmass[,norm1], 2, function(x) x*Fmolwt[x,]/sum(x)))

I get an error about differing dimensions.

Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE,  : 
  arguments imply differing number of rows: 0, 3696
In addition: Warning messages:
1: In x * Fmolwt[x, ] :
  longer object length is not a multiple of shorter object length
2: In x * Fmolwt[x, ] :
  longer object length is not a multiple of shorter object length
3: In x * Fmolwt[x, ] :
  longer object length is not a multiple of shorter object length
4: In x * Fmolwt[x, ] :
  longer object length is not a multiple of shorter object length
5: In x * Fmolwt[x, ] :
  longer object length is not a multiple of shorter object length
6: In x * Fmolwt[x, ] :
  longer object length is not a multiple of shorter object length
7: In x * Fmolwt[x, ] :

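I expect this is because the apply statement is trying to pull in all the molecular weights named in norm1 at once.

Can I get this to work the way I'm attempting it, or do I need to write out a for loop?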

StupidWolf

You have a mistake here:

Nmass[,norm1] = as.data.frame(apply(Nmass[,norm1], 2, function(x) x*Fmolwt[x,]/sum(x)))

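With apply(..,2,..), x is a column of the data, and from what I gather you need a row-wise operation. Secondly, Fmolwt[x,] fails because it indexes Fmolwt by the values in x, whereas the rows of Fmolwt are named after the species (the column names of Nmass), not after the data values.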
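
To see both problems on a small scale, here is a minimal sketch on a made-up 2 x 3 matrix m and weight table w (both hypothetical, not part of the question's data). It shows which slices apply hands to the function for each margin, and what happens when a data frame is indexed by data values instead of by row names.

```r
# Toy data: 2 time points (rows) x 3 species (columns); names are illustrative only
m <- matrix(c(10, 30, 20, 30, 70, 40), nrow = 2,
            dimnames = list(NULL, c("A", "B", "C")))
w <- data.frame(weight = c(16, 30, 28), row.names = c("A", "B", "C"))

# MARGIN = 2: the function sees one column (all time points of one species)
col_lengths <- apply(m, 2, length)   # every slice has nrow(m) = 2 elements

# MARGIN = 1: the function sees one row (all species at one time point)
row_lengths <- apply(m, 1, length)   # every slice has ncol(m) = 3 elements

# Indexing by data values asks for rows number 10 and 30, which do not exist
by_value <- w[m[, "A"], ]            # NA NA

# Index 0 selects nothing at all, so all-zero dummy data yields a length-0 result
by_zero <- w[c(0, 0), ]              # numeric(0)

# Indexing by species name returns the weights in the requested order
by_name <- w[colnames(m), ]          # 16 30 28
```

The length-0 result from zero-valued data may be where the 0 in the "differing number of rows: 0, 3696" message comes from, although that is an inference from the error text rather than something checked against the original files.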

I simulated some data that looks like yours, for illustration:

set.seed(1234)

norm1 = c('HTFeed_Methane','HTFeed_Ethane','HTFeed_Ethylene',
'HTFeed_Propane','HTFeed_Propylene','HTFeed_iso-butane',
'HTFeed_n-Butane','HTFeed_trans-2-butene',
'HTFeed_1-Butene','HTFeed_Isobutylene','HTFeed_cis-2-butene',
'HTFeed_1,3-Butadiene')

values <- matrix(abs(rnorm(120,1000,100)),ncol=12)
colnames(values) = norm1

ts <- seq(as.POSIXct("2017-01-01", tz = "UTC"),
    as.POSIXct("2017-01-02", tz = "UTC"),
    length.out = 100)

Nmass = data.frame(Time=ts,values,check.names=F)

Fmolwt = data.frame(c(16.04,30.07,28.05,44.9,42.08,58.12,58.12,
56.11,56.11,56.11,56.11,54.1))
colnames(Fmolwt)[1] = 'weight'
rownames(Fmolwt) = c('HTFeed_Methane','HTFeed_Ethane','HTFeed_Ethylene',
'HTFeed_Propane','HTFeed_Propylene',
'HTFeed_iso-butane','HTFeed_n-Butane','HTFeed_trans-2-butene',
'HTFeed_1-Butene','HTFeed_Isobutylene','HTFeed_cis-2-butene',
'HTFeed_1,3-Butadiene')

What the simulated data looks like:

> head(Nmass,2)
                 Time HTFeed_Methane HTFeed_Ethane HTFeed_Ethylene
1 2017-01-01 00:00:00       879.2934      952.2807       1013.4088
2 2017-01-01 00:14:32      1027.7429      900.1614        950.9314
  HTFeed_Propane HTFeed_Propylene HTFeed_iso-butane HTFeed_n-Butane
1      1110.2298        1144.9496          819.3969        1065.659
2       952.4407         893.1357          941.7924        1254.899
  HTFeed_trans-2-butene HTFeed_1-Butene HTFeed_Isobutylene HTFeed_cis-2-butene
1             1000.6893        982.2210           994.6841           1041.4524
2              954.4531        983.0006          1025.5196            952.5282
  HTFeed_1,3-Butadiene
1             980.4065
2             935.0930

As a first step, take a single row, normalize it (by its total), and multiply by the corresponding masses. For row 1, for example, run:

Fmolwt[norm1,]*Nmass[1,norm1]/sum(Nmass[1,norm1])

which gives you the following result:

  HTFeed_Methane HTFeed_Ethane HTFeed_Ethylene HTFeed_Propane HTFeed_Propylene
1       1.176825      2.389309        2.371873       4.159423         4.020092
  HTFeed_iso-butane HTFeed_n-Butane HTFeed_trans-2-butene HTFeed_1-Butene
1          3.973688        5.167942              4.685041        4.598576
  HTFeed_Isobutylene HTFeed_cis-2-butene HTFeed_1,3-Butadiene
1           4.656926            4.875886             4.425653

If you want to use built-in R functions, the simplest is apply, which you have already used:

results = t(apply(Nmass[,norm1],1,function(x){
      Fmolwt[norm1,]*x/sum(x)
    }))

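Following the single-row example, x is one row of Nmass[,norm1], so x/sum(x) normalizes it and we then multiply by Fmolwt[norm1,]. The values line up because both Nmass[,norm1] and Fmolwt[norm1,] are ordered by norm1. apply returns one column per input row, so we transpose the result to get the same dimensions as Nmass[,norm1], hence t(apply(..)).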
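
As a cross-check of the t(apply(..)) step, the same calculation can be sketched without apply, using rowSums and sweep from base R. The toy objects m and w below are made up for illustration; with the real data, Nmass[,norm1] (as a matrix) and Fmolwt[norm1,] would take their place.

```r
# Toy data: 2 time points (rows) x 3 species (columns); names are illustrative only
m <- matrix(c(10, 30, 20, 30, 70, 40), nrow = 2,
            dimnames = list(NULL, c("A", "B", "C")))
w <- c(A = 16, B = 30, C = 28)   # hypothetical molar masses

# Row-wise apply returns one column per input row, hence the transpose
raw <- apply(m, 1, function(x) w * x / sum(x))
dim(raw)              # 3 2: species x time points
via_apply <- t(raw)   # 2 3: back to time points x species

# Vectorized equivalent: divide each row by its total, then scale each column
shares <- m / rowSums(m)               # element [i, j] is divided by the total of row i
via_sweep <- sweep(shares, 2, w, "*")  # column j is multiplied by w[j]
```

Both routes give the same matrix; the sweep version avoids calling a function once per row, which may matter with thousands of injections.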

If we look at the first row, it gives the same output as the example above:

> results[1,]
       HTFeed_Methane         HTFeed_Ethane       HTFeed_Ethylene 
             1.176825              2.389309              2.371873 
       HTFeed_Propane      HTFeed_Propylene     HTFeed_iso-butane 
             4.159423              4.020092              3.973688 
      HTFeed_n-Butane HTFeed_trans-2-butene       HTFeed_1-Butene 
             5.167942              4.685041              4.598576 
   HTFeed_Isobutylene   HTFeed_cis-2-butene  HTFeed_1,3-Butadiene 
             4.656926              4.875886              4.425653

So if you want to put the results back, do

Nmass[,norm1] = results
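
One caveat on units: the values computed above are mole fraction times molar mass, so each row sums to the mixture's average molar mass rather than to 100. If the goal is mass percent in the strict sense, a common formulation is 100 * x_i * M_i / sum_j(x_j * M_j), which needs a second normalization after the weighting. A minimal sketch, assuming that is what is wanted, on made-up toy data (m and w are hypothetical):

```r
# Toy data: mol% per time point (rows) and species (columns); names are illustrative only
m <- matrix(c(10, 30, 20, 30, 70, 40), nrow = 2,
            dimnames = list(NULL, c("A", "B", "C")))
w <- c(A = 16, B = 30, C = 28)   # hypothetical molar masses

weighted <- sweep(m, 2, w, "*")                  # x_i * M_i for every row
mass_pct <- 100 * weighted / rowSums(weighted)   # renormalize each row to 100
rowSums(mass_pct)                                # each row sums to 100
```

Because of the second normalization, it does not matter whether the input rows are mol% or mole fractions, or whether they sum exactly to 100 beforehand.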


Edited on 2021-07-29
