Normalizing samples based on controls when we have several groups

阿达兰1

Suppose we have the following DataFrame:

import numpy as np
import pandas as pd

data = {'Compounds': ['Drug_A', 'Drug_A', 'Drug_A', 'Drug_A', 'Drug_A', 'Drug_A', 'Drug_B', 'Drug_B',
                   'Drug_B','Drug_B','Drug_B','Drug_B','Drug_B','Drug_B','Drug_B','Drug_B','Drug_B','Drug_B',
                   'Drug_C', 'Drug_C','Drug_C','Drug_C','Drug_C','Drug_C','Drug_C','Drug_C','Drug_C','Drug_C', np.nan, 
                   np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan,
                   np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan], 
        'values': [24, 20, 48, 17, 20, 8, 22, 16, 46, 44, 12, 38, 26, 16, 19, 23, 9, 39, 19, 24, 43, 6, 24, 46, 26, 15, 8, 
                  22, 22, 32, 23, 41, 8, 46, 29, 34, 34, 39, 32, 22, 28, 34, 29, 19, 44, 22, 17, 41, 19, 39, 27, 46, 37, 26],
      'identifier': ['Sample', 'Sample','Sample','Sample','Sample','Sample','Sample','Sample','Sample','Sample',
                    'Sample','Sample','Sample','Sample','Sample','Sample','Sample','Sample','Sample','Sample',
                    'Sample','Sample','Sample','Sample','Sample','Sample','Sample','Sample', 'Control', 'Control',
                    'Control','Control','Control','Control','Control','Control','Control','Control','Control',
                    'Control','Control','Control','Control','Control','Control','Control','Control','Control',
                    'Control','Control','Control','Control','Control','Control',], 
        'Experiment': ['P1', 'P1', 'P2',
                       'P2', 'P3', 'P3', 'P1', 'P1', 'P1', 'P2', 'P2', 'P2', 'P3', 'P3', 'P1', 'P1', 'P1', 'P2', 'P2',
                       'P2', 'P2', 'P2', 'P3', 'P3', 'P1', 'P1', 'P1', 'P1', 'P1', 'P1', 'P1', 'P1',
                       'P2', 'P2', 'P2', 'P2', 'P2', 'P2', 'P2', 'P2', 'P2', 'P2', 'P2', 'P2', 'P3', 'P3', 'P3', 'P3', 'P3', 'P3', 'P1', 'P2',
                       'P3', 'P1']}
df = pd.DataFrame(data)

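The identifier column holds the values Sample and Control. First, we want to compute the mean of the 'values' column over all controls from the different experiments (i.e., P1, P2, P3):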

df_control = df.loc[df['identifier'] == 'Control']
z = df_control['values'].mean()

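If I wanted to write this in one line, what would the compact form of the script above be? Could I use a list comprehension?

Next, for normalization, we want to divide z by the mean 'values' of the controls within each experiment (P1, P2, P3) separately, to obtain a normalization_factor for each of these experiments.

Finally, each sample value should be multiplied by the normalization factor of the experiment it belongs to.

What is the simplest, most direct way to do this? Thanks for your help!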

存入

Is this what you are looking for?

# numeric_only=True skips the string column 'Compounds' (needed in pandas >= 2.0)
df.groupby(by=['identifier']).mean(numeric_only=True)
Out: 
               values
identifier           
Control     30.384615
Sample      24.285714
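
For the one-line form asked about in the question, a list comprehension is not needed: the boolean mask and the column can both go into a single .loc call. The sketch below uses a small made-up frame (the name toy and its numbers are illustrative, not the data from the question).

```python
import pandas as pd

# Small illustrative frame (made-up numbers), same column names as the question.
toy = pd.DataFrame({
    'values': [10, 20, 30, 50],
    'identifier': ['Sample', 'Control', 'Control', 'Sample'],
})

# Select the Control rows and the 'values' column in one .loc call, then average.
z = toy.loc[toy['identifier'] == 'Control', 'values'].mean()
print(z)  # 25.0
```

On the DataFrame from the question the same expression with df in place of toy should give the Control mean shown above.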

And then:

df.groupby(by=['identifier', 'Experiment']).mean(numeric_only=True)
Out: 
                          values
identifier Experiment           
Control    P1          28.500000
           P2          30.769231
           P3          31.285714
Sample     P1          20.833333
           P2          29.000000
           P3          23.333333

The second result has the following MultiIndex, which can be used to access the data:

MultiIndex([('Control', 'P1'),
            ('Control', 'P2'),
            ('Control', 'P3'),
            ( 'Sample', 'P1'),
            ( 'Sample', 'P2'),
            ( 'Sample', 'P3')],
           names=['identifier', 'Experiment'])
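
As a rough illustration of what accessing the data through such a MultiIndex can look like, the sketch below builds a grouped mean on a small made-up frame (toy and its numbers are assumptions for the example) and reads it two ways: one cell via an index tuple, and all experiments of one identifier via a cross-section.

```python
import pandas as pd

# Small illustrative frame (made-up numbers), same column names as the question.
toy = pd.DataFrame({
    'values': [10, 20, 30, 40, 50, 70],
    'identifier': ['Control', 'Control', 'Control', 'Sample', 'Sample', 'Sample'],
    'Experiment': ['P1', 'P1', 'P2', 'P1', 'P2', 'P2'],
})

# Grouped mean with a two-level (identifier, Experiment) index.
spec_mean = toy.groupby(['identifier', 'Experiment'])[['values']].mean()

# One cell: pass the full index tuple plus the column name.
control_p1 = spec_mean.loc[('Control', 'P1'), 'values']

# All experiments of one identifier: cross-section on the 'identifier' level.
control_means = spec_mean.xs('Control', level='identifier')['values']
print(control_means)
```

Here control_means is a Series indexed by Experiment, which is a convenient shape for mapping back onto the original rows.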

You can now build on this:

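# numeric_only=True skips the string column 'Compounds' (needed in pandas >= 2.0)
all_mean = df.groupby(by=['identifier']).mean(numeric_only=True)
spec_mean = df.groupby(by=['identifier', 'Experiment']).mean(numeric_only=True)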
result = all_mean/spec_mean

Out:
                         values
identifier Experiment          
Control    P1          1.066127
           P2          0.987500
           P3          0.971198
Sample     P1          1.165714
           P2          0.837438
           P3          1.040816

Now put the data into some kind of flat structure (the OP was not explicit about this):

normalization_factors = {idx[1]: result.loc[idx].values[0] for idx in result.index if idx[0] == 'Control'}
# {'P1': 1.0661268556005397, 'P2': 0.9874999999999999, 'P3': 0.9711977520196698}
sample_values = {idx[1]: result.loc[idx].values[0] * normalization_factors[idx[1]] for idx in result.index if idx[0] == 'Sample'}
# {'P1': 1.2427993059572005, 'P2': 0.8269704433497537, 'P3': 1.0108384765919014}

Map sample_values onto df as:

df["calculated_col_with_the_name_you_prefer"] = df["Experiment"].map(sample_values)
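
If the goal is, as the question's last step reads, to multiply every individual sample value by the factor of its own experiment, one possible sketch is shown below. It is an alternative to the dictionary approach above, not a restatement of it; the frame toy, its numbers, and the column name normalized are assumptions made for the example.

```python
import pandas as pd

# Small illustrative frame (made-up numbers), same column names as the question.
toy = pd.DataFrame({
    'values': [10, 20, 30, 40, 50, 70],
    'identifier': ['Control', 'Control', 'Control', 'Sample', 'Sample', 'Sample'],
    'Experiment': ['P1', 'P1', 'P2', 'P1', 'P2', 'P2'],
})

controls = toy[toy['identifier'] == 'Control']

# z: mean over all controls, regardless of experiment.
z = controls['values'].mean()

# One factor per experiment: z divided by that experiment's control mean.
factors = z / controls.groupby('Experiment')['values'].mean()

# Scale each sample row by the factor of its experiment; control rows stay NaN.
is_sample = toy['identifier'] == 'Sample'
toy.loc[is_sample, 'normalized'] = (
    toy.loc[is_sample, 'values'] * toy.loc[is_sample, 'Experiment'].map(factors)
)
print(toy)
```

Mapping the Experiment column through factors avoids an explicit loop; an experiment with no control rows would simply produce NaN for its samples.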


Edited on 2021-10-1
