Suppose we have the following DataFrame:
import numpy as np
import pandas as pd

data = {'Compounds': ['Drug_A', 'Drug_A', 'Drug_A', 'Drug_A', 'Drug_A', 'Drug_A', 'Drug_B', 'Drug_B',
'Drug_B','Drug_B','Drug_B','Drug_B','Drug_B','Drug_B','Drug_B','Drug_B','Drug_B','Drug_B',
'Drug_C', 'Drug_C','Drug_C','Drug_C','Drug_C','Drug_C','Drug_C','Drug_C','Drug_C','Drug_C', np.nan,
np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan,
np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
'values': [24, 20, 48, 17, 20, 8, 22, 16, 46, 44, 12, 38, 26, 16, 19, 23, 9, 39, 19, 24, 43, 6, 24, 46, 26, 15, 8,
22, 22, 32, 23, 41, 8, 46, 29, 34, 34, 39, 32, 22, 28, 34, 29, 19, 44, 22, 17, 41, 19, 39, 27, 46, 37, 26],
'identifier': ['Sample', 'Sample','Sample','Sample','Sample','Sample','Sample','Sample','Sample','Sample',
'Sample','Sample','Sample','Sample','Sample','Sample','Sample','Sample','Sample','Sample',
'Sample','Sample','Sample','Sample','Sample','Sample','Sample','Sample', 'Control', 'Control',
'Control','Control','Control','Control','Control','Control','Control','Control','Control',
'Control','Control','Control','Control','Control','Control','Control','Control','Control',
'Control','Control','Control','Control','Control','Control',],
'Experiment': ['P1', 'P1', 'P2',
'P2', 'P3', 'P3', 'P1', 'P1', 'P1', 'P2', 'P2', 'P2', 'P3', 'P3', 'P1', 'P1', 'P1', 'P2', 'P2',
'P2', 'P2', 'P2', 'P3', 'P3', 'P1','P1', 'P1', 'P1', 'P1', 'P1', 'P1', 'P1',
'P2', 'P2','P2','P2','P2','P2','P2','P2','P2','P2','P2','P2','P3','P3','P3','P3','P3','P3', 'P1', 'P2',
'P3','P1' ]}
df = pd.DataFrame(data)
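The identifier column holds the values Sample and Control. First, we want to compute the mean of the 'values' column over all controls, pooled across the experiments (P1, P2, P3):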
df_control = df.loc[df['identifier'] == 'Control']
z = df_control['values'].mean()
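What is a compact, one-line form of the script above? Can I use a list comprehension?
Next, for normalization, we want to divide z by the mean control 'values' of each experiment (P1, P2, P3) separately, which gives one normalization_factor per experiment.
Finally, multiply each experiment's normalization factor by the Sample values that belong to that experiment.
What is the simplest, most direct way to do this? Thanks for your help!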
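Is this what you are looking for?
# numeric_only=True skips the text columns (Compounds, Experiment)
df.groupby(by=['identifier']).mean(numeric_only=True)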
Out:
               values
identifier
Control     30.384615
Sample      24.285714
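If only the scalar z from the question is needed, a list comprehension is not required. A minimal sketch, assuming a small toy DataFrame (the numbers below are made up for illustration and are not the question's data): a single .loc call can take the boolean row mask and the column name together.

```python
import pandas as pd

# Toy data (hypothetical, far smaller than the DataFrame in the question).
df = pd.DataFrame({
    'identifier': ['Sample', 'Sample', 'Control', 'Control', 'Control'],
    'Experiment': ['P1', 'P2', 'P1', 'P2', 'P2'],
    'values': [10, 20, 2, 4, 6],
})

# Row mask and column label in one .loc call, then the mean of what is left.
z = df.loc[df['identifier'] == 'Control', 'values'].mean()
print(z)  # 4.0
```

The same line should work unchanged on the full DataFrame from the question, since it only relies on the identifier and values columns.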
Going one step further:
df.groupby(by=['identifier', 'Experiment']).mean(numeric_only=True)
Out:
                          values
identifier Experiment
Control    P1          28.500000
           P2          30.769231
           P3          31.285714
Sample     P1          20.833333
           P2          29.000000
           P3          23.333333
The second result has the following MultiIndex, which can be used to access its contents:
MultiIndex([('Control', 'P1'),
('Control', 'P2'),
('Control', 'P3'),
( 'Sample', 'P1'),
( 'Sample', 'P2'),
( 'Sample', 'P3')],
names=['identifier', 'Experiment'])
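As a rough illustration of how such a MultiIndex can be used for lookups, here is a sketch on toy data (hypothetical values, not the question's). It keeps the grouped means as a Series rather than the one-column DataFrame shown above, which is an assumption made to keep the lookups short.

```python
import pandas as pd

# Toy data (hypothetical values chosen so the means are easy to check by hand).
df = pd.DataFrame({
    'identifier': ['Sample', 'Sample', 'Control', 'Control', 'Control', 'Control'],
    'Experiment': ['P1', 'P2', 'P1', 'P1', 'P2', 'P2'],
    'values': [10, 20, 2, 4, 6, 8],
})

# Series indexed by (identifier, Experiment).
spec_mean = df.groupby(['identifier', 'Experiment'])['values'].mean()

# A full tuple key returns a single number.
control_p1 = spec_mean.loc[('Control', 'P1')]

# A key for the first level only returns a Series indexed by Experiment.
control_by_exp = spec_mean.loc['Control']

# xs selects on a named level and drops that level from the result.
control_xs = spec_mean.xs('Control', level='identifier')

print(control_p1)      # 3.0
print(control_by_exp)
```

With the one-column DataFrame form used in the answer, the equivalent lookup would also name the column, for example spec_mean.loc[('Control', 'P1'), 'values'].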
You can now build on this:
all_mean = df.groupby(by=['identifier']).mean(numeric_only=True)
spec_mean = df.groupby(by=['identifier', 'Experiment']).mean(numeric_only=True)
result = all_mean/spec_mean
Out:
                         values
identifier Experiment
Control    P1          1.066127
           P2          0.987500
           P3          0.971198
Sample     P1          1.165714
           P2          0.837438
           P3          1.040816
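The division works because pandas lines up the two results on the identifier level they share. A sketch that spells this alignment out, assuming toy data (hypothetical values) and Series instead of one-column DataFrames: rdiv with the level argument broadcasts the per-identifier means across the Experiment level.

```python
import pandas as pd

# Toy data (hypothetical values, not the question's data).
df = pd.DataFrame({
    'identifier': ['Sample', 'Sample', 'Control', 'Control', 'Control', 'Control'],
    'Experiment': ['P1', 'P2', 'P1', 'P1', 'P2', 'P2'],
    'values': [10, 20, 2, 4, 6, 8],
})

all_mean = df.groupby('identifier')['values'].mean()                   # Control 5.0, Sample 15.0
spec_mean = df.groupby(['identifier', 'Experiment'])['values'].mean()  # one mean per (identifier, Experiment)

# rdiv computes all_mean / spec_mean; level='identifier' tells pandas
# which MultiIndex level to match against all_mean's flat index.
ratio = spec_mean.rdiv(all_mean, level='identifier')

# The Control rows are the per-experiment normalization factors.
factors = ratio.loc['Control']
print(factors)
```

Naming the level is optional here, since the plain / operator in the answer produces the same alignment; it mainly documents the intent.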
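Now put the data into some kind of flat structure (the OP was not explicit about this):
# Control rows of result: one normalization factor per experiment
normalization_factors = {idx[1]: result.loc[idx].values[0] for idx in result.index if idx[0] == 'Control'}
# {'P1': 1.0661268556005397, 'P2': 0.9874999999999999, 'P3': 0.9711977520196698}
# Sample rows of result (a ratio of means, not raw values), scaled by the factor of the same experiment
sample_values = {idx[1]: result.loc[idx].values[0] * normalization_factors[idx[1]] for idx in result.index if idx[0] == 'Sample'}
# {'P1': 1.2427993059572005, 'P2': 0.8269704433497537, 'P3': 1.0108384765919014}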
Map sample_values onto df as:
df["calculated_col_with_the_name_you_prefer"] = df["Experiment"].map(sample_values)
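The question itself asks for each individual Sample value to be multiplied by the factor of its experiment. A minimal sketch of that row-wise step, assuming toy data (hypothetical values) and a hypothetical output column named normalized:

```python
import pandas as pd

# Toy data (hypothetical values, not the question's data).
df = pd.DataFrame({
    'identifier': ['Sample', 'Sample', 'Sample', 'Control', 'Control', 'Control', 'Control'],
    'Experiment': ['P1', 'P2', 'P2', 'P1', 'P1', 'P2', 'P2'],
    'values': [10, 20, 30, 2, 4, 6, 8],
})

is_control = df['identifier'] == 'Control'
is_sample = df['identifier'] == 'Sample'

# Mean over all controls, pooled across experiments.
z = df.loc[is_control, 'values'].mean()

# One factor per experiment: pooled control mean / control mean of that experiment.
factors = z / df.loc[is_control].groupby('Experiment')['values'].mean()

# Look up each row's factor by its Experiment, scale the value,
# and keep the result only for Sample rows (Control rows become NaN).
df['normalized'] = (df['values'] * df['Experiment'].map(factors)).where(is_sample)
print(df)
```

Leaving the Control rows as NaN is a design choice of this sketch; dropping the .where call would scale the controls as well.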