Calculating multiple columns in pandas from several specific conditions using a for loop

Posted by Danish in Dev

Danish

I have a data frame as shown below.

 B_ID   No_Show   Session  slot_num   Patient_count
    1     0.2       S1        1          1
    2     0.3       S1        2          1
    3     0.8       S1        3          1
    4     0.3       S1        3          2
    5     0.6       S1        4          1
    6     0.8       S1        5          1
    7     0.9       S1        5          2
    8     0.4       S1        5          3
    9     0.6       S1        5          4
    12    0.9       S2        1          1
    13    0.5       S2        1          2
    14    0.3       S2        2          1
    15    0.7       S2        3          1
    20    0.7       S2        4          1
    16    0.6       S2        5          1
    17    0.8       S2        5          2
    19    0.3       S2        5          3

where

No_Show = probability that the patient does not show up

Assume

p = [0.2, 0.4], duration of each slot = 30 (minutes)

p = threshold probability
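For anyone who wants to reproduce the problem, the sample table can be rebuilt roughly as follows. This is only a sketch: the values are copied from the table above, and the integer and float column types are an assumption, since the question does not state them.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "B_ID": [1, 2, 3, 4, 5, 6, 7, 8, 9, 12, 13, 14, 15, 20, 16, 17, 19],
    "No_Show": [0.2, 0.3, 0.8, 0.3, 0.6, 0.8, 0.9, 0.4, 0.6,
                0.9, 0.5, 0.3, 0.7, 0.7, 0.6, 0.8, 0.3],
    "Session": ["S1"] * 9 + ["S2"] * 8,
    "slot_num": [1, 2, 3, 3, 4, 5, 5, 5, 5, 1, 1, 2, 3, 4, 5, 5, 5],
    "Patient_count": [1, 1, 1, 2, 1, 1, 2, 3, 4, 1, 2, 1, 1, 1, 1, 2, 3],
})

p = [0.2, 0.4]  # threshold probabilities
```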

From the above I want to compute the data frame below.

Step 1

Sort the data frame by Session, slot_num and Patient_count

# Ascending order (the default), so each row comes after the earlier patients of its slot
df = df.sort_values(['Session', 'slot_num', 'Patient_count'])

Step 2: compute the cut-off using the following conditions

If Patient_count = 1, divide No_Show by the threshold probability

Example for B_ID = 3, Patient_count = 1, cut_off = 0.8/0.2 = 4

Otherwise, if Patient_count = 2, multiply the previous No_Show by the current No_Show and divide by the threshold

Example for B_ID = 4, Patient_count = 2, cut_off = (0.3*0.8)/0.2 = 1.2

Otherwise, if Patient_count = 3, multiply the previous 2 No_Show values by the current No_Show and divide by the threshold

Example for B_ID = 8, Patient_count = 3, cut_off = (0.4*0.9*0.8)/0.2 = 1.44

and so on
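Put differently, the cut-off for a row appears to be the running product of No_Show over all patients in the same slot up to and including that row, divided by the threshold. The plain-Python sketch below restates that rule and checks it against the worked examples for B_ID 3, 4, 8 and 9. The helper name `cut_offs` is made up for illustration and is not part of the question.

```python
from math import isclose

def cut_offs(no_shows, threshold):
    """Running product of No_Show within one slot, divided by the threshold."""
    result = []
    running = 1.0
    for prob in no_shows:
        running *= prob                     # include the current patient
        result.append(running / threshold)
    return result

# Slot 3 of session S1 holds B_ID 3 and 4 (No_Show 0.8, then 0.3).
slot3 = cut_offs([0.8, 0.3], 0.2)
# Slot 5 of session S1 holds B_ID 6, 7, 8 and 9.
slot5 = cut_offs([0.8, 0.9, 0.4, 0.6], 0.2)

print([round(v, 3) for v in slot3])
print([round(v, 3) for v in slot5])
```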

Expected output:

B_ID   No_Show   Session  slot_num   Patient_count Cut_off_0.2   Cut_off_0.4
    1     0.2       S1        1          1             1             0.5
    2     0.3       S1        2          1             1.5           0.75
    3     0.8       S1        3          1             4              2
    4     0.3       S1        3          2             1.2            0.6
    5     0.6       S1        4          1             3              1.5
    6     0.8       S1        5          1             4              2
    7     0.9       S1        5          2             3.6            1.8
    8     0.4       S1        5          3             1.44           0.72
    9     0.6       S1        5          4             0.864          0.432
    12    0.9       S2        1          1             4.5            2.25
    13    0.5       S2        1          2             2.25           1.125
    14    0.3       S2        2          1             1.5            0.75
    15    0.7       S2        3          1             3.5            1.75
    20    0.7       S2        4          1             3.5            1.75
    16    0.6       S2        5          1             3              1.5
    17    0.8       S2        5          2             2.4            1.2
    19    0.3       S2        5          3             0.72           0.36

I tried the code below

p = [0.2, 0.4]
for i in p:
    df['Cut_off_'+'i'] = df.groupby(['Session','slot_num'])['No_Show'].cumprod().div(i)

jezrael

You can use {i} in an f-string to build the new column names:

p = [0.2, 0.4]
for i in p:
    df[f'Cut_off_{i}'] = df.groupby(['Session','slot_num'])['No_Show'].cumprod().div(i)
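The reason the original attempt fails is that `'Cut_off_'+'i'` concatenates two string literals, so every pass of the loop writes to the same column, `Cut_off_i`, and only the last threshold survives. The check below is a minimal sketch on two slots taken from the sample data (slots 3 and 5 of session S1). Computing the cumulative product once outside the loop is a small variation on the answer and does not change the result.

```python
import pandas as pd

# Two slots from session S1 of the sample data (slot 3 and slot 5).
df = pd.DataFrame({
    "B_ID": [3, 4, 6, 7, 8, 9],
    "No_Show": [0.8, 0.3, 0.8, 0.9, 0.4, 0.6],
    "Session": ["S1"] * 6,
    "slot_num": [3, 3, 5, 5, 5, 5],
    "Patient_count": [1, 2, 1, 2, 3, 4],
})

p = [0.2, 0.4]
# Cumulative product of No_Show within each (Session, slot_num) group.
cum = df.groupby(["Session", "slot_num"])["No_Show"].cumprod()
for i in p:
    # f"Cut_off_{i}" formats the float into the name, e.g. "Cut_off_0.2".
    df[f"Cut_off_{i}"] = cum.div(i)

print(df.columns.tolist())
print(df[["B_ID", "Cut_off_0.2", "Cut_off_0.4"]].round(3))
```

The rounded values should line up with the corresponding rows of the expected output table.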

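A numpy solution is also possible: convert the output to a numpy array, divide it by p, then convert the result to a DataFrame and join it to the original.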

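import numpy as np
import pandas as pd

p = [0.2, 0.4]
# Column vector of cumulative products divided by the row of thresholds: one output column per threshold
arr = df.groupby(['Session','slot_num'])['No_Show'].cumprod().values[:, None] / np.array(p)

# Name the columns Cut_off_0.2 and Cut_off_0.4, then join them to the original frame
df = df.join(pd.DataFrame(arr, columns=p, index=df.index).add_prefix('Cut_off_'))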
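The numpy version relies on broadcasting: indexing with `[:, None]` turns the cumulative products into a column of shape (n, 1), and dividing by an array of shape (2,) yields an (n, 2) result with one column per threshold. A minimal sketch of just that step follows. The three values in `cum` are picked by hand as stand-ins for the cumprod output, not computed from the full data frame.

```python
import numpy as np

# Hand-picked stand-ins for three cumulative products (assumed values for illustration).
cum = np.array([0.8, 0.24, 0.6])
p = np.array([0.2, 0.4])

# cum[:, None] has shape (3, 1); p has shape (2,).
# Broadcasting divides every row by every threshold, giving shape (3, 2).
arr = cum[:, None] / p

print(arr.shape)  # (3, 2)
print(arr.round(3))
```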
