I have a df, shown below:
df:
ID Limit N_30 N_31_90 N_91_180 N_180_365
1 500 60 15 30 1
2 300 0 15 5 10
3 800 0 0 10 6
4 100 0 0 0 370
5 600 0 6 5 10
6 800 0 0 15 6
7 500 10 10 30 9
8 200 0 0 0 0
About the data:
ID - customer ID
Limit - Limit
N_30 - Number of transactions in the last 30 days.
N_31_90 - Number of transactions in the last 31 to 90 days.
N_91_180 - Number of transactions in the last 91 to 180 days.
N_180_365 - Number of transactions in the last 181 to 365 days.
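To experiment with the question, the sample frame can be rebuilt as below. The `windows` table is a hypothetical helper, not part of the original question: it records the start day and length of each window implied by the column descriptions, which are the offsets and divisors the Recency rules use (the last window runs from day 180 to day 365, hence the length 185).

```python
import pandas as pd

# Sample data from the question, one row per customer
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6, 7, 8],
    "Limit": [500, 300, 800, 100, 600, 800, 500, 200],
    "N_30": [60, 0, 0, 0, 0, 0, 10, 0],
    "N_31_90": [15, 15, 0, 0, 6, 0, 10, 0],
    "N_91_180": [30, 5, 10, 0, 5, 15, 30, 0],
    "N_180_365": [1, 10, 6, 370, 10, 6, 9, 0],
})

# Hypothetical helper: start day and length (in days) of each counting window
windows = pd.DataFrame(
    {"start": [0, 30, 90, 180], "length": [30, 60, 90, 185]},
    index=["N_30", "N_31_90", "N_91_180", "N_180_365"],
)
# start + length gives the day on which each window ends
windows["end"] = windows["start"] + windows["length"]
print(windows)
```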
From the df above, I want to derive a column named Recency.
Explanation:
if df['N_30'] != 0, then Recency = (30/df['N_30'])
elif df['N_31_90'] != 0 then Recency = 30 + (60/df['N_31_90'])
elif df['N_91_180'] != 0 then Recency = 90 + (90/df['N_91_180'])
elif df['N_180_365'] != 0 then Recency = 180 + (185/df['N_180_365'])
else Recency = 730
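A minimal sketch that transcribes the if/elif rules literally into a row-wise function applied with `DataFrame.apply(axis=1)`. It rebuilds the sample frame so it runs on its own and uses the frame's column name `N_180_365` for the last window; `recency` is a hypothetical helper name. Row-wise `apply` is easy to check against the rules, but it calls a Python function once per row, so it is slower than a vectorized approach on large frames.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6, 7, 8],
    "Limit": [500, 300, 800, 100, 600, 800, 500, 200],
    "N_30": [60, 0, 0, 0, 0, 0, 10, 0],
    "N_31_90": [15, 15, 0, 0, 6, 0, 10, 0],
    "N_91_180": [30, 5, 10, 0, 5, 15, 30, 0],
    "N_180_365": [1, 10, 6, 370, 10, 6, 9, 0],
})


def recency(row):
    """Hypothetical helper: apply the if/elif rules to a single row."""
    if row["N_30"] != 0:
        return 30 / row["N_30"]
    elif row["N_31_90"] != 0:
        return 30 + 60 / row["N_31_90"]
    elif row["N_91_180"] != 0:
        return 90 + 90 / row["N_91_180"]
    elif row["N_180_365"] != 0:
        return 180 + 185 / row["N_180_365"]
    else:
        # No transactions in any window
        return 730


df["Recency"] = df.apply(recency, axis=1)
print(df)
```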
Expected output:
ID Limit N_30 N_31_90 N_91_180 N_180_365 Recency
1 500 60 15 30 1 (30/60) = 0.5
2 300 0 15 5 10 30+(60/15) = 34
3 800 0 0 10 6 90+(90/10) = 99
4 100 0 0 0 370 180+(185/370) = 180.5
5 600 0 6 5 10 30+(60/6) = 40
6 800 0 0 15 6 90+(90/15) = 96
7 500 10 10 30 9 30/10 = 3
8 200 0 0 0 0 730
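IIUC, use a boolean mask with bfill:
import pandas as pd
df2 = df.filter(like="N_")  # N_30, N_31_90, N_91_180, N_180_365, in window order
nonzero = df2.mask(df2.eq(0))  # boolean mask: zero counts become NaN so bfill skips them
candidates = nonzero.rdiv([30, 60, 90, 185]).add([0, 30, 90, 180])  # window start + window length / count
df["Recency"] = candidates.bfill(axis=1).iloc[:, 0].fillna(730)  # first non-empty window, else 730
print(df)
Output:
 ID Limit N_30 N_31_90 N_91_180 N_180_365 Recency
0 1 500 60 15 30 1 0.5
1 2 300 0 15 5 10 34.0
2 3 800 0 0 10 6 99.0
3 4 100 0 0 0 370 180.5
4 5 600 0 6 5 10 40.0
5 6 800 0 0 15 6 96.0
6 7 500 10 10 30 9 3.0
7 8 200 0 0 0 0 730.0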
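A vectorized alternative sketch using `numpy.select`, which takes, for each row, the choice belonging to the first condition that is True, mirroring the if/elif chain. Because each branch fires only when its own column is non-zero, the position of zeros within a row does not matter, and rows with no transactions fall through to `default=730`. The lists `cols`, `starts` and `lengths` are illustrative names, not from the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6, 7, 8],
    "Limit": [500, 300, 800, 100, 600, 800, 500, 200],
    "N_30": [60, 0, 0, 0, 0, 0, 10, 0],
    "N_31_90": [15, 15, 0, 0, 6, 0, 10, 0],
    "N_91_180": [30, 5, 10, 0, 5, 15, 30, 0],
    "N_180_365": [1, 10, 6, 370, 10, 6, 9, 0],
})

cols = ["N_30", "N_31_90", "N_91_180", "N_180_365"]
starts = [0, 30, 90, 180]     # day on which each window starts
lengths = [30, 60, 90, 185]   # number of days in each window

# One condition per branch, in the same order as the if/elif chain
conditions = [df[c].ne(0) for c in cols]
# pandas returns inf when dividing by zero; those entries are never selected
# because the matching condition is False
choices = [s + l / df[c] for c, s, l in zip(cols, starts, lengths)]

df["Recency"] = np.select(conditions, choices, default=730)
print(df)
```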