I have the following pandas Dataframe:
data_df = pd.DataFrame({'id':['era','bb','cs','jd','ek','gtf','okg','huf','mji','loj','djjf','wloe','rfm','cok'],
'doc':[1050 ,580,170,8, 7, 220, 45155,305,458,201,48,78,256,358],
'dif':[1,1,1,3,3,2,2,3,4,5,8,7,9,10],
'usg':[102,258,7083,84,165,627,784,85,658,261,85,12,235,45],
'snp':[4.2,2.5,1.03,2.56,1.25,3.56,1.49,2.95,1.05,0.815,0.78,0.612,0.365,0.164]})
data_df
id doc dif usg snp
0 era 1050 1 102 4.200
1 bb 580 1 258 2.500
2 cs 170 1 7083 1.030
3 jd 8 3 84 2.560
4 ek 7 3 165 1.250
5 gtf 220 2 627 3.560
6 okg 45155 2 784 1.490
7 huf 305 3 85 2.950
8 mji 458 4 658 1.050
9 loj 201 5 261 0.815
10 djjf 48 8 85 0.780
11 wloe 78 7 12 0.612
12 rfm 256 9 235 0.365
13 cok 358 10 45 0.164
I need to compute the weighted average of all the columns and finally need to make a rank column for the id. I'm planning to give 30%,20%,35%,15% weight for "doc", "dif,"usg" and "snp" columns. If "doc","usg" and "snp" is high it should get the higher weighted value. But for "dif" column if lower the grade it should be get the higher value. I'm actually struggling with that part. How to do that (lower grades get high marks). I simply multiply and get the weighted average. Is that the correct way? Is it possible to write a lambda function for this calculation? My calculation is like that
data_df["Weighted_Average"] = data_df["doc"]*0.3 + data_df["dif"]*.2+ data_df["usg"]*.35+ data_df["snp"]*.15
data_df
id doc dif usg snp Weighted_Average
0 era 1050 1 102 4.2 351.53
1 bb 580 1 258 2.5 264.875
2 cs 170 1 7083 1.03 2530.4045
3 jd 8 3 84 2.56 32.784
4 ek 7 3 165 1.25 60.6375
5 gtf 220 2 627 3.56 286.384
6 okg 45155 2 784 1.49 13821.5235
7 huf 305 3 85 2.95 122.2925
8 mji 458 4 658 1.05 368.6575
9 loj 201 5 261 0.815 152.77225
10 djjf 48 8 85 0.78 45.867
11 wloe 78 7 12 0.612 29.0918
12 rfm 256 9 235 0.365 160.90475
13 cok 358 10 45 0.164 125.1746
However, my calculation is wrong since I didn't consider higher marks to get lower grades. (data_df["dif"]*.2)
I saw several of them use mean to calculate to weighted average and some use directly like this too. Which is the correct way? Any help would be appriciate.
weights = {"doc": 0.3, "dif": 0.2, "usg": 0.35, "snp": 0.15}
df1 = data_df[weights.keys()] # extract numeric columns
df2 = pd.DataFrame(weights, index=data_df.index) # build weighted dataframe
df1 = df1.assign(dif=df1["dif"].max() + 1 - df1["dif"]) # reverse grade
Note: reverse grade according to your post. You can use 1/dif
too.
data_df["Weighted_Average"] = df1.mul(df2).sum(axis="columns")
>>> data_df.sort_values("Weighted_Average", ascending=False)
id doc dif usg snp Weighted_Average
6 okg 45155 2 784 1.490 13822.92350
2 cs 170 1 7083 1.030 2532.20450
8 mji 458 4 658 1.050 369.25750
0 era 1050 1 102 4.200 353.33000
5 gtf 220 2 627 3.560 287.78400
1 bb 580 1 258 2.500 266.67500
12 rfm 256 9 235 0.365 159.50475
9 loj 201 5 261 0.815 152.97225
13 cok 358 10 45 0.164 123.37460
7 huf 305 3 85 2.950 123.29250
4 ek 7 3 165 1.250 61.63750
10 djjf 48 8 85 0.780 44.86700
3 jd 8 3 84 2.560 33.78400
11 wloe 78 7 12 0.612 28.49180
Collected from the Internet
Please contact [email protected] to delete if infringement.
Comments