Write way to calculate weighted average with pandas Dataframe with mean or directly multiply

rra Published at Dev

rra

I have the following pandas Dataframe:

data_df = pd.DataFrame({'id':['era','bb','cs','jd','ek','gtf','okg','huf','mji','loj','djjf','wloe','rfm','cok'],
                        'doc':[1050 ,580,170,8, 7, 220, 45155,305,458,201,48,78,256,358],
                        'dif':[1,1,1,3,3,2,2,3,4,5,8,7,9,10],
                        'usg':[102,258,7083,84,165,627,784,85,658,261,85,12,235,45],
                        'snp':[4.2,2.5,1.03,2.56,1.25,3.56,1.49,2.95,1.05,0.815,0.78,0.612,0.365,0.164]})
data_df

    id    doc      dif       usg         snp
0   era   1050      1        102        4.200
1   bb    580       1        258        2.500
2   cs    170       1        7083       1.030
3   jd    8         3        84         2.560
4   ek    7         3        165        1.250
5   gtf   220       2        627        3.560
6   okg   45155     2        784        1.490
7   huf   305       3        85         2.950
8   mji   458       4        658        1.050
9   loj   201       5        261        0.815
10  djjf  48        8        85         0.780
11  wloe  78        7        12         0.612
12  rfm   256       9        235        0.365
13  cok   358       10       45         0.164

I need to compute the weighted average of all the columns and finally need to make a rank column for the id. I'm planning to give 30%,20%,35%,15% weight for "doc", "dif,"usg" and "snp" columns. If "doc","usg" and "snp" is high it should get the higher weighted value. But for "dif" column if lower the grade it should be get the higher value. I'm actually struggling with that part. How to do that (lower grades get high marks). I simply multiply and get the weighted average. Is that the correct way? Is it possible to write a lambda function for this calculation? My calculation is like that

data_df["Weighted_Average"] = data_df["doc"]*0.3 + data_df["dif"]*.2+ data_df["usg"]*.35+ data_df["snp"]*.15
data_df

    id    doc      dif       usg     snp   Weighted_Average
0   era   1050     1         102    4.2     351.53
1   bb    580      1         258    2.5     264.875
2   cs    170      1         7083   1.03    2530.4045
3   jd    8        3         84     2.56    32.784
4   ek    7        3         165    1.25    60.6375
5   gtf   220      2         627    3.56    286.384
6   okg   45155    2         784    1.49    13821.5235
7   huf   305      3         85     2.95    122.2925
8   mji   458      4         658    1.05    368.6575
9   loj   201      5         261    0.815   152.77225
10  djjf  48       8         85     0.78    45.867
11  wloe  78       7         12     0.612   29.0918
12  rfm   256      9         235    0.365   160.90475
13  cok   358      10        45     0.164   125.1746

However, my calculation is wrong since I didn't consider higher marks to get lower grades. (data_df["dif"]*.2)

I saw several of them use mean to calculate to weighted average and some use directly like this too. Which is the correct way? Any help would be appriciate.

Corralien

Prepare data:

weights = {"doc": 0.3, "dif": 0.2, "usg": 0.35, "snp": 0.15}

df1 = data_df[weights.keys()]  # extract numeric columns
df2 = pd.DataFrame(weights, index=data_df.index)  # build weighted dataframe

df1 = df1.assign(dif=df1["dif"].max() + 1 - df1["dif"])  # reverse grade

Note: reverse grade according to your post. You can use 1/dif too.

Compute your weighted average:

data_df["Weighted_Average"] = df1.mul(df2).sum(axis="columns")

>>> data_df.sort_values("Weighted_Average", ascending=False)
      id    doc  dif   usg    snp  Weighted_Average
6    okg  45155    2   784  1.490       13822.92350
2     cs    170    1  7083  1.030        2532.20450
8    mji    458    4   658  1.050         369.25750
0    era   1050    1   102  4.200         353.33000
5    gtf    220    2   627  3.560         287.78400
1     bb    580    1   258  2.500         266.67500
12   rfm    256    9   235  0.365         159.50475
9    loj    201    5   261  0.815         152.97225
13   cok    358   10    45  0.164         123.37460
7    huf    305    3    85  2.950         123.29250
4     ek      7    3   165  1.250          61.63750
10  djjf     48    8    85  0.780          44.86700
3     jd      8    3    84  2.560          33.78400
11  wloe     78    7    12  0.612          28.49180

Collected from the Internet

Please contact [email protected] to delete if infringement.

edited at2021-06-4

Comments

0 comments

TOP Ranking

Article

Write way to calculate weighted average with pandas Dataframe with mean or directly multiply

Write way to calculate weighted average with pandas Dataframe with mean or directly multiply

Loopback Error: connect ECONNREFUSED 127.0.0.1:3306 (MAMP)

Can't pre-populate phone number and message body in SMS link on iPhones when SMS app is not running in the background

pump.io port in URL

How to import an asset in swift using Bundle.main.path() in a react-native native module

Failed to listen on localhost:8000 (reason: Cannot assign requested address)

Spring Boot JPA PostgreSQL Web App - Internal Authentication Error

Emulator wrong screen resolution in Android Studio 1.3

3D Touch Peek Swipe Like Mail

Double spacing in rmarkdown pdf

Svchost high CPU from Microsoft.BingWeather app errors

How to how increase/decrease compared to adjacent cell

Using Response.Redirect with Friendly URLS in ASP.NET

java.lang.NullPointerException: Cannot read the array length because "<local3>" is null

BigQuery - concatenate ignoring NULL

How to fix "pickle_module.load(f, **pickle_load_args) _pickle.UnpicklingError: invalid load key, '<'" using YOLOv3?

ngClass error (Can't bind ngClass since it isn't a known property of div) in Angular 11.0.3

Can a 32-bit antivirus program protect you from 64-bit threats

Make a B+ Tree concurrent thread safe

Bootstrap 5 Static Modal Still Closes when I Click Outside

Vector input in shiny R and then use it

Assembly definition can't resolve namespaces from external packages