I am working on a table named kpi where I need to calculate the difference of (latest B1 - earliest B0) for each BoxID per week using the Minimum No. and Maximum No. for each BoxID per week, so you will need to use the minimum value of No. and use the corresponding B0 and use the max value of No. and use the corresponding B1, then find the difference. - I coded in R with some help, but now I need the code in Python too.
kpi %>%
group_by(BoxID, WkNo) %>%
mutate(i = which.min(No.),
j = which.max(No.)) %>%
mutate(First_B0 = B0[i],
Last_B1 = B1[j],
Diff = Last_B1 - First_B0) %>%
select(-i, -j)
Input table -
Date No. BoxID B0 B1 WkNo
29.10.2018 61931 1 0 0 44
15.11.2018 115763 1 5.38 5.38 46
16.11.2018 119833 1 51.86 52.23 46
29.10.2018 60486 3 23.26 22.97 44
10.11.2018 99576 3 1336.53 1336.53 45
14.11.2018 112259 3 1.19 1.04 46
16.11.2018 117965 3 8.68 47.22 46
16.11.2018 118092 3 47.22 47.22 46
15.11.2018 115396 4 82.05 82.05 46
Expected output table -
Date No. BoxID B0 B1 WkNo First_b1 Last_b0 Diff
29.10.2018 61931 1 0 0 44 0 0 0
15.11.2018 115763 1 5.38 5.38 46 52.23 5.38 46.85
16.11.2018 119833 1 51.86 52.23 46 52.23 5.38 46.85
29.10.2018 60486 3 23.26 22.97 44 22.97 23.26 -0.29
10.11.2018 99576 3 1336.53 1336.53 45 1336.53 1336.53 0
14.11.2018 112259 3 1.19 1.04 46 47.22 1.19 46.03
16.11.2018 117965 3 8.68 47.22 46 47.22 1.19 46.03
16.11.2018 118092 3 47.22 47.22 46 47.22 1.19 46.03
15.11.2018 115396 4 82.05 82.05 46 82.05 82.05 0
I need some help to compute the 3 more columns. Thank you in advance.
In python, these types of problems are best solved using the pandas library. As an R user, you'll find the dataframes easy to work with, since pandas was designed to be very much like R.
Assuming your table is a .csv file called "kpi.csv"...
>>import pandas as pd
>>df = pd.read_csv('kpi.csv')
>>df2 = df.loc[df.groupby(['BoxID','WkNo'])['No.'].idxmax(),['BoxID','WkNo','B1']]
>>df3 = df.loc[df.groupby(['BoxID','WkNo'])['No.'].idxmin(),['BoxID','WkNo','B0']]
>>df = df.merge(df2,on=['BoxID','WkNo'])
>>df = df.merge(df3,on=['BoxID','WkNo'])
>>df = df.rename(columns = {'B0_x' : 'B0',
'B1_x' : 'B1',
'B1_y' : 'First_b1',
'B0_y' : 'Last_b0'})
>>df['Diff'] = df['First_b1'] - df['Last_b0']
>>print(df)
Date No. BoxID B0 ... WkNo First_b1 Last_b0 Diff
0 29.10.2018 61931 1 0.00 ... 44 0.00 0.00 0.00
1 15.11.2018 115763 1 5.38 ... 46 52.23 5.38 46.85
2 16.11.2018 119833 1 51.86 ... 46 52.23 5.38 46.85
3 29.10.2018 60486 3 23.26 ... 44 22.97 23.26 -0.29
4 10.11.2018 99576 3 1336.53 ... 45 1336.53 1336.53 0.00
5 14.11.2018 112259 3 1.19 ... 46 47.22 1.19 46.03
6 16.11.2018 117965 3 8.68 ... 46 47.22 1.19 46.03
7 16.11.2018 118092 3 47.22 ... 46 47.22 1.19 46.03
8 15.11.2018 115396 4 82.05 ... 46 82.05 82.05 0.00
Collected from the Internet
Please contact [email protected] to delete if infringement.
Comments