How can I conditionally sum values from different columns after aggregation?

user2549803

I have this dataframe to begin with:

ID PRODUCT_ID        NAME  STOCK  SELL_COUNT DELIVERED_BY PRICE_A PRICE_B
1         P1  PRODUCT_P1     12          15          UPS   32,00   40,00
2         P2  PRODUCT_P2      4           3          DHL    8,00     NaN
3         P3  PRODUCT_P3    120          22          DHL     NaN  144,00
4         P1  PRODUCT_P1    423          18          UPS   98,00     NaN
5         P2  PRODUCT_P2      0           5          GLS   12,00   18,00
6         P3  PRODUCT_P3     53          10          DHL   84,00     NaN
7         P4  PRODUCT_P4     22           0          UPS    2,00     NaN
8         P1  PRODUCT_P1     94          56          GLS     NaN   49,00
9         P1  PRODUCT_P1      9          24          GLS     NaN    1,00

What I'm trying to achieve is, after grouping by PRODUCT_ID, to sum PRICE_A or PRICE_B depending on which of them has a value (prioritizing PRICE_A if both are set).

Based on @WeNYoBen's helpful answer, I now know how to conditionally apply aggregation functions depending on different columns:

def custom_aggregate(grouped):

    data = {
        'STOCK': grouped.loc[grouped['DELIVERED_BY'] == 'UPS', 'STOCK'].min(),
        'TOTAL_SELL_COUNT': grouped.loc[grouped['ID'] > 6, 'SELL_COUNT'].sum(min_count=1),
        'COND_SELL_COUNT': grouped.loc[grouped['SELL_COUNT'] > 10, 'SELL_COUNT'].sum(min_count=1),
        # THIS IS WHERE THINGS GET FOGGY...
        # I somehow need to add a second condition here, that says 
        # if PRICE_B is set - use the PRICE_B value for the sum()
        'COND_PRICE': grouped.loc[grouped['PRICE_A'].notna(), 'PRICE_A'].sum()
    }

    d_series = pd.Series(data)
    return d_series

result = df_products.groupby('PRODUCT_ID').apply(custom_aggregate)

I really don't know if this is possible by using the .loc function. One way to solve this could be to create an additional column before calling .groupby that already contains the correct price values. But I thought there might be a more flexible way of doing this. I'd be happy to somehow apply a custom function for the 'COND_PRICE' value calculation that gets executed before passing the results to sum(). In SQL I could nest x levels of CASE WHEN END statements in order to implement this kind of logic. Just curious about how to implement this flexibility in pandas.
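The pre-computed column idea mentioned above could look roughly like the sketch below. It uses `np.select`, which checks a list of conditions in order and takes the first match, much like nested CASE WHEN branches. The `EFFECTIVE_PRICE` column name and the reduced sample data are made up for illustration, and the prices are assumed to be already parsed as floats.

```python
import numpy as np
import pandas as pd

# Hypothetical subset of the sample data; prices assumed to be floats already.
df_products = pd.DataFrame({
    "PRODUCT_ID": ["P1", "P2", "P3", "P1", "P2"],
    "PRICE_A": [32.0, 8.0, np.nan, 98.0, 12.0],
    "PRICE_B": [40.0, np.nan, 144.0, np.nan, 18.0],
})

# CASE WHEN-style logic: conditions are evaluated in order, first match wins.
conditions = [
    df_products["PRICE_A"].notna(),  # WHEN PRICE_A IS NOT NULL THEN PRICE_A
    df_products["PRICE_B"].notna(),  # WHEN PRICE_B IS NOT NULL THEN PRICE_B
]
choices = [df_products["PRICE_A"], df_products["PRICE_B"]]

# EFFECTIVE_PRICE is an illustrative column name, not part of the original data.
df_products["EFFECTIVE_PRICE"] = np.select(conditions, choices, default=np.nan)

# min_count=1 keeps NaN for groups that have no price at all.
cond_price = df_products.groupby("PRODUCT_ID")["EFFECTIVE_PRICE"].sum(min_count=1)
print(cond_price)
```

More branches can be added by appending further condition/choice pairs to the two lists.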

Thanks a lot.

BENY

Here is a solution; what we need is fillna:

def custom_aggregate(grouped):

    data = {
        'STOCK': grouped.loc[grouped['DELIVERED_BY'] == 'UPS', 'STOCK'].min(),
        'TOTAL_SELL_COUNT': grouped.loc[grouped['ID'] > 6, 'SELL_COUNT'].sum(min_count=1),
        'COND_SELL_COUNT': grouped.loc[grouped['SELL_COUNT'] > 10, 'SELL_COUNT'].sum(min_count=1),
        # fillna keeps PRICE_A where it is set and falls back to PRICE_B otherwise;
        # rows where both are NaN stay NaN and are skipped by sum()
        'COND_PRICE': grouped['PRICE_A'].fillna(grouped['PRICE_B']).sum()
    }

    d_series = pd.Series(data)
    return d_series
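
For reference, here is a self-contained sketch that runs the function above on the sample data from the question. It assumes the prices have already been converted from the decimal-comma display (e.g. 32,00) to floats; the explicit column selection before `.apply` is an addition made here so the grouping column is not passed to the function.

```python
import numpy as np
import pandas as pd

# Sample data from the question; prices assumed to be parsed as floats.
df_products = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "PRODUCT_ID": ["P1", "P2", "P3", "P1", "P2", "P3", "P4", "P1", "P1"],
    "STOCK": [12, 4, 120, 423, 0, 53, 22, 94, 9],
    "SELL_COUNT": [15, 3, 22, 18, 5, 10, 0, 56, 24],
    "DELIVERED_BY": ["UPS", "DHL", "DHL", "UPS", "GLS", "DHL", "UPS", "GLS", "GLS"],
    "PRICE_A": [32.0, 8.0, np.nan, 98.0, 12.0, 84.0, 2.0, np.nan, np.nan],
    "PRICE_B": [40.0, np.nan, 144.0, np.nan, 18.0, np.nan, np.nan, 49.0, 1.0],
})


def custom_aggregate(grouped):
    data = {
        "STOCK": grouped.loc[grouped["DELIVERED_BY"] == "UPS", "STOCK"].min(),
        "TOTAL_SELL_COUNT": grouped.loc[grouped["ID"] > 6, "SELL_COUNT"].sum(min_count=1),
        "COND_SELL_COUNT": grouped.loc[grouped["SELL_COUNT"] > 10, "SELL_COUNT"].sum(min_count=1),
        # Prefer PRICE_A, fall back to PRICE_B where PRICE_A is missing.
        "COND_PRICE": grouped["PRICE_A"].fillna(grouped["PRICE_B"]).sum(),
    }
    return pd.Series(data)


# Only the columns the function actually reads are handed to apply().
cols = ["ID", "STOCK", "SELL_COUNT", "DELIVERED_BY", "PRICE_A", "PRICE_B"]
result = df_products.groupby("PRODUCT_ID")[cols].apply(custom_aggregate)
print(result)
```

Note that a plain `.sum()` returns 0 for a group in which every row lacks both prices; pass `min_count=1` if NaN is preferred in that case.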
