How to get percentage count based on multiple columns in pandas dataframe?

DataHolic

I have 20 columns in a dataframe. I list 4 of them here as an example:

is_guarantee: 0 or 1
hotel_star: 0, 1, 2, 3, 4, 5
order_status: 40, 60, 80
journey (Label): 0, 1, 2

    is_guarantee  hotel_star  order_status  journey
0              0           5            60        0
1              1           5            60        0
2              1           5            60        0
3              0           5            60        1
4              0           4            40        0
5              0           4            40        1
6              0           4            40        1
7              0           3            60        0
8              0           2            60        0
9              1           5            60        0
10             0           2            60        0
11             0           2            60        0


But the system needs the occurrence matrix as input in a different format: one row per column value, with the percentage of each journey label in the columns.

Can anybody help?

import numpy as np
import pandas as pd

# Random sample data with the same columns and value ranges as above
df1 = pd.DataFrame(index=range(20))
df1['is_guarantee'] = np.random.choice([0, 1], df1.shape[0])
df1['hotel_star'] = np.random.choice([0, 1, 2, 3, 4, 5], df1.shape[0])
df1['order_status'] = np.random.choice([40, 60, 80], df1.shape[0])
df1['journey'] = np.random.choice([0, 1, 2], df1.shape[0])

jezrael

I think you need:

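  • reshape to long format with melt, count each combination with groupby and size, then move the journey labels into columns with unstack
  • then divide each row by its row sum to get percentages, and join the MultiIndex levels into a single index: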

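# melt to long format (journey, variable, value), cast to str so the index
# levels can be joined later, then count rows per combination
df = (df.melt('journey')
        .astype(str)
        .groupby(['variable', 'journey', 'value'])
        .size()
        .unstack(1, fill_value=0))   # journey labels become columns

# convert counts to row percentages and flatten the (variable, value) index
df = (df.div(df.sum(axis=1), axis=0)
        .mul(100)
        .add_prefix('journey_')
        .set_index(df.index.map(' = '.join))
        .rename_axis(None, axis=1))

print(df)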

                    journey_0  journey_1
hotel_star = 2     100.000000   0.000000
hotel_star = 3     100.000000   0.000000
hotel_star = 4      33.333333  66.666667
hotel_star = 5      80.000000  20.000000
is_guarantee = 0    66.666667  33.333333
is_guarantee = 1   100.000000   0.000000
order_status = 40   33.333333  66.666667
order_status = 60   88.888889  11.111111
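
For comparison, here is a minimal self-contained sketch of a different route to the same kind of table, using pd.crosstab with normalize='index' instead of melt and groupby. It is not the approach from the answer above. The sample rows are copied from the question's table, and the helper name percentage_by_label is made up for illustration.

```python
import pandas as pd

# Sample rows copied from the table in the question.
df = pd.DataFrame({
    'is_guarantee': [0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0],
    'hotel_star':   [5, 5, 5, 5, 4, 4, 4, 3, 2, 5, 2, 2],
    'order_status': [60, 60, 60, 60, 40, 40, 40, 60, 60, 60, 60, 60],
    'journey':      [0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0],
})


def percentage_by_label(frame, label='journey'):
    """Hypothetical helper: percentage of each label per column value."""
    parts = []
    for col in frame.columns.drop(label):
        # Row-normalised cross-tabulation, scaled so each row sums to 100.
        tab = pd.crosstab(frame[col], frame[label], normalize='index').mul(100)
        # Turn the index into "column = value" labels.
        tab.index = [f'{col} = {value}' for value in tab.index]
        parts.append(tab)
    # Stack the per-column tables and prefix the label columns.
    out = pd.concat(parts).fillna(0).add_prefix(f'{label}_')
    return out.rename_axis(None, axis=1)


result = percentage_by_label(df)
print(result.round(2))
```

On the sample rows this should give the same percentages as the output above, with the rows in the original column order rather than sorted. Call result.sort_index() if the sorted order matters.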
