I have 20 columns in a dataframe. I list 4 of them here as an example:
is_guarantee: 0 or 1
hotel_star: 0, 1, 2, 3, 4, 5
order_status: 40, 60, 80
journey (Label): 0, 1, 2
    is_guarantee  hotel_star  order_status  journey
0              0           5            60        0
1              1           5            60        0
2              1           5            60        0
3              0           5            60        1
4              0           4            40        0
5              0           4            40        1
6              0           4            40        1
7              0           3            60        0
8              0           2            60        0
9              1           5            60        0
10             0           2            60        0
11             0           2            60        0
But the system needs the input as an occurrence matrix in the following format to function:
Can anybody help?
import numpy as np
import pandas as pd

# Random sample data with the same columns as above
df1 = pd.DataFrame(index=range(0, 20))
df1['is_guarantee'] = np.random.choice([0, 1], df1.shape[0])
df1['hotel_star'] = np.random.choice([0, 1, 2, 3, 4, 5], df1.shape[0])
df1['order_status'] = np.random.choice([40, 60, 80], df1.shape[0])
df1['journey'] = np.random.choice([0, 1, 2], df1.shape[0])
I think you need melt, then get counts by groupby with size and reshape by unstack. Then divide by the row sums and convert the MultiIndex to a plain index:
df = (df.melt('journey')
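        .astype(str)
        .groupby(['variable', 'journey', 'value'])
        .size()
        .unstack(1, fill_value=0))
# Convert counts to row-wise percentages and flatten the MultiIndex
df = (df.div(df.sum(axis=1), axis=0)
        .mul(100)
        .add_prefix('journey_')
        .set_index(df.index.map(' = '.join))
        .rename_axis(None, axis=1))  # axis must be passed by keyword in recent pandas
print(df)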
                    journey_0  journey_1
hotel_star = 2     100.000000   0.000000
hotel_star = 3     100.000000   0.000000
hotel_star = 4      33.333333  66.666667
hotel_star = 5      80.000000  20.000000
is_guarantee = 0    66.666667  33.333333
is_guarantee = 1   100.000000   0.000000
order_status = 40   33.333333  66.666667
order_status = 60   88.888889  11.111111
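For comparison, the same percentages can probably also be obtained with pd.crosstab and normalize='index', building one small table per feature column and concatenating them. This is only a sketch of an alternative, not the method from the answer above: the helper name occurrence_matrix is hypothetical, and the data is the 12-row sample from the question typed in by hand.

```python
import pandas as pd

# The 12 sample rows shown in the question, entered manually.
df = pd.DataFrame({
    "is_guarantee": [0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0],
    "hotel_star":   [5, 5, 5, 5, 4, 4, 4, 3, 2, 5, 2, 2],
    "order_status": [60, 60, 60, 60, 40, 40, 40, 60, 60, 60, 60, 60],
    "journey":      [0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0],
})


def occurrence_matrix(frame, label="journey"):
    """Hypothetical helper: percentage of each label value per 'column = value' row."""
    parts = []
    for col in frame.columns.drop(label):
        # normalize="index" scales every row to sum to 1; multiply to get percent.
        tab = pd.crosstab(frame[col], frame[label], normalize="index") * 100
        tab.index = [f"{col} = {v}" for v in tab.index]
        parts.append(tab)
    out = pd.concat(parts).fillna(0)
    out.columns = [f"{label}_{c}" for c in out.columns]
    return out


result = occurrence_matrix(df)
print(result.round(2))
```

The rows come out in the original column order (is_guarantee first) rather than sorted alphabetically as in the output above, so sort the index afterwards if the order matters.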