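I have a dataset of transactions. Each transaction can have one or more distinct values ("dimensions"), and a value cannot repeat within a transaction. I want to build a DataFrame that has the dimensions in both the rows and the columns, and that counts how many transactions use each dimension together with each other dimension.
Here is what I tried: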
import pandas as pd

# One (combination_id, dimension) pair per row
dim_set = [ (1, 'Customer group$Large'),
            (1, 'DEPARTMENT$Sales'),
            (2, 'Customer group$Medium'),
            (2, 'DEPARTMENT$Sales'),
            (3, 'DEPARTMENT$Sales'),
            (4, 'Customer group$Small'),
            (4, 'DEPARTMENT$Sales')
          ]
df = pd.DataFrame(dim_set, columns=['combination_id', 'dimension'])
df
# Attempt: use the same column for both index and columns
df_st_1 = df.pivot_table(index='dimension', columns='dimension', values='combination_id', aggfunc='count')
df_st_1
The expected result should look like this:
dim_set = [ ('Customer group$Large', 1, 1, 0, 0),
('DEPARTMENT$Sales', 1, 4, 1, 1),
('Customer group$Medium', 0, 1, 1, 0),
('Customer group$Small', 0, 1, 0, 1)
]
df = pd.DataFrame(dim_set, columns=['dimension','Customer group$Large', 'DEPARTMENT$Sales', 'Customer group$Medium', 'Customer group$Small'])
df
Use DataFrame.merge together with crosstab, then clean up the result with DataFrame.reset_index and DataFrame.rename_axis:
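# df is the input frame from the question (columns: combination_id, dimension)
# Self-join on combination_id so each pair of dimensions in the same transaction becomes one row
df1 = df.merge(df, on='combination_id', suffixes=('', '_'))
# Count every (dimension, dimension_) pair, then move the index into a column and drop the axis names
df1 = (pd.crosstab(df1['dimension'], df1['dimension_'])
         .reset_index()
         .rename_axis(None)
         .rename_axis(None, axis=1))
print(df1)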
               dimension  Customer group$Large  Customer group$Medium  \
0   Customer group$Large                     1                      0
1  Customer group$Medium                     0                      1
2   Customer group$Small                     0                      0
3       DEPARTMENT$Sales                     1                      1

   Customer group$Small  DEPARTMENT$Sales
0                     0                 1
1                     0                 1
2                     1                 1
3                     1                 4
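As a possible alternative to the self-merge (not part of the answer above), the same counts can be obtained from a transaction-by-dimension indicator matrix multiplied by its own transpose. This is only a sketch: it relies on the question's statement that a dimension never repeats within a transaction, so every cell of the indicator matrix is 0 or 1. The names `indicator`, `cooc` and `result` are made up for this illustration.

```python
import pandas as pd

dim_set = [
    (1, 'Customer group$Large'),
    (1, 'DEPARTMENT$Sales'),
    (2, 'Customer group$Medium'),
    (2, 'DEPARTMENT$Sales'),
    (3, 'DEPARTMENT$Sales'),
    (4, 'Customer group$Small'),
    (4, 'DEPARTMENT$Sales'),
]
df = pd.DataFrame(dim_set, columns=['combination_id', 'dimension'])

# Rows: transactions, columns: dimensions.
# Cells are 0/1 as long as a dimension appears at most once per transaction (assumption).
indicator = pd.crosstab(df['combination_id'], df['dimension'])

# (dimensions x transactions) dot (transactions x dimensions):
# cell [a, b] is the number of transactions that contain both a and b.
cooc = indicator.T.dot(indicator)

# Same shape as the merge/crosstab result: dimension as a regular column, no axis names.
result = (cooc.rename_axis('dimension')
              .rename_axis(None, axis=1)
              .reset_index())
print(result)
```

The diagonal holds the number of transactions in which each dimension appears at all. With many transactions, this form avoids materialising every pair of rows the way the self-merge does, at the cost of building the indicator matrix.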