假设我有以下客户数据表
df = pd.DataFrame.from_dict({"Customer":[0,0,1],
"Date":['01.01.2016', '01.02.2016', '01.01.2016'],
"Type":["First Buy", "Second Buy", "First Buy"],
"Value":[10,20,10]})
看起来像这样:
Customer | Date | Type | Value
-----------------------------------------
0 |01.01.2016|First Buy | 10
-----------------------------------------
0 |01.02.2016|Second Buy| 20
-----------------------------------------
1 |01.01.2016|First Buy | 10
我想通过“类型”列透视表。但是,数据透视仅给出数值“值”列。我想要一个像这样的结构:
Customer | First Buy Date | First Buy Value | Second Buy Date | Second Buy Value
---------------------------------------------------------------------------------
缺失值是NAN还是NAT的地方,是否可以使用ivot_table进行?如果没有,我可以想象一些解决方法,但是它们相当宽松。还有其他建议吗?
用途unstack
:
df1 = df.set_index(['Customer', 'Type']).unstack()
df1.columns = ['_'.join(cols) for cols in df1.columns]
print (df1)
Date_First Buy Date_Second Buy Value_First Buy Value_Second Buy
Customer
0 01.01.2016 01.02.2016 10.0 20.0
1 01.01.2016 None 10.0 NaN
如果需要其他顺序的列,请使用swaplevel
和sort_index
:
df1 = df.set_index(['Customer', 'Type']).unstack()
df1.columns = ['_'.join(cols) for cols in df1.columns.swaplevel(0,1)]
df1.sort_index(axis=1, inplace=True)
print (df1)
First Buy_Date First Buy_Value Second Buy_Date Second Buy_Value
Customer
0 01.01.2016 10.0 01.02.2016 20.0
1 01.01.2016 10.0 None NaN
本文收集自互联网,转载请注明来源。
如有侵权,请联系 [email protected] 删除。
我来说两句