I have an xlsx file with multiple tabs (sheets) that records data over many years. Each tab contains a table with many columns, structured like this:
+-----------+-------+-------------------------+----------------------+
| City | State | Number of Drivers, 2019 | Number of Cars, 2019 |
+-----------+-------+-------------------------+----------------------+
| LA | CA | 123 | 10.0 |
| San Diego | CA | 456 | 2345 |
+-----------+-------+-------------------------+----------------------+
I want to rearrange the table so that it looks like this, and do this for every tab in the xlsx:
+-----------+-------+------+-------------------+---------------+
| City | State | Year | Measure Name | Measure Value |
+-----------+-------+------+-------------------+---------------+
| LA | CA | 2019 | Number of Drivers | 123 |
| San Diego | CA | 2019 | Number of Drivers | 456 |
| LA | CA | 2019 | Number of Cars | 10 |
| San Diego | CA | 2019 | Number of Cars | 2345 |
+-----------+-------+------+-------------------+---------------+
There are a lot of moving parts here, and getting the final format right is a bit tricky.
We can do this with melt, then join the result of str.split on the melted column names:
s=df.melt(['City','State'])
s=s.join(s.variable.str.split(',',expand=True))
Out[120]:
       City State              variable   value                0     1
0        LA    CA  NumberofDrivers,2019   123.0  NumberofDrivers  2019
1  SanDiego    CA  NumberofDrivers,2019   456.0  NumberofDrivers  2019
2        LA    CA     NumberofCars,2019    10.0     NumberofCars  2019
3  SanDiego    CA     NumberofCars,2019  2345.0     NumberofCars  2019
# if you need to change the column names, add .rename(columns={...}) at the end
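To get all the way to the requested layout (City, State, Year, Measure Name, Measure Value) and cover every tab, the melt-and-split idea could be wrapped roughly as follows. This is only a sketch under some assumptions: the helper names reshape_sheet and reshape_workbook are made up for illustration, every measure column is assumed to be named like "Name, Year", and the split is done on the last comma with whitespace stripped so the year does not keep a leading space.

```python
import pandas as pd


def reshape_sheet(df, id_cols=("City", "State")):
    """Melt one wide sheet into City / State / Year / Measure Name / Measure Value.

    Hypothetical helper: assumes every non-id column is named "<measure>, <year>".
    """
    long_df = df.melt(id_vars=list(id_cols), var_name="variable",
                      value_name="Measure Value")
    # Split "Number of Drivers, 2019" on the last comma into name and year.
    parts = long_df["variable"].str.rsplit(",", n=1, expand=True)
    long_df["Measure Name"] = parts[0].str.strip()
    long_df["Year"] = parts[1].str.strip().astype(int)
    # Keep only the requested columns, in the requested order.
    return long_df[[*id_cols, "Year", "Measure Name", "Measure Value"]]


def reshape_workbook(sheets):
    """Apply reshape_sheet to a dict of {sheet name: DataFrame}."""
    return {name: reshape_sheet(df) for name, df in sheets.items()}


# Sample data copied from the table in the question.
sample = pd.DataFrame({
    "City": ["LA", "San Diego"],
    "State": ["CA", "CA"],
    "Number of Drivers, 2019": [123, 456],
    "Number of Cars, 2019": [10.0, 2345],
})
result = reshape_sheet(sample)
print(result)
```

For a real workbook, pd.read_excel(path, sheet_name=None) returns a dict with one DataFrame per tab, which can be passed straight to reshape_workbook. The file path is up to you, and reading xlsx files typically needs an engine such as openpyxl installed.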