How do I reshape a CSV into the format I need with Python pandas?

艾伦·兰

I'm new to Python pandas. I have a CSV file that looks like this:

insectName   count   weather  location   time        date      Condition
  aaa         15      sunny   balabala  0900:1200   1990-02-10     25
  bbb         10      sunny   balabala  0900:1200   1990-02-10     25
  ccc         20      sunny   balabala  0900:1200   1990-02-10     25
  ddd         50      sunny   balabala  0900:1200   1990-02-10     25
  ...        ...      ...      ...        ...            ...       ...
  XXX         40      sunny   balabala  1300:1500   1990-02-15     38
  yyy         10      sunny   balabala  1300:1500   1990-02-15     38
  yyy         25      sunny   balabala  1300:1500   1990-02-15     38
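
Although the file is called a CSV, the sample columns are separated by runs of spaces rather than commas. Here is a minimal sketch of loading a few of the sample rows, assuming that whitespace layout (the "..." row is left out, and io.StringIO stands in for the real file):

```python
import io

import pandas as pd

# Rows copied from the sample above; assumption: fields are whitespace-separated.
raw = """\
insectName   count   weather  location   time        date      Condition
  aaa         15      sunny   balabala  0900:1200   1990-02-10     25
  bbb         10      sunny   balabala  0900:1200   1990-02-10     25
  ccc         20      sunny   balabala  0900:1200   1990-02-10     25
  ddd         50      sunny   balabala  0900:1200   1990-02-10     25
  XXX         40      sunny   balabala  1300:1500   1990-02-15     38
  yyy         10      sunny   balabala  1300:1500   1990-02-15     38
  yyy         25      sunny   balabala  1300:1500   1990-02-15     38
"""

# sep=r"\s+" splits on any run of spaces/tabs; leading spaces on a line are skipped.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")
print(df)
```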

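The file contains a lot of data, and the same insectName can appear more than once on a given day. I want to reshape the data by "date" so that all of one day's records sit on a single row, like this: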

insectName  count  insectName  count  insectName  count  weather  location  time        date      Condition
  ccc         20      bbb       10       aaa        15    sunny   balabala  0900:1200   1990-02-10     25
  yyy         25      yyy       10       XXX        40    sunny   balabala  1300:1500   1990-02-15     38
  ...        ...      ...      ...       ...        ...    ...      ...        ...            ...        ...

How can I do this?

忘了它

There is a groupby/cumcount/unstack trick for converting a long-format DataFrame into a wide-format one:

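import pandas as pd

# The sample file is whitespace-separated, so split on runs of whitespace.
df = pd.read_csv('data', sep=r'\s+')

common = ['weather', 'location', 'time', 'date', 'Condition']
grouped = df.groupby(common)
# Number the rows within each group 0, 1, 2, ...
df['idx'] = grouped.cumcount()
df2 = df.set_index(common + ['idx'])
# Move idx from the row index into the columns: one row per group.
df2 = df2.unstack('idx')
df2 = df2.swaplevel(0, 1, axis=1)
# sortlevel no longer exists in pandas; sort the columns by idx only,
# keeping insectName before count within each idx.
df2 = df2.sort_index(axis=1, level=0, sort_remaining=False)
df2.columns = df2.columns.droplevel(0)
df2 = df2.reset_index()
print(df2)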

yields

  weather  location       time        date  Condition insectName  count  \
0   sunny  balabala  0900:1200  1990-02-10         25        aaa     15   
1   sunny  balabala  1300:1500  1990-02-15         38        XXX     40   

  insectName  count insectName  count insectName  count  
0        bbb     10        ccc     20        ddd     50  
1        yyy     10        yyy     25        NaN    NaN
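To see what each step does, here is a minimal sketch on the sample rows from the question (assumed whitespace-separated). It prints the cumcount slot numbers, then builds the same wide table but flattens the column MultiIndex into hypothetical unique names such as insectName_0 and count_0 instead of repeating insectName/count, so each column can still be selected by name:

```python
import io

import pandas as pd

# Sample rows from the question; assumption: whitespace-separated fields.
raw = """\
insectName count weather location time date Condition
aaa 15 sunny balabala 0900:1200 1990-02-10 25
bbb 10 sunny balabala 0900:1200 1990-02-10 25
ccc 20 sunny balabala 0900:1200 1990-02-10 25
ddd 50 sunny balabala 0900:1200 1990-02-10 25
XXX 40 sunny balabala 1300:1500 1990-02-15 38
yyy 10 sunny balabala 1300:1500 1990-02-15 38
yyy 25 sunny balabala 1300:1500 1990-02-15 38
"""
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")

common = ["weather", "location", "time", "date", "Condition"]

# cumcount numbers the rows inside each group 0, 1, 2, ... in their original order,
# so every insect recorded on a day gets its own slot number.
df["idx"] = df.groupby(common).cumcount()
print(df["idx"].tolist())  # [0, 1, 2, 3, 0, 1, 2]

# Move idx into the columns: one row per day, one (variable, slot) pair per insect.
wide = df.set_index(common + ["idx"]).unstack("idx")

# Order columns by slot only (stable), so insectName_k stays before count_k.
wide = wide.sort_index(axis=1, level="idx", sort_remaining=False)

# Hypothetical naming scheme: flatten ("insectName", 0) -> "insectName_0".
wide.columns = [f"{name}_{i}" for name, i in wide.columns]
wide = wide.reset_index()
print(wide)
```

Days with fewer insects than the busiest day get NaN in the unused slots, just as in the output above.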

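While the wide format can be useful for presentation, note that the long format is usually the right format for data processing. See Hadley Wickham's paper on the merits of tidy data (PDF).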
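As a small illustration of that point, here is a sketch on a hypothetical subset of the question's columns: in long format, per-day and per-insect totals are each a single groupby, whereas in the wide layout the counts are spread across several identically named count columns.

```python
import pandas as pd

# Hypothetical long-format data shaped like the question's table (subset of columns).
long_df = pd.DataFrame({
    "insectName": ["aaa", "bbb", "ccc", "ddd", "XXX", "yyy", "yyy"],
    "count": [15, 10, 20, 50, 40, 10, 25],
    "date": ["1990-02-10"] * 4 + ["1990-02-15"] * 3,
})

# Total insects counted per day.
per_day = long_df.groupby("date")["count"].sum()

# Total per insect per day; repeated names (yyy) are combined automatically.
per_insect = long_df.groupby(["date", "insectName"])["count"].sum()

print(per_day)
print(per_insect)
```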
