Pandas reindexing multiple columns

Tommy

I have summarized data from a number of csv files, shown below.

+----------+-----+------+------+------+
|   Date   | Loc | Hour | Rain | Wind |
+----------+-----+------+------+------+
| 1-Sep-19 |   1 |    1 |  184 |   65 |
| 1-Sep-19 |   1 |    3 |  126 |   64 |
| 2-Sep-19 |   1 |    1 |  112 |   63 |
| 2-Sep-19 |   1 |    2 |  155 |    0 |
| 2-Sep-19 |   1 |    3 |  186 |   50 |
| 3-Sep-19 |   1 |    2 |  154 |   79 |
| 3-Sep-19 |   1 |    3 |  143 |   61 |
| 1-Sep-19 |   2 |    1 |  187 |   73 |
| 1-Sep-19 |   2 |    2 |  173 |   63 |
| 1-Sep-19 |   2 |    3 |  186 |   63 |
| 3-Sep-19 |   2 |    1 |  172 |   56 |
| 3-Sep-19 |   2 |    2 |  156 |   56 |
| 3-Sep-19 |   2 |    3 |  176 |   79 |
+----------+-----+------+------+------+

The observations Rain and Wind are collected at random, with the details recorded by Day, Location and Hour.

Note the missing rows for some Day, Location & Hour combinations.

I want to fill the missing data with 0. My desired final output is shown below.

+----------+-----+------+------+------+
|   Date   | Loc | Hour | Rain | Wind |
+----------+-----+------+------+------+
| 1-Sep-19 |   1 |    1 |  184 |   65 |
| 1-Sep-19 |   1 |    2 |    0 |    0 |
| 1-Sep-19 |   1 |    3 |  126 |   64 |
| 2-Sep-19 |   1 |    1 |  112 |   63 |
| 2-Sep-19 |   1 |    2 |  155 |    0 |
| 2-Sep-19 |   1 |    3 |  186 |   50 |
| 3-Sep-19 |   1 |    1 |    0 |    0 |
| 3-Sep-19 |   1 |    2 |  154 |   79 |
| 3-Sep-19 |   1 |    3 |  143 |   61 |
| 1-Sep-19 |   2 |    1 |  187 |   73 |
| 1-Sep-19 |   2 |    2 |  173 |   63 |
| 1-Sep-19 |   2 |    3 |  186 |   63 |
| 2-Sep-19 |   2 |    1 |    0 |    0 |
| 2-Sep-19 |   2 |    2 |    0 |    0 |
| 2-Sep-19 |   2 |    3 |    0 |    0 |
| 3-Sep-19 |   2 |    1 |  172 |   56 |
| 3-Sep-19 |   2 |    2 |  156 |   56 |
| 3-Sep-19 |   2 |    3 |  176 |   79 |
+----------+-----+------+------+------+

So far, I have tried the following code:

    import pandas as pd
    import numpy as np
    from datetime import date, timedelta

    df = pd.read_csv('data.csv')

    d1 = date(2019, 9, 1)
    d2 = date(2019, 9, 3)
    delta = d2 - d1

    # the indexes below were created to keep the code dynamic and easily scalable
    idx_date = [(d1 + timedelta(days=i)) for i in range(delta.days + 1)]
    idx_loc = np.arange(1,3)
    idx_hour = np.arange(1,4)

    cols =['Rain', 'Wind']

    df_filled = df.reindex(index=[idx_date,idx_loc,idx_hour], columns=cols, fill_value=0)

However, I get an error. How can I solve this?

jezrael

First you need to create a MultiIndex of all possible combinations of values, so that reindex can be used:

    df = df.set_index(['Date','Loc','Hour'])
    mux = pd.MultiIndex.from_product(df.index.levels, names=df.index.names)
    df = df.reindex(mux, fill_value=0).reset_index()
    print (df)
            Date  Loc  Hour  Rain  Wind
    0   1-Sep-19    1     1   184    65
    1   1-Sep-19    1     2     0     0
    2   1-Sep-19    1     3   126    64
    3   1-Sep-19    2     1   187    73
    4   1-Sep-19    2     2   173    63
    5   1-Sep-19    2     3   186    63
    6   2-Sep-19    1     1   112    63
    7   2-Sep-19    1     2   155     0
    8   2-Sep-19    1     3   186    50
    9   2-Sep-19    2     1     0     0
    10  2-Sep-19    2     2     0     0
    11  2-Sep-19    2     3     0     0
    12  3-Sep-19    1     1     0     0
    13  3-Sep-19    1     2   154    79
    14  3-Sep-19    1     3   143    61
    15  3-Sep-19    2     1   172    56
    16  3-Sep-19    2     2   156    56
    17  3-Sep-19    2     3   176    79
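
For reference, here is a minimal self-contained sketch of this first approach, with the sample rows from the question typed in directly instead of being read from data.csv (the inline DataFrame construction is only for illustration):

    import pandas as pd

    # sample rows copied from the question's input table (illustration only)
    df = pd.DataFrame({
        'Date': ['1-Sep-19', '1-Sep-19', '2-Sep-19', '2-Sep-19', '2-Sep-19',
                 '3-Sep-19', '3-Sep-19', '1-Sep-19', '1-Sep-19', '1-Sep-19',
                 '3-Sep-19', '3-Sep-19', '3-Sep-19'],
        'Loc':  [1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
        'Hour': [1, 3, 1, 2, 3, 2, 3, 1, 2, 3, 1, 2, 3],
        'Rain': [184, 126, 112, 155, 186, 154, 143, 187, 173, 186, 172, 156, 176],
        'Wind': [65, 64, 63, 0, 50, 79, 61, 73, 63, 63, 56, 56, 79],
    })

    # build every combination of the Date/Loc/Hour values that appear in the data,
    # then reindex so the newly added rows are filled with 0
    df = df.set_index(['Date', 'Loc', 'Hour'])
    mux = pd.MultiIndex.from_product(df.index.levels, names=df.index.names)
    df = df.reindex(mux, fill_value=0).reset_index()
    print(df)

One thing to note: from_product(df.index.levels) only combines values that already occur somewhere in the data, so a Date (or Loc/Hour) missing from the csv entirely would not be added; the variant below handles that by building the full ranges explicitly.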

Your solution should be changed like this:

    df['Date'] = pd.to_datetime(df['Date'])

    idx_date = pd.date_range('2019-09-01','2019-09-03')
    idx_loc = np.arange(1,3)
    idx_hour = np.arange(1,4)

    cols =['Rain', 'Wind']
    mux = pd.MultiIndex.from_product([idx_date,idx_loc,idx_hour], 
                                     names=['Date','Loc','Hour'])
    df_filled = (df.set_index(['Date','Loc','Hour'])
                   .reindex(index=mux, columns=cols, fill_value=0)
                   .reset_index())
    print (df_filled)
             Date  Loc  Hour  Rain  Wind
    0  2019-09-01    1     1   184    65
    1  2019-09-01    1     2     0     0
    2  2019-09-01    1     3   126    64
    3  2019-09-01    2     1   187    73
    4  2019-09-01    2     2   173    63
    5  2019-09-01    2     3   186    63
    6  2019-09-02    1     1   112    63
    7  2019-09-02    1     2   155     0
    8  2019-09-02    1     3   186    50
    9  2019-09-02    2     1     0     0
    10 2019-09-02    2     2     0     0
    11 2019-09-02    2     3     0     0
    12 2019-09-03    1     1     0     0
    13 2019-09-03    1     2   154    79
    14 2019-09-03    1     3   143    61
    15 2019-09-03    2     1   172    56
    16 2019-09-03    2     2   156    56
    17 2019-09-03    2     3   176    79
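
If the Date values should end up back in the question's 1-Sep-19 style strings rather than pandas' default datetime display, an optional extra step (not part of the answer itself) is to format the column afterwards, e.g.:

    # %d is zero-padded, so this gives '01-Sep-19' rather than '1-Sep-19'
    df_filled['Date'] = df_filled['Date'].dt.strftime('%d-%b-%y')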
