I'm trying to convert a pandas DataFrame with three columns (Date, Start, End) into a frequency matrix. My input DataFrame looks like this:
Date, Start, End
2016-09-02 09:16:00 18 16
2016-09-02 16:14:10 16 1
2016-09-02 06:17:21 18 17
2016-09-02 05:51:07 23 17
2016-09-02 18:34:44 18 17
2016-09-02 05:44:44 20 4
2016-09-02 09:25:22 18 17
2016-09-02 22:27:44 18 17
2016-09-02 16:02:46 0 18
2016-09-02 15:35:07 17 17
2016-09-02 16:06:42 8 17
2016-09-02 14:47:04 16 23
2016-09-02 07:47:24 20 1
...
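The values of "Start" and "End" are integers between 0 and 23, and "Date" is a datetime. The frequency matrix I'm trying to create is a 24 x 24 CSV in which the entry at row i and column j is the number of times "End" = i and "Start" = j occur together in the input. For example, the data above would produce: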
0, 1, 2, 3, 4, 5, 6, 7, 8, 9,10,11,12,13,14,15,16,17,18,19,20,21,22,23
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0
2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0
5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
9, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
11, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
12, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
13, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
14, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
15, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
16, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0
17, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 4, 0, 0, 0, 0, 1
18, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
19, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
20, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
21, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
22, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
23, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0
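As an extra, could this be done so that a separate matrix is created for every 15 minutes? That would be 672 matrices, since this date range is one week. I'm a self-taught beginner and really can't work out how to solve this in a Pythonic way, so any solutions or suggestions would be much appreciated.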
Create the matrix with a simple count, then unstack one of the columns:
mat = df.groupby(['Start', 'End']).count().unstack(level=0)
Clean up the Date level:
mat.columns = mat.columns.droplevel(0)
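Now reindex the rows and columns to 0..23, fill the gaps with 0 and convert to integers:
mat = mat.reindex(index=range(24), columns=range(24)).fillna(0).astype(int)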
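Putting the three steps together on the sample rows from the question gives the sketch below. It calls reindex with the index= and columns= keywords, since newer pandas versions no longer accept index and columns as positional arguments, and casts the result with astype(int) so the CSV contains whole numbers. The CSV is returned as a string here; passing a file path to to_csv would write it to disk instead.

```python
import pandas as pd

# Sample rows copied from the question (Date, Start, End).
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2016-09-02 09:16:00", "2016-09-02 16:14:10", "2016-09-02 06:17:21",
        "2016-09-02 05:51:07", "2016-09-02 18:34:44", "2016-09-02 05:44:44",
        "2016-09-02 09:25:22", "2016-09-02 22:27:44", "2016-09-02 16:02:46",
        "2016-09-02 15:35:07", "2016-09-02 16:06:42", "2016-09-02 14:47:04",
        "2016-09-02 07:47:24",
    ]),
    "Start": [18, 16, 18, 23, 18, 20, 18, 18, 0, 17, 8, 16, 20],
    "End":   [16, 1, 17, 17, 17, 4, 17, 17, 18, 17, 17, 23, 1],
})

# Count rows per (Start, End) pair, then move Start into the columns.
mat = df.groupby(["Start", "End"]).count().unstack(level=0)

# Drop the outer "Date" column level so the columns are just Start values.
mat.columns = mat.columns.droplevel(0)

# Add the hours that never occur on either axis, fill them with 0 and
# turn the float counts (caused by NaN) back into integers.
mat = mat.reindex(index=range(24), columns=range(24)).fillna(0).astype(int)

# Rows are End values, columns are Start values.
csv_text = mat.to_csv()
```

Reading the result follows the question's convention: mat.loc[17, 18] is the number of rows with End = 17 and Start = 18.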
Explanation
First, count how many times each (Start, End) pair occurs. Grouping by these two columns gives a result indexed by a MultiIndex:
df.groupby(['Start', 'End']).count()
Out[134]:
           Date
Start End
0     18      1
8     17      1
16    1       1
      23      1
17    17      1
18    16      1
      17      4
20    1       1
      4       1
23    17      1
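count() counts the non-missing values in each remaining column, which is why the result above still carries a Date column, and why a row with a missing Date would not be counted. If all you need is the number of rows per pair, size() is a possible substitute (a different method from the one this answer uses); it returns a Series with the same (Start, End) index:

```python
import pandas as pd

# (Start, End) pairs from the question's sample data; Date is not needed here.
df = pd.DataFrame({
    "Start": [18, 16, 18, 23, 18, 20, 18, 18, 0, 17, 8, 16, 20],
    "End":   [16, 1, 17, 17, 17, 4, 17, 17, 18, 17, 17, 23, 1],
})

# size() counts rows per group, regardless of missing values in other columns.
pair_counts = df.groupby(["Start", "End"]).size()
print(pair_counts)
```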
What we want from this result is to move the Start index into the columns. That is exactly what unstacking does:
df.groupby(['Start', 'End']).count().unstack(level=0)
Out[135]:
Date
Start 0 8 16 17 18 20 23
End
1 NaN NaN 1.0 NaN NaN 1.0 NaN
4 NaN NaN NaN NaN NaN 1.0 NaN
16 NaN NaN NaN NaN 1.0 NaN NaN
17 NaN 1.0 NaN 1.0 4.0 NaN 1.0
18 1.0 NaN NaN NaN NaN NaN NaN
23 NaN NaN 1.0 NaN NaN NaN NaN
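The NaN entries are (Start, End) pairs that never occur, and they force a float dtype, which is why the counts show up as 1.0 and 4.0. Series.unstack accepts a fill_value argument that writes 0 for those pairs instead. The sketch below starts from size() rather than count() (a substitution, not what the answer does), so there is also no Date level to drop afterwards:

```python
import pandas as pd

# (Start, End) pairs from the question's sample data.
df = pd.DataFrame({
    "Start": [18, 16, 18, 23, 18, 20, 18, 18, 0, 17, 8, 16, 20],
    "End":   [16, 1, 17, 17, 17, 4, 17, 17, 18, 17, 17, 23, 1],
})

# Pivot Start (index level 0) into the columns; missing pairs become 0,
# so the values keep an integer dtype instead of turning into floats.
wide = df.groupby(["Start", "End"]).size().unstack(level=0, fill_value=0)
print(wide)
```

The result still only contains the Start and End values that actually occur, so the reindex step to 0..23 is still needed for the full 24 x 24 matrix.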
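Unstacking moves Start into the columns as an additional column level, nested under the existing Date level (see the columns below). That is why we drop level 0 afterwards. Alternatively, depending on your actual data, you could select only the Date column before counting, e.g. df.groupby(['Start', 'End'])['Date'].count(), so that the result is a Series and unstacking produces a single column level.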
_.columns
Out[136]:
MultiIndex(levels=[['Date'], [0, 8, 16, 17, 18, 20, 23]],
labels=[[0, 0, 0, 0, 0, 0, 0], [0, 1, 2, 3, 4, 5, 6]],
names=[None, 'Start'])
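The follow-up question about one matrix per 15 minutes is not covered above. One possible approach, sketched under assumptions: each row is assigned to a bin with Series.dt.floor("15min"), the same count-and-unstack idea is applied to every bin, and bins without any rows get an all-zero matrix. The helper names frequency_matrix and matrices_per_interval are hypothetical, and so are the week boundaries 2016-09-02 00:00 to 2016-09-08 23:45; substitute the real date range of your data.

```python
import pandas as pd

HOURS = range(24)

def frequency_matrix(frame):
    """Hypothetical helper: 24 x 24 counts, rows = End, columns = Start."""
    counts = frame.groupby(["Start", "End"]).size().unstack(level=0, fill_value=0)
    # Add hours that never occur in this frame; fill_value keeps integer counts.
    return counts.reindex(index=HOURS, columns=HOURS, fill_value=0)

def matrices_per_interval(df, start, end, freq="15min"):
    """Hypothetical helper: one frequency matrix per time bin, keyed by bin start."""
    # Label every row with the start of the bin it falls into.
    bins = df["Date"].dt.floor(freq)
    # Begin with an all-zero matrix for every bin, including empty ones.
    zeros = pd.DataFrame(0, index=pd.Index(HOURS, name="End"),
                         columns=pd.Index(HOURS, name="Start"))
    result = {b: zeros.copy() for b in pd.date_range(start, end, freq=freq)}
    # Overwrite the bins that contain rows; bins outside [start, end] are skipped.
    for b, group in df.groupby(bins):
        if b in result:
            result[b] = frequency_matrix(group)
    return result

# Sample rows copied from the question.
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2016-09-02 09:16:00", "2016-09-02 16:14:10", "2016-09-02 06:17:21",
        "2016-09-02 05:51:07", "2016-09-02 18:34:44", "2016-09-02 05:44:44",
        "2016-09-02 09:25:22", "2016-09-02 22:27:44", "2016-09-02 16:02:46",
        "2016-09-02 15:35:07", "2016-09-02 16:06:42", "2016-09-02 14:47:04",
        "2016-09-02 07:47:24",
    ]),
    "Start": [18, 16, 18, 23, 18, 20, 18, 18, 0, 17, 8, 16, 20],
    "End":   [16, 1, 17, 17, 17, 4, 17, 17, 18, 17, 17, 23, 1],
})

# Assumed week boundaries: 7 days x 96 bins per day = 672 matrices.
mats = matrices_per_interval(df, "2016-09-02 00:00", "2016-09-08 23:45")
```

Each value in mats is a 24 x 24 DataFrame keyed by the bin's start time, so any single matrix can be written out with its own to_csv call.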