Unfortunately, due to a software bug that was not noticeable enough in our development environment to be caught, we ended up creating a large number of SQL records that are not actually needed. The records do not harm data integrity or anything else, but they are simply unnecessary.
We are working with the following database schema:
entity_static (just some static data that won't change):
id | val1 | val2 | val3
-----------------------
1 | 50 | 183 | 93
2 | 60 | 823 | 123
entity_dynamic (some dynamic data we need a historical record of):
id | entity_static_id | val1 | val2 | valid_from | valid_to
-------------------------------------------------------------------------------
1 | 1 | 50 | 75 | 2018-01-01 00:00:00 | 2018-01-01 00:59:59
2 | 1 | 50 | 75 | 2018-01-01 01:00:00 | 2018-01-01 01:59:59
3 | 1 | 50 | 75 | 2018-01-01 02:00:00 | 2018-01-01 02:59:59
4 | 1 | 50 | 75 | 2018-01-01 03:00:00 | 2018-01-01 03:59:59
5 | 2 | 60 | 75 | 2018-01-01 00:00:00 | 2018-01-01 00:59:59
6 | 2 | 60 | 75 | 2018-01-01 01:00:00 | 2018-01-01 01:59:59
7 | 2 | 60 | 75 | 2018-01-01 02:00:00 | 2018-01-01 02:59:59
8 | 2 | 60 | 75 | 2018-01-01 03:00:00 | 2018-01-01 03:59:59
There are more columns than just val1 and val2; this is only an example.
The entity_dynamic table describes parameters that are valid for a given time span. It is not a series of point-in-time records (such as sensor data).
Therefore, all equal records can easily be collapsed into one record, like this:
id | entity_static_id | val1 | val2 | valid_from | valid_to
-------------------------------------------------------------------------------
1 | 1 | 50 | 75 | 2018-01-01 00:00:00 | 2018-01-01 03:59:59
5 | 2 | 60 | 75 | 2018-01-01 00:00:00 | 2018-01-01 03:59:59
The data in the valid_to column may be NULL.
My question is: which queries can be used to aggregate similar records with consecutive validity ranges into a single record? The grouping should be done on the foreign key entity_static_id.
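Before turning to SQL, the desired merge can be sketched in plain Python (an illustrative sketch, not part of the accepted solution): sort by entity and start of validity, then collapse each consecutive run of equal values into one record spanning from the first valid_from to the last valid_to.

```python
from itertools import groupby

# each record: (id, entity_static_id, val1, val2, valid_from, valid_to)
records = [
    (1, 1, 50, 75, "2018-01-01 00:00:00", "2018-01-01 00:59:59"),
    (2, 1, 50, 75, "2018-01-01 01:00:00", "2018-01-01 01:59:59"),
    (3, 1, 50, 75, "2018-01-01 02:00:00", "2018-01-01 02:59:59"),
    (4, 1, 50, 75, "2018-01-01 03:00:00", "2018-01-01 03:59:59"),
    (5, 2, 60, 75, "2018-01-01 00:00:00", "2018-01-01 00:59:59"),
    (6, 2, 60, 75, "2018-01-01 01:00:00", "2018-01-01 01:59:59"),
]

# sort by entity and start time, then group consecutive rows with equal values
records.sort(key=lambda r: (r[1], r[4]))
merged = []
for _, run in groupby(records, key=lambda r: (r[1], r[2], r[3])):
    run = list(run)
    first, last = run[0], run[-1]
    # keep the first id and valid_from, and the last valid_to of the run
    merged.append((first[0], first[1], first[2], first[3], first[4], last[5]))
```

Here `merged` contains one row per run: id 1 spanning 00:00:00 to 03:59:59 for entity 1, and id 5 spanning 00:00:00 to 01:59:59 for entity 2.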
with entity_dynamic as
(
select
*
from
(values
('1','1','50','75','2018-01-01 00:00:00','2018-01-01 00:59:59')
,('2','1','50','75','2018-01-01 01:00:00','2018-01-01 01:59:59')
,('3','1','50','75','2018-01-01 02:00:00','2018-01-01 02:59:59')
,('4','1','50','75','2018-01-01 03:00:00','2018-01-01 03:59:59')
,('5','2','60','75','2018-01-01 00:00:00','2018-01-01 00:59:59')
,('6','2','60','75','2018-01-01 01:00:00','2018-01-01 01:59:59')
,('7','2','60','75','2018-01-01 02:00:00','2018-01-01 02:59:59')
,('8','2','60','75','2018-01-01 03:00:00','2018-01-01 03:59:59')
,('9','1','60','75','2018-01-01 04:00:00','2018-01-01 04:59:59')
,('10','1','60','75','2018-01-01 05:00:00','2018-01-01 05:59:59')
,('11','2','70','75','2018-01-01 04:00:00','2018-01-01 04:59:59')
,('12','2','70','75','2018-01-01 05:00:00','2018-01-01 05:59:59')
,('13','2','60','75','2018-01-01 06:00:00','2018-01-01 06:59:59')
)
a(id , entity_static_id , val1 , val2 , valid_from , valid_to)
)
First, add a row number for each unique combination of val1 and val2 within each entity_static_id (the "unique group"), and then a second row number per entity_static_id, ordered by valid_from descending:
,step1 as
(
select
id , entity_static_id , val1 , val2 , valid_from , valid_to
,row_number() over (partition by entity_static_id,val1,val2 order by valid_from) unique_group
,ROW_NUMBER() over (partition by entity_static_id order by valid_from desc) rn
from entity_dynamic
)
This gives:
+----------------------------------------------------------------------------------------+
|id|entity_static_id|val1|val2|valid_from |valid_to |unique_group|rn|
+----------------------------------------------------------------------------------------+
|10|1 |60 |75 | 2018-01-01 05:00:00 | 2018-01-01 05:59:59|2 |1 |
|9 |1 |60 |75 | 2018-01-01 04:00:00 | 2018-01-01 04:59:59|1 |2 |
|4 |1 |50 |75 | 2018-01-01 03:00:00 | 2018-01-01 03:59:59|4 |3 |
|3 |1 |50 |75 | 2018-01-01 02:00:00 | 2018-01-01 02:59:59|3 |4 |
|2 |1 |50 |75 | 2018-01-01 01:00:00 | 2018-01-01 01:59:59|2 |5 |
|1 |1 |50 |75 | 2018-01-01 00:00:00 | 2018-01-01 00:59:59|1 |6 |
|13|2 |60 |75 | 2018-01-01 06:00:00 | 2018-01-01 06:59:59|5 |1 |
|12|2 |70 |75 | 2018-01-01 05:00:00 | 2018-01-01 05:59:59|2 |2 |
|11|2 |70 |75 | 2018-01-01 04:00:00 | 2018-01-01 04:59:59|1 |3 |
|8 |2 |60 |75 | 2018-01-01 03:00:00 | 2018-01-01 03:59:59|4 |4 |
|7 |2 |60 |75 | 2018-01-01 02:00:00 | 2018-01-01 02:59:59|3 |5 |
|6 |2 |60 |75 | 2018-01-01 01:00:00 | 2018-01-01 01:59:59|2 |6 |
|5 |2 |60 |75 | 2018-01-01 00:00:00 | 2018-01-01 00:59:59|1 |7 |
+----------------------------------------------------------------------------------------+
Step 2 adds the row number of each unique group to the overall row number. Because the overall row number is descending, adjacent rows with equal values end up with the same sum, called tar in this example:
,step2 as
(
select
*
,unique_group+rn tar
from step1
)
Step 2 gives:
+--------------------------------------------------------------------------------------------+
|id|entity_static_id|val1|val2|valid_from |valid_to |unique_group|rn|tar|
+--------------------------------------------------------------------------------------------+
|10|1 |60 |75 | 2018-01-01 05:00:00 | 2018-01-01 05:59:59|2 |1 |3 |
|9 |1 |60 |75 | 2018-01-01 04:00:00 | 2018-01-01 04:59:59|1 |2 |3 |
|4 |1 |50 |75 | 2018-01-01 03:00:00 | 2018-01-01 03:59:59|4 |3 |7 |
|3 |1 |50 |75 | 2018-01-01 02:00:00 | 2018-01-01 02:59:59|3 |4 |7 |
|2 |1 |50 |75 | 2018-01-01 01:00:00 | 2018-01-01 01:59:59|2 |5 |7 |
|1 |1 |50 |75 | 2018-01-01 00:00:00 | 2018-01-01 00:59:59|1 |6 |7 |
|13|2 |60 |75 | 2018-01-01 06:00:00 | 2018-01-01 06:59:59|5 |1 |6 |
|12|2 |70 |75 | 2018-01-01 05:00:00 | 2018-01-01 05:59:59|2 |2 |4 |
|11|2 |70 |75 | 2018-01-01 04:00:00 | 2018-01-01 04:59:59|1 |3 |4 |
|8 |2 |60 |75 | 2018-01-01 03:00:00 | 2018-01-01 03:59:59|4 |4 |8 |
|7 |2 |60 |75 | 2018-01-01 02:00:00 | 2018-01-01 02:59:59|3 |5 |8 |
|6 |2 |60 |75 | 2018-01-01 01:00:00 | 2018-01-01 01:59:59|2 |6 |8 |
|5 |2 |60 |75 | 2018-01-01 00:00:00 | 2018-01-01 00:59:59|1 |7 |8 |
+--------------------------------------------------------------------------------------------+
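Why the sum is constant within an island: within one entity, unique_group increases by 1 per row of the same value combination (ascending by valid_from) while rn decreases by 1 per row overall (descending by valid_from). As long as no different value combination intervenes, the two changes cancel, so unique_group + rn stays fixed. A small numeric check in Python, using the rows of entity_static_id = 2 from the table above:

```python
# rows of entity_static_id = 2, ascending by valid_from: (id, val1)
rows = [(5, 60), (6, 60), (7, 60), (8, 60), (11, 70), (12, 70), (13, 60)]

# rn: overall row number per entity, descending by valid_from
rn = {rid: len(rows) - i for i, (rid, _) in enumerate(rows)}

# unique_group: row number per value combination, ascending by valid_from
unique_group, seen = {}, {}
for rid, val1 in rows:
    seen[val1] = seen.get(val1, 0) + 1
    unique_group[rid] = seen[val1]

# the two counters move in opposite directions, so the sum is constant
# within each island of equal values
tar = {rid: unique_group[rid] + rn[rid] for rid, _ in rows}
```

This reproduces the tar column above: 8 for ids 5 to 8, 4 for ids 11 and 12, and 6 for id 13, so the two val1 = 60 islands of entity 2 end up in different groups.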
Finally, you can find valid_from and valid_to with min and max, grouping by the right columns:
select
min(id) id
,entity_static_id
,val1
,val2
,min(valid_from) valid_from
,max(valid_to) valid_to
from step2
group by entity_static_id,val1
,val2
,tar
order by entity_static_id,valid_from
Putting it all together, the code is:
with entity_dynamic as
(
select
*
from
(values
('1','1','50','75','2018-01-01 00:00:00','2018-01-01 00:59:59')
,('2','1','50','75','2018-01-01 01:00:00','2018-01-01 01:59:59')
,('3','1','50','75','2018-01-01 02:00:00','2018-01-01 02:59:59')
,('4','1','50','75','2018-01-01 03:00:00','2018-01-01 03:59:59')
,('5','2','60','75','2018-01-01 00:00:00','2018-01-01 00:59:59')
,('6','2','60','75','2018-01-01 01:00:00','2018-01-01 01:59:59')
,('7','2','60','75','2018-01-01 02:00:00','2018-01-01 02:59:59')
,('8','2','60','75','2018-01-01 03:00:00','2018-01-01 03:59:59')
,('9','1','60','75','2018-01-01 04:00:00','2018-01-01 04:59:59')
,('10','1','60','75','2018-01-01 05:00:00','2018-01-01 05:59:59')
,('11','2','70','75','2018-01-01 04:00:00','2018-01-01 04:59:59')
,('12','2','70','75','2018-01-01 05:00:00','2018-01-01 05:59:59')
,('13','2','60','75','2018-01-01 06:00:00','2018-01-01 06:59:59')
)
a(id , entity_static_id , val1 , val2 , valid_from , valid_to)
)
,step1 as
(
select
id , entity_static_id , val1 , val2 , valid_from , valid_to
,row_number() over (partition by entity_static_id,val1,val2 order by valid_from) unique_group
,ROW_NUMBER() over (partition by entity_static_id order by valid_from desc) rn
from entity_dynamic
)
,step2 as
(
select
*
,unique_group+rn tar
from step1
)
select
min(id) id
,entity_static_id
,val1
,val2
,min(valid_from) valid_from
,max(valid_to) valid_to
from step2
group by entity_static_id,val1
,val2
,tar
order by entity_static_id,valid_from
The result is:
+------------------------------------------------------------------------+
|id|entity_static_id|val1|val2|valid_from |valid_to |
+------------------------------------------------------------------------+
|1 |1 |50 |75 | 2018-01-01 00:00:00 | 2018-01-01 03:59:59|
|10|1 |60 |75 | 2018-01-01 04:00:00 | 2018-01-01 05:59:59|
|5 |2 |60 |75 | 2018-01-01 00:00:00 | 2018-01-01 03:59:59|
|11|2 |70 |75 | 2018-01-01 04:00:00 | 2018-01-01 05:59:59|
|13|2 |60 |75 | 2018-01-01 06:00:00 | 2018-01-01 06:59:59|
+------------------------------------------------------------------------+
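As a sanity check, the query can be run against SQLite (3.25 or later, which supports window functions) from Python's standard sqlite3 module. Note that the ids are stored as integers here, so min(id) for entity 1's second island returns 9 rather than the string minimum '10' produced by the text-typed sample data above:

```python
import sqlite3

# in-memory database with the sample data from the question
conn = sqlite3.connect(":memory:")
conn.execute("""
    create table entity_dynamic (
        id integer, entity_static_id integer, val1 integer, val2 integer,
        valid_from text, valid_to text)
""")
conn.executemany(
    "insert into entity_dynamic values (?,?,?,?,?,?)",
    [
        (1, 1, 50, 75, "2018-01-01 00:00:00", "2018-01-01 00:59:59"),
        (2, 1, 50, 75, "2018-01-01 01:00:00", "2018-01-01 01:59:59"),
        (3, 1, 50, 75, "2018-01-01 02:00:00", "2018-01-01 02:59:59"),
        (4, 1, 50, 75, "2018-01-01 03:00:00", "2018-01-01 03:59:59"),
        (5, 2, 60, 75, "2018-01-01 00:00:00", "2018-01-01 00:59:59"),
        (6, 2, 60, 75, "2018-01-01 01:00:00", "2018-01-01 01:59:59"),
        (7, 2, 60, 75, "2018-01-01 02:00:00", "2018-01-01 02:59:59"),
        (8, 2, 60, 75, "2018-01-01 03:00:00", "2018-01-01 03:59:59"),
        (9, 1, 60, 75, "2018-01-01 04:00:00", "2018-01-01 04:59:59"),
        (10, 1, 60, 75, "2018-01-01 05:00:00", "2018-01-01 05:59:59"),
        (11, 2, 70, 75, "2018-01-01 04:00:00", "2018-01-01 04:59:59"),
        (12, 2, 70, 75, "2018-01-01 05:00:00", "2018-01-01 05:59:59"),
        (13, 2, 60, 75, "2018-01-01 06:00:00", "2018-01-01 06:59:59"),
    ],
)

# the same two-step query: row numbers, constant sum per island, then aggregate
result = conn.execute("""
    with step1 as (
        select id, entity_static_id, val1, val2, valid_from, valid_to,
            row_number() over (partition by entity_static_id, val1, val2
                               order by valid_from) unique_group,
            row_number() over (partition by entity_static_id
                               order by valid_from desc) rn
        from entity_dynamic
    ),
    step2 as (
        select *, unique_group + rn tar from step1
    )
    select min(id) id, entity_static_id, val1, val2,
           min(valid_from) valid_from, max(valid_to) valid_to
    from step2
    group by entity_static_id, val1, val2, tar
    order by entity_static_id, valid_from
""").fetchall()
```

`result` contains the five collapsed islands shown above, one row per consecutive run of equal values per entity.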