I am trying to count how many type A rows occur before each row on the same day.
Table:
|day |time|type|1 |2 |3 |4 |
|Mon |9 |A |a1 |a2 |a3 |a4 |
|Mon |10 |B |b1 |b2 |b3 |b4 |
|Mon |12 |A |c1 |c2 |c3 |c4 |
|Mon |20 |C |d1 |d2 |d3 |d4 |
|Tue |9 |A |e1 |e2 |e3 |e4 |
|Tue |10 |B |f1 |f2 |f3 |f4 |
|Tue |11 |B |g1 |g2 |g3 |g4 |
|Tue |12 |C |h1 |h2 |h3 |h4 |
|Wed |9 |A |i1 |i2 |i3 |i4 |
Expected output:
|day |time|type|1 |2 |3 |4 |#A |
|Mon |9 |A |a1 |a2 |a3 |a4 |0 |
|Mon |10 |B |b1 |b2 |b3 |b4 |1 |
|Mon |12 |A |c1 |c2 |c3 |c4 |1 |
|Mon |20 |C |d1 |d2 |d3 |d4 |2 |
|Tue |9 |A |e1 |e2 |e3 |e4 |0 |
|Tue |10 |B |f1 |f2 |f3 |f4 |1 |
|Tue |11 |B |g1 |g2 |g3 |g4 |1 |
|Tue |12 |C |h1 |h2 |h3 |h4 |1 |
|Wed |9 |A |i1 |i2 |i3 |i4 |0 |
I tried running:
select
x.day,x.time,x.type,x.1,x.2,x.3,x.4,count(*) as #A
from
Table as x
left outer join
Table(where type=A) as y
on
y.day = x.day
and
y.time < x.time
group by
x.day,x.time,x.type,x.1,x.2,x.3,x.4
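But this does not give me the right answer: the #A values are too low.
The table is very large, with more than 30M rows and more than 50k type A rows (sorting with ORDER BY hits the memory limit), which does not help with debugging the error...
I also need to run this as a scheduled job. I considered using Dataflow or Dataprep to help, but I would rather solve it in BigQuery alone.
Any help would be much appreciated.
Thanks!
Edit: I am now trying to debug the query on a smaller dataset.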
The following is for BigQuery Standard SQL:
#standardSQL
SELECT *,
COUNTIF(type = 'A') OVER(
PARTITION BY day ORDER BY time ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
) Count_of_A
FROM `project.dataset.table`
with the result:
Row day time type col1 col2 col3 col4 Count_of_A
1 Mon 9 A a1 a2 a3 a4 0
2 Mon 10 B b1 b2 b3 b4 1
3 Mon 12 A c1 c2 c3 c4 1
4 Mon 20 C d1 d2 d3 d4 2
5 Tue 9 A e1 e2 e3 e4 0
6 Tue 10 B f1 f2 f3 f4 1
7 Tue 11 B g1 g2 g3 g4 1
8 Tue 12 C h1 h2 h3 h4 1
9 Wed 9 A i1 i2 i3 i4 0
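To sanity-check the numbers outside BigQuery, the logic of the window frame can be mirrored in a few lines of plain Python. This is only a minimal sketch on the sample rows from the question: the helper name `count_a_before` and the `(day, time, type)` tuple layout are hypothetical, and it assumes `time` values are unique within a day (with ties, a ROWS frame has no guaranteed order).

```python
from collections import defaultdict

# Sample rows from the question as (day, time, type); the other columns are omitted.
ROWS = [
    ("Mon", 9, "A"), ("Mon", 10, "B"), ("Mon", 12, "A"), ("Mon", 20, "C"),
    ("Tue", 9, "A"), ("Tue", 10, "B"), ("Tue", 11, "B"), ("Tue", 12, "C"),
    ("Wed", 9, "A"),
]

def count_a_before(rows):
    """Hypothetical helper: for each row, count type 'A' rows earlier on the same day."""
    counts = [0] * len(rows)
    seen = defaultdict(int)  # running count of 'A' rows per day (like PARTITION BY day)
    # Visit rows in time order (like ORDER BY time) but write results back by index,
    # so the returned list lines up with the input order.
    for i in sorted(range(len(rows)), key=lambda i: rows[i][1]):
        day, _time, row_type = rows[i]
        # Read the count BEFORE updating it: the current row is excluded,
        # which is what "... AND 1 PRECEDING" does in the window frame.
        counts[i] = seen[day]
        if row_type == "A":
            seen[day] += 1
    return counts

for row, n in zip(ROWS, count_a_before(ROWS)):
    print(*row, n)
```

Reading the running count before incrementing it is the key detail: it keeps the current row out of its own count, so the first A row of each day gets 0.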