我的示例表方案如下:
+-------------------------------+
| User | Time | Property |
|------|-------------|----------|
| u1 | 2020-02-01 | p1 |
| u1 | 2020-02-02 | p1 |
| u1 | 2020-02-03 | p2 |
| u1 | 2020-02-04 | p2 |
| u2 | 2020-02-01 | p2 |
| u2 | 2020-02-02 | p2 |
| u2 | 2020-02-04 | p1 |
| u2 | 2020-02-05 | p1 |
+-------------------------------+
数据由User
和Time
列排序。
我想添加一列如下
+--------------------------------------------+
| User | Time | Property | New_Column |
|------|-------------|----------|------------|
| u1 | 2020-02-01 | p1 | 1 |
| u1 | 2020-02-02 | p1 | 1 |
| u1 | 2020-02-03 | p2 | 2 |
| u1 | 2020-02-04 | p2 | 2 |
| u2 | 2020-02-01 | p2 | 3 |
| u2 | 2020-02-02 | p2 | 3 |
| u2 | 2020-02-04 | p1 | 4 |
| u2 | 2020-02-05 | p1 | 4 |
+--------------------------------------------+
在中的值New_Column
对于连续的行序列是唯一的,直到User
或Property
更改为止。
编辑1:
我正在尝试建立一种逻辑,以便及时为用户使用先前的属性。
select
*
, lag(property,1) over (partition by User order by Time) as Prev_Property
from table
order by User, Time
接下来我添加一列Row_Number
这给了我
+------------------------------------------------------------+
| User | Time | Property | Prev_Property | Row_Number |
|------|-------------|----------|---------------|------------|
| u1 | 2020-02-01 | p1 | Null | 1 |
| u1 | 2020-02-02 | p1 | p1 | 2 |
| u1 | 2020-02-03 | p2 | p1 | 3 |
| u1 | 2020-02-04 | p2 | p2 | 4 |
| u2 | 2020-02-01 | p2 | Null | 5 |
| u2 | 2020-02-02 | p2 | p2 | 6 |
| u2 | 2020-02-04 | p1 | p2 | 7 |
| u2 | 2020-02-05 | p1 | p1 | 8 |
+------------------------------------------------------------+
我试图建立如下逻辑
CASE
WHEN Prev_Property is not null
THEN Row_Number
WHEN Property <> Prev_Property
THEN Row_Number
WHEN Property = Prev_Property
THEN -- a logic that gives me the Row_Number of the Prev_Property
我仍在尝试在查询中建立最后几个条件。到目前为止,这是我的尝试。
我不需要New_Column
按顺序排列值。我只希望它们是唯一的并且满足分组条件。
以下是BigQuery标准SQL
#standardSQL
SELECT * EXCEPT(flag),
COUNTIF(flag) OVER(ORDER BY User, Time) New_Column
FROM (
SELECT *,
(User, Property) != LAG((User, Property)) OVER(ORDER BY User, Time) flag
FROM `project.dataset.table`
)
您可以使用问题中的虚拟数据来测试,玩游戏,如下例所示
#standardSQL
WITH `project.dataset.table` AS (
SELECT 'u1' User, DATE '2020-02-01' Time, 'p1' Property UNION ALL
SELECT 'u1', '2020-02-02', 'p1' UNION ALL
SELECT 'u1', '2020-02-03', 'p2' UNION ALL
SELECT 'u1', '2020-02-04', 'p2' UNION ALL
SELECT 'u2', '2020-02-01', 'p2' UNION ALL
SELECT 'u2', '2020-02-02', 'p2' UNION ALL
SELECT 'u2', '2020-02-04', 'p1' UNION ALL
SELECT 'u2', '2020-02-05', 'p1'
)
SELECT * EXCEPT(flag),
COUNTIF(flag) OVER(ORDER BY User, Time) New_Column
FROM (
SELECT *,
(User, Property) != LAG((User, Property)) OVER(ORDER BY User, Time) flag
FROM `project.dataset.table`
)
-- ORDER BY User, Time
带输出
Row User Time Property New_Column
1 u1 2020-02-01 p1 0
2 u1 2020-02-02 p1 0
3 u1 2020-02-03 p2 1
4 u1 2020-02-04 p2 1
5 u2 2020-02-01 p2 2
6 u2 2020-02-02 p2 2
7 u2 2020-02-04 p1 3
8 u2 2020-02-05 p1 3
本文收集自互联网,转载请注明来源。
如有侵权,请联系 [email protected] 删除。
我来说两句