Running count of appearances of a customer ID in BigQuery
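There are similar questions on here, but either I can't work out how to translate them to my situation (likely), or they aren't that similar but read close to what I want to do (BigQuery: How to calculate the running count of distinct visitors for each day and category?).
Anyway...
I have an orders table in BigQuery with a lot of column headers. I need to use all of those columns, but I'll list just a few of them here:
orderID, customerID, transactionDate, Revenue
(I need to get all of the fields.)
I want to work out the instance of the customer ID in the table as a new column. So if I have placed 3 orders and my customer ID is 1234, the first instance in the data table would be 1 in the new column, the second would be 2, and the third would be 3.
For example, say my data looks like this: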
> OrderID || CustomerID || TransactionDate || Revenue
> 1 || 1 || 01/01/15 || £20
> 2 || 2 || 01/01/15 || £20
> 3 || 3 || 01/01/15 || £20
> 4 || 1 || 01/01/15 || £20
> 5 || 1 || 01/01/15 || £20
> 6 || 2 || 01/01/15 || £20
> 7 || 4 || 01/01/15 || £20
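I want to run a query on it that adds a new column stating the instance, counting up whenever there is already a record for that CustomerID, so it would look like this: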
> OrderID || CustomerID || TransactionDate || Revenue ||Instance
> 1 || 1 || 01/01/15 || £20 ||1
> 2 || 2 || 01/01/15 || £20 ||1
> 3 || 3 || 01/01/15 || £20 ||1
> 4 || 1 || 01/01/15 || £20 ||2
> 5 || 1 || 01/01/15 || £20 ||3
> 6 || 2 || 01/01/15 || £20 ||2
> 7 || 4 || 01/01/15 || £20 ||1
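Each time a customerID that has already been seen appears, the instance increments by 1.
Also, I need to run this against an ever-growing table, which currently has 1.6 million rows.
Hope someone can help me.
Cheers
John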
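You should use a window function such as row_number() OVER (PARTITION BY your grouping field ORDER BY transaction_date).
Window functions are what will help you here:
Window functions enable calculations on a specific partition, or "window", of a result set. Each window function expects an OVER clause that specifies the partition, in the following syntax: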
OVER (
[PARTITION BY <expr>]
[ORDER BY <expr>]
[ROWS <expr> | RANGE <expr>]
)
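PARTITION BY is always optional. ORDER BY is optional in some cases, but certain window functions, such as rank() or dense_rank(), require the clause.
JOIN EACH and GROUP EACH BY clauses can't be used on the output of window functions. To generate large query results when using window functions, you must use PARTITION BY.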
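To make the roles of PARTITION BY and ORDER BY concrete, here is a minimal sketch in plain Python that emulates the numbering on the sample rows from the question. It is only an illustration of the semantics, not how BigQuery executes the query; the function name `add_instance` is made up for this example, and the OrderID tie-breaker is an assumption added because every sample row has the same date.

```python
from collections import defaultdict

# Sample rows from the question: (OrderID, CustomerID, TransactionDate)
orders = [
    (1, 1, "01/01/15"),
    (2, 2, "01/01/15"),
    (3, 3, "01/01/15"),
    (4, 1, "01/01/15"),
    (5, 1, "01/01/15"),
    (6, 2, "01/01/15"),
    (7, 4, "01/01/15"),
]


def add_instance(rows):
    """Emulate ROW_NUMBER() OVER (PARTITION BY CustomerID
    ORDER BY TransactionDate, OrderID). Hypothetical helper."""
    # PARTITION BY: bucket the rows by customer.
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[1]].append(row)

    numbered = []
    for part in partitions.values():
        # ORDER BY: sort inside each partition. The dates are compared as
        # plain text here; real data should use a proper date type.
        part.sort(key=lambda r: (r[2], r[0]))
        # ROW_NUMBER: count from 1 within the partition.
        for n, row in enumerate(part, start=1):
            numbered.append(row + (n,))
    # Return the rows in OrderID order, like the final ORDER BY OrderID.
    return sorted(numbered)


result = add_instance(orders)
for row in result:
    print(row)
```

Each customer's counter restarts at 1 because the numbering is done per partition, which is the behaviour asked for in the question.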
-- BigQuery legacy SQL: a comma between subselects in FROM acts as UNION ALL
select *,
  -- every sample date is identical, so OrderID is added as a tie-breaker to keep the numbering deterministic
  row_number() over (partition by CustomerID order by TransactionDate, OrderID) as Instance
from (select 1 as OrderID, 1 as CustomerID, '01/01/15' as TransactionDate, '£20' as Revenue),
  (select 2 as OrderID, 2 as CustomerID, '01/01/15' as TransactionDate, '£20' as Revenue),
  (select 3 as OrderID, 3 as CustomerID, '01/01/15' as TransactionDate, '£20' as Revenue),
  (select 4 as OrderID, 1 as CustomerID, '01/01/15' as TransactionDate, '£20' as Revenue),
  (select 5 as OrderID, 1 as CustomerID, '01/01/15' as TransactionDate, '£20' as Revenue),
  (select 6 as OrderID, 2 as CustomerID, '01/01/15' as TransactionDate, '£20' as Revenue),
  (select 7 as OrderID, 4 as CustomerID, '01/01/15' as TransactionDate, '£20' as Revenue)
order by OrderID
Returns:
+-----+---------+------------+-----------------+---------+----------+
| Row | OrderID | CustomerID | TransactionDate | Revenue | Instance |
+-----+---------+------------+-----------------+---------+----------+
| 1   | 1       | 1          | 01/01/15        | £20     | 1        |
| 2   | 2       | 2          | 01/01/15        | £20     | 1        |
| 3   | 3       | 3          | 01/01/15        | £20     | 1        |
| 4   | 4       | 1          | 01/01/15        | £20     | 2        |
| 5   | 5       | 1          | 01/01/15        | £20     | 3        |
| 6   | 6       | 2          | 01/01/15        | £20     | 2        |
| 7   | 7       | 4          | 01/01/15        | £20     | 1        |
+-----+---------+------------+-----------------+---------+----------+
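If you want to try the same window function without a BigQuery project, the sketch below runs it against an in-memory SQLite database from Python. SQLite is only a stand-in here and its SQL dialect differs from BigQuery's; the table name `orders` is an assumption, window functions need SQLite 3.25 or newer, and OrderID is added to the ORDER BY as an assumed tie-breaker because all the sample dates are equal.

```python
import sqlite3

# In-memory database standing in for the real table (the name is hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "OrderID INTEGER, CustomerID INTEGER, TransactionDate TEXT, Revenue TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, 1, "01/01/15", "£20"),
        (2, 2, "01/01/15", "£20"),
        (3, 3, "01/01/15", "£20"),
        (4, 1, "01/01/15", "£20"),
        (5, 1, "01/01/15", "£20"),
        (6, 2, "01/01/15", "£20"),
        (7, 4, "01/01/15", "£20"),
    ],
)

# Number each customer's orders; OrderID breaks ties between equal dates.
rows = conn.execute(
    """
    SELECT OrderID,
           CustomerID,
           ROW_NUMBER() OVER (
               PARTITION BY CustomerID
               ORDER BY TransactionDate, OrderID
           ) AS Instance
    FROM orders
    ORDER BY OrderID
    """
).fetchall()

for row in rows:
    print(row)
conn.close()
```

Because the count is computed at query time, nothing has to be stored or updated as the table grows; re-running the query simply numbers whatever rows exist at that point.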