Joining two tables on two criteria "cannot partition on repeated field"
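I'm using BigQuery for this.
I have a subquery that pulls data from a table with the fields account_id, product, date, and product_spend. It computes the total lifetime spend per product for each account_id by summing every line item.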
SELECT account_id,
product,
SUM(product_spend)/1000000 lifetime_product_spend
FROM usage
GROUP BY 1, 2
The result looks like this:
table: lifetime
account_id product lifetime_product_spend
===========================================================
A product1 50
A product2 20
B product2 100
B product3 150
C product3 500
I'm trying to keep these values and combine them with a larger query:
SELECT account_id,
product,
month,
SUM(spend)
FROM data_source
WHERE month >= DATE_ADD(today ,-5,"MONTH")
GROUP BY 1, 2, 3
That query produces a table like this:
table: monthly
account_id product month spend
=================================================================
A product1 1 10
A product1 2 20
A product1 3 30
A product2 1 5
A product2 2 15
B product2 2 100
B product3 2 100
B product3 3 50
C product3 1 100
C product3 2 400
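I'm not computing lifetime_product_spend with an aggregate over the second table. Because of the sheer data volume, I can only include the last 6 months of data there. That's why I compute the lifetime spend in a separate table and join the two.
My current query fails: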
SELECT d.account_id,
d.product,
d.month,
sum(d.spend),
u.lifetime_product_spend
FROM data_source d
LEFT JOIN (SELECT account_id,
product,
SUM(product_spend)/1000000 lifetime_product_spend
FROM usage
GROUP BY account_id, product) u
ON d.account_id = u.account_id
WHERE d.month >= DATE_ADD(today ,-5,"MONTH")
GROUP BY d.account_id, d.product, d.month, u.lifetime_product_spend
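It fails because it doesn't seem to assign the lifetime numbers to each product the way the lifetime table does. That's because I'm only joining on account_id. See the erroneous output below. I truncated the table, because it basically takes every value in lifetime_product_spend (5 of them) and adds one row for each month, product, and account, since the join ignores which product those values belong to: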
table: monthly
account_id product month spend lifetime_product_spend
=====================================================================================
A product1 1 10 50
A product1 1 10 20
A product1 1 10 100
A product1 1 10 150
A product1 1 10 500
A product1 2 20 50
A product1 2 20 20
A product1 2 20 100
A product1 2 20 150
A product1 2 20 500
Is there a way for me to join the two? I've tried doing a JOIN on x = x AND y = y:
SELECT d.account_id,
d.product,
d.month,
sum(d.spend),
u.lifetime_product_spend
FROM data_source d
LEFT JOIN (SELECT account_id,
product,
SUM(product_spend)/1000000 lifetime_product_spend
FROM usage
GROUP BY account_id, product) u
ON (d.account_id = u.account_id AND d.product = u.product)
WHERE d.month >= DATE_ADD(today ,-5,"MONTH")
GROUP BY d.account_id, d.product, d.month, u.lifetime_product_spend
But it gives me this error: "Execution failed. Error: Cannot partition on repeated field d.product".
I'd like my final table to look like this:
table: monthly
account_id product month spend lifetime_product_spend
=====================================================================================
A product1 1 10 50
A product1 2 20 50
A product1 3 30 50
A product2 1 5 20
A product2 2 15 20
B product2 2 100 100
B product3 2 100 150
B product3 3 50 150
C product3 1 100 500
C product3 2 400 500
I think I need a FLATTEN somewhere, but I can't seem to put it in the right place. Thanks for reading.
Write the "SELECT ... FROM usage" query as a subquery and apply an INNER JOIN or LEFT JOIN against the data_source table.
SELECT d.account_id,
d.product,
d.month,
SUM(d.spend),
u.lifetime_product_spend
FROM data_source d
LEFT JOIN (SELECT account_id,
product,
SUM(product_spend)/1000000 lifetime_product_spend
FROM usage
GROUP BY account_id, product) u
ON (d.account_id = u.account_id AND d.product = u.product)
WHERE d.month >= DATE_ADD(today ,-5,"MONTH")
GROUP BY d.account_id, d.product, d.month, u.lifetime_product_spend
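To see why the join condition matters, here is a minimal sketch that replays the question's sample data in an in-memory SQLite database through Python's sqlite3 module. SQLite stands in for BigQuery purely for illustration, and the tables `lifetime` and `monthly` are assumed to be the already aggregated results shown in the question. No repeated field is involved, so this only demonstrates the two-key join, not the repeated-field error.

```python
import sqlite3

# In-memory database; SQLite is only a stand-in for BigQuery here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE lifetime (account_id TEXT, product TEXT, lifetime_product_spend INTEGER);
    CREATE TABLE monthly  (account_id TEXT, product TEXT, month INTEGER, spend INTEGER);
""")

# Sample rows copied from the "lifetime" and "monthly" tables in the question.
lifetime_rows = [
    ("A", "product1", 50), ("A", "product2", 20),
    ("B", "product2", 100), ("B", "product3", 150),
    ("C", "product3", 500),
]
monthly_rows = [
    ("A", "product1", 1, 10), ("A", "product1", 2, 20), ("A", "product1", 3, 30),
    ("A", "product2", 1, 5), ("A", "product2", 2, 15),
    ("B", "product2", 2, 100), ("B", "product3", 2, 100), ("B", "product3", 3, 50),
    ("C", "product3", 1, 100), ("C", "product3", 2, 400),
]
conn.executemany("INSERT INTO lifetime VALUES (?, ?, ?)", lifetime_rows)
conn.executemany("INSERT INTO monthly VALUES (?, ?, ?, ?)", monthly_rows)

# Same query shape, with only the ON condition swapped.
JOIN_TEMPLATE = """
    SELECT d.account_id, d.product, d.month, d.spend, u.lifetime_product_spend
    FROM monthly d
    LEFT JOIN lifetime u ON {condition}
    ORDER BY d.account_id, d.product, d.month, u.lifetime_product_spend
"""

# Joining on account_id alone pairs each monthly row with every product of that account.
one_key = conn.execute(
    JOIN_TEMPLATE.format(condition="d.account_id = u.account_id")
).fetchall()

# Joining on account_id AND product pairs each monthly row with its own lifetime value.
two_keys = conn.execute(
    JOIN_TEMPLATE.format(
        condition="d.account_id = u.account_id AND d.product = u.product"
    )
).fetchall()

print(len(one_key), "rows with one key,", len(two_keys), "rows with two keys")
for row in two_keys:
    print(row)
```

With only account_id in the ON clause, each monthly row is repeated once per product the account has in the lifetime table. Adding the product condition leaves exactly one lifetime value per monthly row, which matches the desired final table.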
SELECT d.account_id,
d.product,
d.month,
sum(d.spend),
u.lifetime_product_spend
FROM FLATTEN(data_source, product) d
LEFT JOIN (SELECT account_id,
product,
SUM(product_spend)/1000000 lifetime_product_spend
FROM usage
GROUP BY account_id, product) u
ON (d.account_id = u.account_id AND d.product = u.product)
WHERE d.month >= DATE_ADD(today ,-5,"MONTH")
GROUP BY d.account_id, d.product, d.month, u.lifetime_product_spend
The query above uses the original data source flattened on the repeated field d.product. Thanks for the comments and help.
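As a rough mental model of what flattening a repeated field means, the sketch below expands each nested row into one row per value of the repeated field. It is plain Python, not BigQuery: the `flatten` helper and the nested sample rows are hypothetical, and the sketch simply skips rows whose list is empty without claiming that BigQuery does the same.

```python
def flatten(rows, field):
    """Yield one copy of each row per value in row[field], which must be a list."""
    for row in rows:
        for value in row[field]:
            # Copy the row and replace the list with a single scalar value.
            yield {**row, field: value}


# Hypothetical nested rows: "product" is a repeated field (a list per row).
nested = [
    {"account_id": "A", "month": 1, "product": ["product1", "product2"]},
    {"account_id": "B", "month": 2, "product": ["product3"]},
]

flat = list(flatten(nested, "product"))
for row in flat:
    print(row)
```

Once every row carries a single scalar product, an equality condition such as d.product = u.product compares one value against one value, which is what the join needs.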