Multiple Joins and Writing to a Destination Table with BigQuery
I have the following query, which works fine if I do not set a destination table:
SELECT soi.customer_id
, p.department
, p.category
, p.subcategory
, p.tier1
, p.tier2
, pc.bucket as categorization
, SUM(soi.price) as demand
, COUNT(1) as cnt
FROM store.sales_item soi
INNER JOIN datamart.product p ON (soi.product_id = p.product_id)
INNER JOIN daily_customer_fact.dcf_product_categorization pc
ON (p.department = pc.department
AND p.category = pc.category
AND p.subcategory = pc.subcategory
AND p.tier1 = pc.tier1
AND p.tier2 = pc.tier2)
WHERE DATE(soi.created_timestamp) < current_date()
GROUP EACH BY 1,2,3,4,5,6,7 LIMIT 10
However, if I set a destination table, it fails:
Error: Ambiguous field name 'app_version' in JOIN. Please use the table qualifier before field name.
That column exists in the store.sales_item table, but I am neither selecting it nor joining on it.
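I have seen this error message before, and it points to the following:
- The query job that specifies the destination table has flattenResults set to false.
- The store.sales_item and datamart.product tables both contain a field named "app_version".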
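If so, I recommend looking at this answer:
as well as this issue report: https://code.google.com/p/google-bigquery/issues/detail?id=459
In your case, you should be able to get the query to succeed by applying suggestion #3 from the answer linked above, with something like the following. I can't test it because I don't have access to your source tables, but it should be close to working with flattenResults set to false.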
SELECT soi_and_p.customer_id
, soi_and_p.department
, soi_and_p.category
, soi_and_p.subcategory
, soi_and_p.tier1
, soi_and_p.tier2
, pc.bucket as categorization
, SUM(soi_and_p.price) as demand
, COUNT(1) as cnt
FROM
(SELECT soi.customer_id AS customer_id
, p.department AS department
, p.category AS category
, p.subcategory AS subcategory
, p.tier1 AS tier1
, p.tier2 AS tier2
, soi.price AS price
, soi.created_timestamp AS created_timestamp
FROM store.sales_item soi
INNER JOIN datamart.product p ON (soi.product_id = p.product_id)
) as soi_and_p
INNER JOIN daily_customer_fact.dcf_product_categorization pc
ON (soi_and_p.department = pc.department
AND soi_and_p.category = pc.category
AND soi_and_p.subcategory = pc.subcategory
AND soi_and_p.tier1 = pc.tier1
AND soi_and_p.tier2 = pc.tier2)
WHERE DATE(soi_and_p.created_timestamp) < current_date()
GROUP EACH BY 1,2,3,4,5,6,7 LIMIT 10
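For reference, the flattenResults setting discussed above is part of the query job configuration. The following is a minimal sketch of a request body for the BigQuery REST method jobs.insert, built as a plain Python dictionary. The helper name build_query_job, the project, dataset, and table names, and the shortened query string are all hypothetical placeholders; only the field names under configuration.query come from the BigQuery API. In legacy SQL, allowLargeResults has to be true whenever flattenResults is false, so the sketch sets both.

```python
import json


def build_query_job(sql, project_id, dataset_id, table_id, flatten_results):
    """Build a jobs.insert request body (hypothetical helper, not a library call)."""
    return {
        "configuration": {
            "query": {
                "query": sql,
                # The queries in this thread use legacy SQL (GROUP EACH BY).
                "useLegacySql": True,
                "destinationTable": {
                    "projectId": project_id,
                    "datasetId": dataset_id,
                    "tableId": table_id,
                },
                # Must be true when flattenResults is false in legacy SQL.
                "allowLargeResults": True,
                "flattenResults": flatten_results,
            }
        }
    }


# Placeholder query and identifiers, assumed for illustration only.
SQL = "SELECT soi.customer_id FROM store.sales_item soi LIMIT 10"
job = build_query_job(SQL, "my-project", "scratch", "demand_by_customer",
                      flatten_results=False)
print(json.dumps(job, indent=2))
```

Building the same body with flatten_results=True is one way to check whether this setting is what triggers the ambiguous field error for a given query.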