BigQuery - extract text and subtext
I have a table in BigQuery that records the number of contracts sent out each day:
date contract
2014-05-04 {jeans = 5, caps = 12, CDs = 1, Microwaves = 7, other = 6}
2014-05-05 {cups = 7, other = 5}
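I need to record how many uncategorized contracts (= other) were sent. So far I have done this by downloading the CSV and working it out in Excel.
How can I get a table like this directly in BQ: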
date other_contracts
2014-05-04 6
2014-05-05 5
Thanks!
Use regexp_extract and look for the sequence of digits that follows "other = ":
-- Legacy SQL: the comma between the two subqueries acts as UNION ALL.
SELECT
  *,
  REGEXP_EXTRACT(contract, r'other = (\d+)') AS Other
FROM (
  SELECT
    "2014-05-04" AS Date,
    "{jeans = 5, caps = 12, CDs = 1, Microwaves = 7, other = 6}" AS contract),
  (
  SELECT
    "2014-05-05" AS Date,
    "{cups = 7, other = 5}" AS contract)
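To sanity-check the pattern before running it against the real table, the same regular expression can be tried locally. This is only a minimal sketch in Python, not BigQuery code: the `rows` list and the `extract_other` helper are made up for illustration, and Python's `re` module stands in for REGEXP_EXTRACT, whose regex engine is not identical.

```python
import re

# Sample rows copied from the question; on the real table these come from BigQuery.
rows = [
    ("2014-05-04", "{jeans = 5, caps = 12, CDs = 1, Microwaves = 7, other = 6}"),
    ("2014-05-05", "{cups = 7, other = 5}"),
]

# Same pattern as in the query above: capture the digits after "other = ".
OTHER_PATTERN = re.compile(r"other = (\d+)")


def extract_other(contract):
    """Return the 'other' count as an int, or None if the row has no such entry."""
    match = OTHER_PATTERN.search(contract)
    return int(match.group(1)) if match else None


other_counts = {date: extract_other(contract) for date, contract in rows}

for date, count in other_counts.items():
    print(date, count)
```

Note that the pattern expects exactly one space on each side of the equals sign, so a row written as "other=6" would yield no match.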
If the data in the first table can be changed to JSON, you could use the JSON functions instead: https://cloud.google.com/bigquery/query-reference?hl=en#jsonfunctions
A more generic approach, which I think will help:
-- Legacy SQL: SPLIT without a delimiter splits on commas, giving one row per item.
SELECT
  Date,
  INTEGER(REGEXP_EXTRACT(Item, r'(\d+)')) AS Count,
  REGEXP_EXTRACT(Item, r'(\w+)') AS Item
FROM (
  SELECT Date, SPLIT(contract) AS Item
  FROM
    (SELECT "2014-05-04" AS Date, "{jeans = 5, caps = 12, CDs = 1, Microwaves = 7, other = 6}" AS contract),
    (SELECT "2014-05-05" AS Date, "{cups = 7, other = 5}" AS contract)
)
ORDER BY Date, Count DESC
The result is:
Date Count Item
5/4/2014 12 caps
5/4/2014 7 Microwaves
5/4/2014 6 other
5/4/2014 5 jeans
5/4/2014 1 CDs
5/5/2014 7 cups
5/5/2014 5 other
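The split-then-extract logic can also be traced outside BigQuery. The following is a rough Python sketch of the same steps, not the query itself: `rows` and `explode_contracts` are hypothetical names, and `str.split(",")` plays the role of SPLIT.

```python
import re

# Sample rows copied from the question.
rows = [
    ("2014-05-04", "{jeans = 5, caps = 12, CDs = 1, Microwaves = 7, other = 6}"),
    ("2014-05-05", "{cups = 7, other = 5}"),
]


def explode_contracts(rows):
    """Split each contract string on commas and return (date, count, item) tuples."""
    result = []
    for date, contract in rows:
        for piece in contract.split(","):
            # First run of word characters is the item name, first run of digits the count.
            item = re.search(r"(\w+)", piece).group(1)
            count = int(re.search(r"(\d+)", piece).group(1))
            result.append((date, count, item))
    # Mirror ORDER BY Date, Count DESC.
    return sorted(result, key=lambda r: (r[0], -r[1]))


exploded = explode_contracts(rows)

for date, count, item in exploded:
    print(date, count, item)
```

One caveat with these two patterns: an item name that itself contains a digit (say, a hypothetical "mp3 = 4") would have its count read from the name, so anchoring the digit pattern on the equals sign would be safer for such data.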