bigquery standard sql = 从字符串中提取数据

bigquery standard sql = extracting data from strings

我希望提取指定字母后的字符串部分,例如从以下字符串中提取:

wc:275,nwc:267,c1.3:2,c12.1:25,c12.10:39,c12.12:21,c12.13:4

我想提取 275(用于 wc)、2(用于 c1.3)和 25(用于 c12.1)。

我尝试了以下但字段重新显示,ridanxietycnt 和 wordcount 只显示 "NULL"。

SELECT substr(CAST((DATE) AS STRING),0,8) as daydate,
count(1) as count,
avg(CAST(REGEXP_REPLACE(V2Tone, r',.*', "")AS FLOAT64)) tone,
avg(CAST(REGEXP_EXTRACT(GCAM, r'c1.3:([-d.]+)')AS FLOAT64)) anew,
sum(CAST(REGEXP_EXTRACT(GCAM, r'c12.1:([-d.]+)')AS FLOAT64)) ridanxietycnt, 
sum(CAST(REGEXP_EXTRACT(GCAM, r'wc:(d+)')AS FLOAT64)) wordcount   
FROM `gdelt-bq.gdeltv2.gkg_partitioned` where _PARTITIONTIME BETWEEN TIMESTAMP('2019-02-02') AND TIMESTAMP('2019-02-02') 
group by daydate

我希望看到每一列的汇总数字。

我想知道问题是否与正则表达式有关?

这应该让你得到 2(重新)、25(ridanxietycnt)、275(字数),按此顺序

SELECT
SAFE_CAST(REGEXP_EXTRACT('wc:275,nwc:267,c1.3:2,c12.1:25,c12.10:39,c12.12:21,c12.13:4', r'c1.3:(\d+)') as FLOAT64) anew,
SAFE_CAST(REGEXP_EXTRACT('wc:275,nwc:267,c1.3:2,c12.1:25,c12.10:39,c12.12:21,c12.13:4', r'c12.1:(\d+)') as FLOAT64) ridanxietycnt, 
SAFE_CAST(REGEXP_EXTRACT('wc:275,nwc:267,c1.3:2,c12.1:25,c12.10:39,c12.12:21,c12.13:4', r'wc:(\d+)') as FLOAT64) wordcount  

您可以将它们视为简单的 key/value 对,然后在其之上进行聚合。类似于:

select 
   substr(CAST((DATE) AS STRING),0,8) as daydate,
   split(x,':')[safe_offset(0)] as key, 
   cast(split(x,':')[safe_offset(1)] as float64) as value
from `gdelt-bq.gdeltv2.gkg_partitioned`, 
unnest(split(GCAM, ',')) as x
where _PARTITIONTIME BETWEEN TIMESTAMP('2019-02-02') AND TIMESTAMP('2019-02-02')

希望对您有所帮助。