How do I get TONE from the GDELT GKG table in Google BigQuery?

SELECT
DATE,
EXTRACT(YEAR FROM DATE) AS year,
FIPS as Country,
LOCATIONS,
AVG(TONE) as Avg_Tone,
AVG(Positive Score) as PositiveS,
AVG(Negative Score) as NegativeS,
COUNT(*),
From `gdelt-bq.gdeltv2.gkg_partitioned`,
`gdelt-bq.extra.sourcesbycountry` country,

Where
DATE(_PARTITIONTIME) BETWEEN TIMESTAMP('2002-01-01') AND TIMESTAMP('2020-12-31')
AND SourceCommonName=country.Domain
AND Location like '%CH%'
GROUP BY Year,Country
ORDER BY  Year,Country

The codebook is at http://data.gdeltproject.org/documentation/GDELT-Global_Knowledge_Graph_Codebook-V2.1.pdf. According to it, V1.5TONE contains Tone, Positive Score, Negative Score, and so on. I want to compute the average tone per year. How can I get this from BigQuery?

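The fields need to be cast and split first:

  • date is a value in the format "yyyymmdd....". I therefore suggest casting the value to a string and treating the first four characters as the year.

  • There is no V1.5TONE column, but there is a V2Tone column. It is a single string holding several numbers separated by commas. The string has to be split first, and then each component needs to be cast to a decimal number.

  • The table gdelt-bq.extra.sourcesbycountry is supposed to map a URL to a country, but it contains duplicate entries for the same URL! To eliminate at least some of the duplicate values, use an inner SELECT with a GROUP BY.

  • To get values grouped by year and country only, all other dimension columns need to be commented out.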
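To see what the cast and split steps do in isolation, here is a minimal sketch that runs against one inline sample row instead of the GDELT table. The sample values are made up for illustration; only the layout (an integer date and a comma-separated V2Tone string whose first three components are tone, positive score and negative score) follows the description above.

```sql
-- Hypothetical sample row: the literal values are invented for illustration only.
WITH sample AS (
  SELECT
    20200115123000 AS date,                    -- integer in yyyymmdd.... layout
    '-1.5,2.5,4.0,6.5,20.0,0.5,400' AS V2Tone  -- tone, positive score, negative score, ...
)
SELECT
  SUBSTR(CAST(date AS STRING), 1, 4) AS year,                       -- first four characters
  CAST(SPLIT(V2Tone, ',')[SAFE_OFFSET(0)] AS NUMERIC) AS tone,
  CAST(SPLIT(V2Tone, ',')[SAFE_OFFSET(1)] AS NUMERIC) AS positive_score,
  CAST(SPLIT(V2Tone, ',')[SAFE_OFFSET(2)] AS NUMERIC) AS negative_score
FROM sample;
```

SAFE_OFFSET returns NULL rather than raising an error when a string has fewer components than expected, and AVG ignores NULL values, so malformed rows should not break the aggregation.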

SELECT
  # DATE,
  SUBSTR(CAST(DATE AS STRING), 1, 4) AS year,  # first four characters of yyyymmdd.... are the year
  FIPS AS Country,
  # LOCATIONS,
  # V2Tone is a comma-separated string: tone, positive score, negative score, ...
  AVG(CAST(SPLIT(V2Tone, ",")[SAFE_OFFSET(0)] AS DECIMAL)) AS Avg_Tone,
  AVG(CAST(SPLIT(V2Tone, ",")[SAFE_OFFSET(1)] AS DECIMAL)) AS PositiveS,
  AVG(CAST(SPLIT(V2Tone, ",")[SAFE_OFFSET(2)] AS DECIMAL)) AS NegativeS,
  COUNT(*) AS counts
FROM `gdelt-bq.gdeltv2.gkg_partitioned`
LEFT JOIN (
  # Collapse exact duplicate (Domain, FIPS) rows before joining
  SELECT Domain, FIPS
  FROM `gdelt-bq.extra.sourcesbycountry`
  GROUP BY 1, 2
) country
ON SourceCommonName = country.Domain
WHERE
  DATE(_PARTITIONTIME) BETWEEN DATE('2020-01-01') AND DATE('2020-01-31')
  AND Locations LIKE '%CH%'
GROUP BY year, Country
ORDER BY year, Country
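
Grouping by Domain and FIPS only removes exact duplicate rows. If a domain is listed under more than one FIPS code, every matching article would still be joined once per code and counted under each country. A rough way to check how much that matters, assuming only the Domain and FIPS columns already used above:

```sql
-- Diagnostic sketch: list domains that are mapped to more than one country code.
SELECT
  Domain,
  COUNT(DISTINCT FIPS) AS country_count
FROM `gdelt-bq.extra.sourcesbycountry`
GROUP BY Domain
HAVING COUNT(DISTINCT FIPS) > 1
ORDER BY country_count DESC
LIMIT 100;
```

If this returns rows, you may want to decide how such domains should be attributed before relying on the per-country counts.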

Also, in the WHERE clause: do not mix DATE and TIMESTAMP.
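
As a sketch of what a consistent filter can look like, either compare DATE with DATE or TIMESTAMP with TIMESTAMP. The COUNT(*) wrapper and the one-day range are only there to keep the example small; both statements are meant to select the same single-day partition.

```sql
-- Variant 1: both sides of the comparison are DATE values.
SELECT COUNT(*) AS row_count
FROM `gdelt-bq.gdeltv2.gkg_partitioned`
WHERE DATE(_PARTITIONTIME) BETWEEN DATE('2020-01-01') AND DATE('2020-01-01');

-- Variant 2: both sides of the comparison are TIMESTAMP values.
SELECT COUNT(*) AS row_count
FROM `gdelt-bq.gdeltv2.gkg_partitioned`
WHERE _PARTITIONTIME >= TIMESTAMP('2020-01-01')
  AND _PARTITIONTIME <  TIMESTAMP('2020-01-02');
```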