How to get TONE from the GDELT GKG table in Google BigQuery?
SELECT
DATE,
EXTRACT(YEAR FROM DATE) AS year,
FIPS as Country,
LOCATIONS,
AVG(TONE) as Avg_Tone,
AVG(Positive Score) as PositiveS,
AVG(Negative Score) as NegativeS,
COUNT(*),
From `gdelt-bq.gdeltv2.gkg_partitioned`,
`gdelt-bq.extra.sourcesbycountry` country,
Where
DATE(_PARTITIONTIME) BETWEEN TIMESTAMP('2002-01-01') AND TIMESTAMP('2020-12-31')
AND SourceCommonName=country.Domain
AND Location like '%CH%'
GROUP BY Year,Country
ORDER BY Year,Country
The codebook link is http://data.gdeltproject.org/documentation/GDELT-Global_Knowledge_Graph_Codebook-V2.1.pdf
V1.5TONE contains TONE, Positive Score, Negative Score, and so on.
I want to compute the average tone per year.
How can I get this from BigQuery?
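The fields need to be cast and split first.
The date column holds a value in the format "yyyymmdd....". I therefore suggest casting the value to a string and treating the first four characters as the year.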
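The year extraction can be checked locally before running a BigQuery job. The following is a minimal Python sketch of the same idea; the helper name year_from_gkg_date and the sample value are made up for illustration, and the trailing digits after yyyymmdd are assumed to be a time part.

```python
def year_from_gkg_date(date_value: int) -> str:
    """Cast the integer date to a string and keep the first four characters."""
    return str(date_value)[:4]


# Hypothetical sample value in "yyyymmdd...." form (not taken from the table).
sample_date = 20200115123000
print(year_from_gkg_date(sample_date))
```

This corresponds to casting the column to a string and taking a four-character substring in SQL.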
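There is no V1.5TONE column, but there is a V2Tone column. It holds a single string made up of several comma-separated numbers. The string has to be split first, and then each component has to be cast to a decimal number.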
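To make the split-then-cast step concrete, here is a small Python sketch that mimics it on one string. The function name parse_tone and the sample string are hypothetical; the order of the first three components (tone, positive score, negative score) follows the description of the tone field above.

```python
from decimal import Decimal
from typing import Dict, Optional


def parse_tone(v2tone: str) -> Dict[str, Optional[Decimal]]:
    """Split a comma-separated tone string and cast the first three parts."""
    parts = v2tone.split(",")

    def component(index: int) -> Optional[Decimal]:
        # Like SAFE_OFFSET in BigQuery: a missing position yields None
        # instead of raising an error.
        return Decimal(parts[index]) if index < len(parts) else None

    return {
        "tone": component(0),
        "positive": component(1),
        "negative": component(2),
    }


# Hypothetical sample string, not real GDELT data.
print(parse_tone("-2.5,1.5,4.0,5.5,20.0,0.5,300"))
```

A string with fewer components than expected gives None for the missing ones, which an average would then skip, much as AVG skips NULL values.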
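The table gdelt-bq.extra.sourcesbycountry is supposed to map a url to a country. However, it contains repeated country entries for the same url! To eliminate at least some of the duplicate values, an inner select with a group by is used.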
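Why grouping removes only some of the duplicates can be seen in a small Python sketch. The domains and country codes below are invented; the assumption is that the mapping table can hold both exact duplicate rows and rows that assign one domain to different countries.

```python
from collections import Counter

# Hypothetical (Domain, FIPS) rows, not real table contents.
rows = [
    ("example.com", "US"),
    ("example.com", "US"),  # exact duplicate: removed by grouping on both columns
    ("example.com", "UK"),  # same domain, other country: survives the grouping
    ("example.org", "CH"),
]

# Equivalent of SELECT Domain, FIPS ... GROUP BY 1, 2
deduped = sorted(set(rows))

# Domains that still map to more than one country after deduplication.
per_domain = Counter(domain for domain, _ in deduped)
ambiguous = sorted(d for d, n in per_domain.items() if n > 1)

print(deduped)
print(ambiguous)
```

Any domain left in ambiguous would still match more than one row in the join, so its articles would be counted once per country.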
To get the values grouped by year and country, all other dimension columns need to be commented out.
SELECT
  #DATE,
  # date is an integer like yyyymmdd....; the first four characters are the year
  SUBSTR(CAST(date AS STRING), 1, 4) AS Year,
  FIPS AS Country,
  #LOCATIONS,
  # V2Tone is a comma-separated string: tone, positive score, negative score, ...
  AVG(CAST(SPLIT(V2Tone, ",")[SAFE_OFFSET(0)] AS DECIMAL)) AS Avg_Tone,
  AVG(CAST(SPLIT(V2Tone, ",")[SAFE_OFFSET(1)] AS DECIMAL)) AS PositiveS,
  AVG(CAST(SPLIT(V2Tone, ",")[SAFE_OFFSET(2)] AS DECIMAL)) AS NegativeS,
  COUNT(*) AS counts
FROM `gdelt-bq.gdeltv2.gkg_partitioned`
LEFT JOIN
  # Grouping on both columns drops exact duplicate (Domain, FIPS) rows
  (SELECT Domain, FIPS
   FROM `gdelt-bq.extra.sourcesbycountry`
   GROUP BY 1, 2) country
ON SourceCommonName = country.Domain
WHERE
  # Compare DATE with DATE on both sides of BETWEEN
  DATE(_PARTITIONTIME) BETWEEN DATE('2020-01-01') AND DATE('2020-01-31')
  AND Locations LIKE '%CH%'
GROUP BY Year, Country
ORDER BY Year, Country
Also, in the where clause: do not mix DATE and TIMESTAMP.
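As a local sanity check of the overall approach (year taken from the date, tone taken from the first tone component, grouping by year and country), the aggregation can be mimicked in Python on a few rows. This is only a sketch: the function name and the sample records are invented, and the country is assumed to be already attached to each record.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical records: (date, country, tone string). Not real GDELT data.
records = [
    (20200101000000, "CH", "-2.0,1.0,3.0"),
    (20200115000000, "CH", "4.0,5.0,1.0"),
    (20200120000000, "US", "1.0,2.0,1.0"),
]


def average_tone_by_year_country(rows):
    """Average the first tone component per (year, country) group."""
    sums = defaultdict(lambda: [Decimal(0), 0])  # key -> [running total, row count]
    for date_value, country, tone_string in rows:
        key = (str(date_value)[:4], country)
        sums[key][0] += Decimal(tone_string.split(",")[0])
        sums[key][1] += 1
    return {key: total / count for key, (total, count) in sorted(sums.items())}


result = average_tone_by_year_country(records)
for (year, country), avg_tone in result.items():
    print(year, country, avg_tone)
```

Comparing such a hand-computed result with the query output for a small date range is one way to confirm that the split offsets and the grouping behave as intended.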