用于从特定行生成列的大查询语法
Big Query syntax to generate columns from specific rows
我整天都在努力解决这个问题,似乎找不到合适的解决方案。我的数据具有以下结构:
dauid question total
------------------------------
35200004 8 300.0
35200004 9 500.0
35200005 8 400.0
35200005 9 600.0
35200006 8 900.0
35200006 9 1200.0
我想生成以下输出:
dauid total_8 total_9 normalized_total
----------------------------------------------------
35200004 300.0 500.0 (total_8 / total_9)
35200005 400.0 600.0
35200006 900.0 1200.0
我可以在没有归一化总数的情况下生活,但我真的很想找到一种方法,将问题列中的不同组作为每个 dauid 字段的唯一列。我一直在搞乱我发现的各种例子,但似乎无法得到我需要的东西。
编辑:
8和9的值不是固定的,这只是一个例子。每个 dauid 有 450 个对应的问题,table 有几百万行。我需要一种方法能够有选择地查询其中的两行或更多行,然后将每一行构建为一列。
编辑#2:
要添加更多上下文,这是我正在进行的查询类型:
SELECT dauid, question, total, male, female
FROM [canada_census_2011.ontario]
WHERE csdname CONTAINS 'Toronto'
AND (REGEXP_MATCH(question, r'(^9$)') OR REGEXP_MATCH(question, r'(^8$)') )
GROUP BY dauid, question, total, male, female
ORDER BY dauid;
编辑#3:
非常好,感谢 Pentium10,最终查询如下所示:
select dauid,
sum(if(REGEXP_MATCH(question, r'(^8$)'),total,0)) as total_8,
sum(if(REGEXP_MATCH(question, r'(^9$)'),total,0)) as total_9,
sum(if(REGEXP_MATCH(question, r'(^8$)'),total,0))/sum(if(REGEXP_MATCH(question, r'(^9$)'),total,0)) as normalized
FROM [canada_census_2011.ontario]
WHERE csdname CONTAINS 'Toronto'
AND (REGEXP_MATCH(question, r'(^9$)') OR REGEXP_MATCH(question, r'(^8$)') )
GROUP BY dauid
此示例查询生成您想要的内容
select dauid,
sum(if(question=8,total,0)) as total_8,
sum(if(question=9,total,0)) as total_9,
sum(if(question=8,total,0))/sum(if(question=9,total,0)) as normalized
from
(select 35200004 as dauid, 8 as question, 300.0 as total),
(select 35200004 as dauid, 9 as question, 500.0 as total),
(select 35200005 as dauid, 8 as question, 400.0 as total),
(select 35200005 as dauid, 9 as question, 600.0 as total),
(select 35200006 as dauid, 8 as question, 900.0 as total),
(select 35200006 as dauid, 9 as question, 1200.0 as total)
group by dauid
它returns
+---+----------+---------+---------+--------------------+---+
| | dauid | total_8 | total_9 | normalized | |
+---+----------+---------+---------+--------------------+---+
| 1 | 35200004 | 300.0 | 500.0 | 0.6 | |
| 2 | 35200005 | 400.0 | 600.0 | 0.6666666666666666 | |
| 3 | 35200006 | 900.0 | 1200.0 | 0.75 | |
+---+----------+---------+---------+--------------------+---+
我整天都在努力解决这个问题,似乎找不到合适的解决方案。我的数据具有以下结构:
dauid question total
------------------------------
35200004 8 300.0
35200004 9 500.0
35200005 8 400.0
35200005 9 600.0
35200006 8 900.0
35200006 9 1200.0
我想生成以下输出:
dauid total_8 total_9 normalized_total
----------------------------------------------------
35200004 300.0 500.0 (total_8 / total_9)
35200005 400.0 600.0
35200006 900.0 1200.0
我可以在没有归一化总数的情况下生活,但我真的很想找到一种方法,将问题列中的不同组作为每个 dauid 字段的唯一列。我一直在搞乱我发现的各种例子,但似乎无法得到我需要的东西。
编辑:
8和9的值不是固定的,这只是一个例子。每个 dauid 有 450 个对应的问题,table 有几百万行。我需要一种方法能够有选择地查询其中的两行或更多行,然后将每一行构建为一列。
编辑#2:
要添加更多上下文,这是我正在进行的查询类型:
SELECT dauid, question, total, male, female
FROM [canada_census_2011.ontario]
WHERE csdname CONTAINS 'Toronto'
AND (REGEXP_MATCH(question, r'(^9$)') OR REGEXP_MATCH(question, r'(^8$)') )
GROUP BY dauid, question, total, male, female
ORDER BY dauid;
编辑#3: 非常好,感谢 Pentium10,最终查询如下所示:
select dauid,
sum(if(REGEXP_MATCH(question, r'(^8$)'),total,0)) as total_8,
sum(if(REGEXP_MATCH(question, r'(^9$)'),total,0)) as total_9,
sum(if(REGEXP_MATCH(question, r'(^8$)'),total,0))/sum(if(REGEXP_MATCH(question, r'(^9$)'),total,0)) as normalized
FROM [canada_census_2011.ontario]
WHERE csdname CONTAINS 'Toronto'
AND (REGEXP_MATCH(question, r'(^9$)') OR REGEXP_MATCH(question, r'(^8$)') )
GROUP BY dauid
此示例查询生成您想要的内容
select dauid,
sum(if(question=8,total,0)) as total_8,
sum(if(question=9,total,0)) as total_9,
sum(if(question=8,total,0))/sum(if(question=9,total,0)) as normalized
from
(select 35200004 as dauid, 8 as question, 300.0 as total),
(select 35200004 as dauid, 9 as question, 500.0 as total),
(select 35200005 as dauid, 8 as question, 400.0 as total),
(select 35200005 as dauid, 9 as question, 600.0 as total),
(select 35200006 as dauid, 8 as question, 900.0 as total),
(select 35200006 as dauid, 9 as question, 1200.0 as total)
group by dauid
它returns
+---+----------+---------+---------+--------------------+---+
| | dauid | total_8 | total_9 | normalized | |
+---+----------+---------+---------+--------------------+---+
| 1 | 35200004 | 300.0 | 500.0 | 0.6 | |
| 2 | 35200005 | 400.0 | 600.0 | 0.6666666666666666 | |
| 3 | 35200006 | 900.0 | 1200.0 | 0.75 | |
+---+----------+---------+---------+--------------------+---+