Is there a way to automate the conversion of SQL rows to columns using CASE?
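I'm playing with the usa_names dataset on BigQuery. To visualize the top 10 names between 1910 and 2020, I had to group by year and use CASE to create a new column for each of the 10 names.
The problem is that I want to visualize the top 100, and I'm wondering whether there is a way to automate the CASE so that I don't have to write one expression per name just to create a column for each.
I had to use the following SQL query to get the top 10 names first: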
SELECT
name,
SUM(number) AS total
FROM
bigquery-public-data.usa_names.usa_1910_current
WHERE
year BETWEEN 1910 AND 2020
GROUP BY
name
ORDER BY
total DESC
LIMIT
10
Then I used the following code to turn each name's rows into a column:
SELECT
year,
SUM(CASE WHEN name = 'James' THEN number ELSE 0 END) AS James,
SUM(CASE WHEN name = 'John' THEN number ELSE 0 END) AS John,
SUM(CASE WHEN name = 'Robert' THEN number ELSE 0 END) AS Robert,
SUM(CASE WHEN name = 'Michael' THEN number ELSE 0 END) AS Michael,
SUM(CASE WHEN name = 'William' THEN number ELSE 0 END) AS William,
SUM(CASE WHEN name = 'Mary' THEN number ELSE 0 END) AS Mary,
SUM(CASE WHEN name = 'Richard' THEN number ELSE 0 END) AS Richard,
SUM(CASE WHEN name = 'Joseph' THEN number ELSE 0 END) AS Joseph,
SUM(CASE WHEN name = 'Charles' THEN number ELSE 0 END) AS Charles,
SUM(CASE WHEN name = 'Thomas' THEN number ELSE 0 END) AS Thomas
FROM
bigquery-public-data.usa_names.usa_1910_current
GROUP BY
year
ORDER BY
year
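I would like to get the same result without first extracting the names and typing them into the CASE statements by hand.
Also, if there is a way to visualize the data directly without converting the names from rows to columns, then none of this is needed.
Thanks.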
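You don't need to create a column for each name. Your first query is sufficient (obviously you just need to change the limit to 100). Based on the question tags, I assume you are using Tableau, so just pick the visualization you want (a bar chart, say) and put name on one axis and total on the other.
Based on your follow-up comment, it would look something like this: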
SELECT
name,
year,
SUM(number) AS total
FROM bigquery-public-data.usa_names.usa_1910_current
WHERE name IN
(
SELECT name
FROM
(
SELECT
name,
SUM(number) AS total
FROM
bigquery-public-data.usa_names.usa_1910_current
WHERE
year BETWEEN 1910 AND 2020
GROUP BY
name
ORDER BY
total DESC
LIMIT
100
))
GROUP BY name, year
You could also consider using calculated fields in Tableau to shape the raw data into the visualization you want.
You need to combine 2 capabilities:
- Rows to columns: the PIVOT clause
- Scripting, to automate the query that finds the top 10 names
declare top_names default ((
select concat("'", string_agg(name, "','"), "'")
from (
-- your query from the question
SELECT
name
FROM
bigquery-public-data.usa_names.usa_1910_current
WHERE
year BETWEEN 1910 AND 2020
GROUP BY
name
ORDER BY
SUM(number) DESC
LIMIT
10
)));
select top_names;
The output is:
'James','John','Robert','Michael','William','Mary','David','Richard','Joseph','Charles'
The PIVOT query you need is:
SELECT * FROM
(select year, name, sum(number) number
from bigquery-public-data.usa_names.usa_1910_current
group by year, name
)
PIVOT(SUM(number) FOR name IN ('James','John','Robert','Michael','William','Mary','David','Richard','Joseph','Charles'
))
Its output has the same layout as your second query: one row per year and one column per name.
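To see what PIVOT does in isolation, here is a minimal sketch on a few made-up rows. The `counts` CTE and its values are invented purely for illustration and are not taken from the usa_names table.

```sql
-- Hypothetical sample data: one row per (year, name) pair.
WITH counts AS (
  SELECT 1910 AS year, 'James' AS name, 10 AS number UNION ALL
  SELECT 1910, 'Mary', 20 UNION ALL
  SELECT 1911, 'James', 30 UNION ALL
  SELECT 1911, 'Mary', 40
)
-- Each value listed after IN becomes its own column.
SELECT *
FROM counts
PIVOT (SUM(number) FOR name IN ('James', 'Mary'))
ORDER BY year;

-- Result:
-- year | James | Mary
-- 1910 |    10 |   20
-- 1911 |    30 |   40
```

The column that is neither aggregated nor pivoted (`year` here) acts as the implicit grouping key, which is why no GROUP BY is needed in the outer query.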
To put the two together, you need something like this:
execute immediate concat(
"""
SELECT * FROM
(select year, name, sum(number) number
from bigquery-public-data.usa_names.usa_1910_current
group by year, name
)
PIVOT(SUM(number) FOR name IN (
""",
top_names,
"))");
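For the top 100 that the question asks about, the two pieces above could be combined into a single script along the lines of the sketch below. This is a variant under stated assumptions rather than the exact code above: the explicit STRING type, the ORDER BY inside STRING_AGG, the year filter in the pivoted subquery, and the use of FORMAT instead of CONCAT are illustrative choices. It also assumes every name consists only of letters, so that each one is a valid column name and needs no escaping.

```sql
-- Build a quoted, comma-separated list of the 100 most common names.
DECLARE top_names STRING DEFAULT (
  SELECT STRING_AGG(CONCAT("'", name, "'"), ',' ORDER BY total DESC)
  FROM (
    SELECT name, SUM(number) AS total
    FROM `bigquery-public-data.usa_names.usa_1910_current`
    WHERE year BETWEEN 1910 AND 2020
    GROUP BY name
    ORDER BY total DESC
    LIMIT 100
  )
);

-- Splice the list into the PIVOT clause and run the generated query.
EXECUTE IMMEDIATE FORMAT("""
  SELECT *
  FROM (
    SELECT year, name, SUM(number) AS number
    FROM `bigquery-public-data.usa_names.usa_1910_current`
    WHERE year BETWEEN 1910 AND 2020
    GROUP BY year, name
  )
  PIVOT (SUM(number) FOR name IN (%s))
  ORDER BY year
""", top_names);
```

One difference from the CASE version is worth keeping in mind: where a name has no rows for a given year, PIVOT returns NULL, whereas `CASE ... ELSE 0 END` returns 0.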