Is There a Way to Automate the Conversion of SQL Rows to Columns Using CASE?

I was playing with the usa_names dataset on BigQuery. To visualize the top 10 names between 1910 and 2020, I had to group by year and use CASE to create a new column for each of the 10 names.

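The problem is that I want to visualize the top 100, and I'd like to know whether there is a way to automate the CASE so that I don't have to write one by hand for every name just to create a column for it.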

First, I had to use the following SQL query to get the top 10 names:

SELECT
  name,
  SUM(number) AS total
FROM
  bigquery-public-data.usa_names.usa_1910_current
WHERE
  year BETWEEN 1910 AND 2020
GROUP BY
  name
ORDER BY
  total DESC
LIMIT
  10

Then I used the following code to turn each name row into a column:

SELECT
  year,
  SUM(CASE WHEN name = 'James' THEN number ELSE 0 END) AS James,
  SUM(CASE WHEN name = 'John' THEN number ELSE 0 END) AS John,
  SUM(CASE WHEN name = 'Robert' THEN number ELSE 0 END) AS Robert,
  SUM(CASE WHEN name = 'Michael' THEN number ELSE 0 END) AS Michael,
  SUM(CASE WHEN name = 'William' THEN number ELSE 0 END) AS William,
  SUM(CASE WHEN name = 'Mary' THEN number ELSE 0 END) AS Mary,
  SUM(CASE WHEN name = 'Richard' THEN number ELSE 0 END) AS Richard,
  SUM(CASE WHEN name = 'Joseph' THEN number ELSE 0 END) AS Joseph,
  SUM(CASE WHEN name = 'Charles' THEN number ELSE 0 END) AS Charles,
  SUM(CASE WHEN name = 'Thomas' THEN number ELSE 0 END) AS Thomas
FROM
  bigquery-public-data.usa_names.usa_1910_current
GROUP BY
  year
ORDER BY
  year

I want to get the same result without having to extract the names first and type them into the CASE statements manually.

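Alternatively, if there is a way to visualize the data directly without converting the names from rows to columns, I don't need the conversion at all.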

Thanks.

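You don't need to create a column for each name. Your first query is enough (you just need to change the limit to 100). Based on the question's tags, I assume you're using Tableau, so just pick the visualization you want (say, a bar chart) and put the name on one axis and the total on the other.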

Based on your follow-up comment, it would look like this:

SELECT
  name,
  year,
  SUM(number) AS total
FROM
  bigquery-public-data.usa_names.usa_1910_current
WHERE
  name IN (
    SELECT name
    FROM (
      SELECT
        name,
        SUM(number) AS total
      FROM
        bigquery-public-data.usa_names.usa_1910_current
      WHERE
        year BETWEEN 1910 AND 2020
      GROUP BY
        name
      ORDER BY
        total DESC
      LIMIT
        100
    )
  )
GROUP BY
  name,
  year
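To see the shape of this long-format result without touching BigQuery, here is a minimal sketch that uses Python's built-in sqlite3 module as a local stand-in. The table name usa_names and the handful of rows are invented toy data, not the real dataset; only the query structure mirrors the answer (top-N names in an IN subquery, then one row per name and year). An ORDER BY is added only to make the output deterministic.

```python
import sqlite3

# Invented toy rows standing in for bigquery-public-data.usa_names.usa_1910_current.
ROWS = [
    ("James", 1910, 50), ("James", 1911, 60),
    ("Mary", 1910, 70), ("Mary", 1911, 30),
    ("Zoe", 1910, 5), ("Zoe", 1911, 6),
]

# Same structure as the answer's query: keep only the top-N names overall,
# then return one (name, year, total) row per combination: long format.
LONG_FORMAT_SQL = """
SELECT name, year, SUM(number) AS total
FROM usa_names
WHERE name IN (
  SELECT name
  FROM usa_names
  GROUP BY name
  ORDER BY SUM(number) DESC
  LIMIT ?
)
GROUP BY name, year
ORDER BY name, year
"""


def top_names_long_format(top_n):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE usa_names (name TEXT, year INTEGER, number INTEGER)")
    conn.executemany("INSERT INTO usa_names VALUES (?, ?, ?)", ROWS)
    rows = conn.execute(LONG_FORMAT_SQL, (top_n,)).fetchall()
    conn.close()
    return rows


print(top_names_long_format(2))
```

Each row is one (name, year) point, which is the shape a charting tool such as Tableau can plot directly with name as a dimension and total as a measure.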

You could also consider connecting Tableau to the raw data and using calculated fields there to get the visualization you want.

You need to combine two capabilities:

  1. Rows to columns: the PIVOT clause
  2. Scripting, to automate the query that finds the top 10 names

declare top_names default ((
  select concat("'", string_agg(name, "','"), "'")
  from (
    -- your query from the question
    SELECT
      name
    FROM
      bigquery-public-data.usa_names.usa_1910_current
    WHERE
      year BETWEEN 1910 AND 2020
    GROUP BY
      name
    ORDER BY
      SUM(number) DESC
    LIMIT
      10
  )
));
select top_names;
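The concat("'", string_agg(name, "','"), "'") expression just wraps each name in single quotes and joins them with commas, producing the text that later goes inside IN (...). A minimal Python equivalent, handy for checking that string, might look like this; quoted_in_list is a hypothetical helper name. Unlike the BigQuery expression, it also doubles any embedded single quote, which is the SQLite and standard SQL escaping rule (BigQuery has its own string-literal rules, so check those before relying on this there). For plain alphabetic names like the ones in this dataset, no escaping is needed.

```python
def quoted_in_list(names):
    """Return the body of an IN (...) list, e.g. 'James','John','Robert'.

    Mirrors concat("'", string_agg(name, "','"), "'") from the script,
    and additionally doubles embedded single quotes (SQLite/standard SQL).
    """
    return ",".join("'" + n.replace("'", "''") + "'" for n in names)


print(quoted_in_list(["James", "John", "Robert"]))
```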

The output is:

'James','John','Robert','Michael','William','Mary','David','Richard','Joseph','Charles'

The PIVOT query you need is:

SELECT * FROM
  (select year, name, sum(number) number
   from bigquery-public-data.usa_names.usa_1910_current
   group by year, name
  )
  PIVOT(SUM(number) FOR name IN ('James','John','Robert','Michael','William','Mary','David','Richard','Joseph','Charles'))

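Its output has the same layout as your second query: one row per year and one column per name (PIVOT returns NULL rather than 0 for a year in which a name has no rows).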

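To put the two together, run something like the following in the same script, after the DECLARE statement, so that top_names is still in scope: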

execute immediate concat(
  """
  SELECT * FROM
  (select year, name, sum(number) number
   from bigquery-public-data.usa_names.usa_1910_current
   group by year, name
   )
  PIVOT(SUM(number) FOR name IN (
  """,
  top_names,
  "))");
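The same two-step idea (query the top names, then splice them into a second query) can be tried locally. This is a minimal sketch using Python's sqlite3 module with invented toy rows and a hypothetical usa_names table. SQLite has no PIVOT clause, so it falls back to the question's own SUM(CASE ...) pattern and generates one CASE expression per name instead of typing them by hand.

```python
import sqlite3

# Invented toy rows standing in for the BigQuery table (name, year, number).
ROWS = [
    ("James", 1910, 50), ("James", 1911, 60),
    ("Mary", 1910, 70), ("Mary", 1911, 30),
    ("Zoe", 1910, 5), ("Zoe", 1911, 6),
]


def quote_ident(name):
    # Double-quoted SQLite identifier; embedded double quotes are doubled.
    return '"' + name.replace('"', '""') + '"'


def pivot_top_names(conn, top_n):
    # Step 1: the "top names" query from the question.
    names = [row[0] for row in conn.execute(
        "SELECT name FROM usa_names GROUP BY name "
        "ORDER BY SUM(number) DESC LIMIT ?", (top_n,))]
    # Step 2: one SUM(CASE ...) column per name. The names themselves are bound
    # as parameters; only the column aliases are spliced into the SQL text.
    columns = ",\n  ".join(
        f"SUM(CASE WHEN name = ? THEN number ELSE 0 END) AS {quote_ident(n)}"
        for n in names)
    sql = f"SELECT\n  year,\n  {columns}\nFROM usa_names\nGROUP BY year\nORDER BY year"
    cursor = conn.execute(sql, names)
    header = [d[0] for d in cursor.description]
    return header, cursor.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE usa_names (name TEXT, year INTEGER, number INTEGER)")
conn.executemany("INSERT INTO usa_names VALUES (?, ?, ?)", ROWS)
header, rows = pivot_top_names(conn, 2)
print(header)
print(rows)
```

Binding the names as parameters rather than quoting them into the text sidesteps the escaping question entirely; only the column aliases, which must be identifiers, are built as strings.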