I expected these two SQL queries to give the same output, but their results are wildly different
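I'm using pandasql. The first query returns the values I expect, but the second returns numbers that shouldn't even exist. I expected both to return the same values. As far as I can tell, the only difference is that in the first query the grouping/sum happens inside the subqueries, while in the second it happens outside them. What am I missing? Thanks for your help! (Output is at the bottom.)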
First query (the correct one)
SELECT a.'Name', a.Q1, b.Q2, (a.Q1 + b.Q2) AS Total
FROM
(SELECT c.'Name', SUM(c.'Paid Amount') AS Q1
FROM some_data AS c
WHERE c.'Quarter' = 'Q1'
GROUP BY c.'Name') AS a
JOIN
(SELECT d.'Name', SUM(d.'Paid Amount') AS Q2
FROM some_data AS d
WHERE d.'Quarter' = 'Q2'
GROUP BY d.'Name') AS b
ON a.'Name' = b.'Name'
ORDER BY Total DESC
LIMIT 5;
Second query (the bad one)
SELECT a.'Name' as Label, SUM(a.'Paid Amount') AS Q1, SUM(b.'Paid Amount') AS Q2, (SUM(a.'Paid Amount') + SUM(b.'Paid Amount')) as Total
FROM
(SELECT c.'Name', c.'Paid Amount'
FROM some_data AS c
WHERE c.'Quarter' = 'Q1') AS a
JOIN
(SELECT c.'Name', c.'Paid Amount'
FROM some_data AS c
WHERE c.'Quarter' = 'Q2') AS b
ON a.'Name' = b.'Name'
GROUP BY Label
ORDER BY Total DESC
LIMIT 5;
I put together some random data to demonstrate the problem.
Output of the first query (expected)
Output of the second query (problematic)
This is what I call wishful coding.
I hope you realize that aggregating before the join is what produces the correct answer.
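The problem is that JOIN can both multiply the number of rows and remove rows. In your case, one or both of the subqueries have multiple rows for the same Name, so the join multiplies the rows: every Q1 row for a name is paired with every Q2 row for that name. SUM() then simply adds up all the values the JOIN produces, repeats included.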
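The row multiplication can be reproduced on a tiny table. The following is a minimal sketch using Python's built-in sqlite3 module (pandasql runs its queries on SQLite); the name "Ann" and the amounts are made-up illustration data, not the asker's data.

```python
import sqlite3

# Hypothetical toy data: one name with 2 rows in Q1 and 3 rows in Q2.
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE some_data ("Name" TEXT, "Quarter" TEXT, "Paid Amount" INTEGER)'
)
conn.executemany(
    "INSERT INTO some_data VALUES (?, ?, ?)",
    [("Ann", "Q1", 10), ("Ann", "Q1", 20),
     ("Ann", "Q2", 1), ("Ann", "Q2", 2), ("Ann", "Q2", 3)],
)

# Join the raw (unaggregated) Q1 and Q2 rows, as the second query does.
RAW_JOIN = """
    FROM (SELECT "Name", "Paid Amount" FROM some_data WHERE "Quarter" = 'Q1') AS a
    JOIN (SELECT "Name", "Paid Amount" FROM some_data WHERE "Quarter" = 'Q2') AS b
      ON a."Name" = b."Name"
"""

# 2 Q1 rows x 3 Q2 rows = 6 joined rows for the same name.
joined_rows = conn.execute("SELECT COUNT(*) " + RAW_JOIN).fetchone()[0]

# Summing after the join counts each Q1 amount 3 times and each Q2 amount twice.
inflated = conn.execute(
    'SELECT SUM(a."Paid Amount"), SUM(b."Paid Amount") ' + RAW_JOIN
).fetchone()

# Aggregating before the join leaves one row per name on each side.
correct = conn.execute("""
    SELECT a.Q1, b.Q2
    FROM (SELECT "Name", SUM("Paid Amount") AS Q1 FROM some_data
          WHERE "Quarter" = 'Q1' GROUP BY "Name") AS a
    JOIN (SELECT "Name", SUM("Paid Amount") AS Q2 FROM some_data
          WHERE "Quarter" = 'Q2' GROUP BY "Name") AS b
      ON a."Name" = b."Name"
""").fetchone()

print(joined_rows)  # 6
print(inflated)     # (90, 12)
print(correct)      # (30, 6)
```

Here the true totals are 30 and 6, but summing after the join gives (10 + 20) * 3 = 90 and (1 + 2 + 3) * 2 = 12.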
Note: conditional aggregation is a simpler way to write the query:
SELECT c.Name,
       SUM(CASE WHEN c.Quarter = 'Q1' THEN c."Paid Amount" END) AS Q1,
       SUM(CASE WHEN c.Quarter = 'Q2' THEN c."Paid Amount" END) AS Q2
FROM some_data AS c
WHERE c.Quarter IN ('Q1', 'Q2')
GROUP BY c.Name
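As a rough check, conditional aggregation can be run against a small SQLite table from Python. This is only a sketch: the names "Ann" and "Bob" and all amounts are hypothetical, and the column names follow the question ("Name", "Quarter", "Paid Amount").

```python
import sqlite3

# Hypothetical toy data; "Bob" has no Q2 rows and "Ann" has a Q3 row to be filtered out.
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE some_data ("Name" TEXT, "Quarter" TEXT, "Paid Amount" INTEGER)'
)
conn.executemany(
    "INSERT INTO some_data VALUES (?, ?, ?)",
    [("Ann", "Q1", 10), ("Ann", "Q1", 20),
     ("Ann", "Q2", 1), ("Ann", "Q2", 2), ("Ann", "Q2", 3),
     ("Ann", "Q3", 100),
     ("Bob", "Q1", 5)],
)

# A single pass over the table: no join, so no rows are multiplied.
totals = conn.execute("""
    SELECT c."Name",
           SUM(CASE WHEN c."Quarter" = 'Q1' THEN c."Paid Amount" END) AS Q1,
           SUM(CASE WHEN c."Quarter" = 'Q2' THEN c."Paid Amount" END) AS Q2
    FROM some_data AS c
    WHERE c."Quarter" IN ('Q1', 'Q2')
    GROUP BY c."Name"
    ORDER BY c."Name"
""").fetchall()

print(totals)  # [('Ann', 30, 6), ('Bob', 5, None)]
```

One difference from the join version is worth keeping in mind: a name that appears in only one quarter (Bob here) is kept with NULL for the missing quarter, whereas the inner JOIN in the first query would drop that name entirely.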