Select Null in With Statement

I'm using PostgreSQL in SQLWorkbenchJ and I've run into a problem.

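I have a WITH statement that selects dates depending on their row number. If the statement can't find a row number, I want it to select null in the date field. That doesn't happen at the moment: it only selects records where all fields are non-null. I assume it has something to do with the joins, but I'm not sure.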

The current statement is below. It should return around 50,000 records, but it currently returns fewer than 2,000.

WITH FifthEnquiry AS
(
  SELECT emailaddress,
         SentDate,
         ROW_NUMBER() OVER (PARTITION BY emailaddress ORDER BY COUNT(*) DESC) AS rk
  FROM SentEmails
  GROUP BY emailaddress,
           SentDate
),
TenthEnquiry AS
(
  SELECT emailaddress,
         SentDate,
         ROW_NUMBER() OVER (PARTITION BY emailaddress ORDER BY COUNT(*) DESC) AS rk
  FROM SentEmails
  GROUP BY emailaddress,
           SentDate
),
TwentiethEnquiry AS
(
  SELECT emailaddress,
         SentDate,
         ROW_NUMBER() OVER (PARTITION BY emailaddress ORDER BY COUNT(*) DESC) AS rk
  FROM SentEmails
  GROUP BY emailaddress,
           SentDate
)
SELECT FifthEnquiry.emailaddress,
       FifthEnquiry.SentDate AS Fifth,
       TenthEnquiry.SentDate AS Tenth,
       TwentiethEnquiry.SentDate AS Twentieth
FROM FifthEnquiry
  JOIN TenthEnquiry ON FifthEnquiry.emailaddress = TenthEnquiry.emailaddress
  JOIN TwentiethEnquiry ON FifthEnquiry.emailaddress = TwentiethEnquiry.emailaddress
WHERE (FifthEnquiry.rk = 5)
AND   (TenthEnquiry.rk = 10)
AND   (TwentiethEnquiry.rk = 20)

You can simplify this quite a bit, and use LEFT JOIN to keep every email address that has at least 5 rows after the GROUP BY, even if there is no 10th or 20th row:

WITH cte AS (
   SELECT emailaddress, SentDate,
          ROW_NUMBER() OVER (PARTITION BY emailaddress
                             ORDER BY COUNT(*) DESC, SentDate) AS rn
   FROM   SentEmails
   GROUP  BY 1,2
   )
SELECT enq05.emailaddress,
       enq05.SentDate AS fifth,
       enq10.SentDate AS tenth,
       enq20.SentDate AS twentieth
FROM        cte AS enq05
LEFT   JOIN cte AS enq10 ON enq10.emailaddress = enq05.emailaddress
                        AND enq10.rn = 10
LEFT   JOIN cte AS enq20 ON enq20.emailaddress = enq05.emailaddress
                        AND enq20.rn = 20
WHERE  enq05.rn = 5;
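  • You don't need separate CTEs; all three do the same thing. A single CTE is enough, and it is noticeably faster. Self-join it in the outer query using different table aliases.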

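  • Since we now use LEFT JOIN, it matters whether additional conditions go in the JOIN clause or in the WHERE clause. A condition on the joined table in the WHERE clause effectively forces Postgres to treat the join as a plain [INNER] JOIN. I moved the conditions to the JOIN clause accordingly. Details: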

    • Join a count query on a generate_series in postgres and also retrieve Null-values as "0"
  • Use rn, not rk, as the column alias. It's a "row number", not a "rank". Note the important difference in behavior between row_number() and rank().

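  • I added SentDate to the ORDER BY as a tiebreaker for (emailaddress, SentDate) groups with the same count, to get a stable sort order. The way I have it, rows where SentDate IS NULL sort last in each group. For a descending sort you may want to use NULLS LAST (not relevant for COUNT(*), which is never NULL):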

    • PostgreSQL sort by datetime asc, null first?
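  • One more subtle detail to be aware of: if SentDate can be NULL in the underlying table, you can get NULL values in the result for two different reasons. A NULL value for tenth in the result can mean that the emailaddress has fewer than 10 rows after the GROUP BY, or it can mean that a NULL SentDate sits at the 10th position according to your sort order.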
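
The point about putting the extra condition in the JOIN clause rather than the WHERE clause can be checked on a toy example. This is only a minimal sketch: the inline table t and its values are made up for illustration and stand in for the cte above.

```sql
-- Hypothetical sample: 'a' has row numbers 5 and 10, 'b' only has 5.

-- Condition in the JOIN clause: 'b' is kept, with NULL for the missing 10th row.
WITH t(emailaddress, rn) AS (
   VALUES ('a', 5), ('a', 10), ('b', 5)
   )
SELECT e05.emailaddress, e10.rn AS rn10
FROM   t AS e05
LEFT   JOIN t AS e10 ON e10.emailaddress = e05.emailaddress
                    AND e10.rn = 10
WHERE  e05.rn = 5
ORDER  BY 1;

-- Same condition in the WHERE clause: the NULL-extended row for 'b'
-- fails e10.rn = 10, so only 'a' is returned, as with an [INNER] JOIN.
WITH t(emailaddress, rn) AS (
   VALUES ('a', 5), ('a', 10), ('b', 5)
   )
SELECT e05.emailaddress, e10.rn AS rn10
FROM   t AS e05
LEFT   JOIN t AS e10 ON e10.emailaddress = e05.emailaddress
WHERE  e05.rn = 5
AND    e10.rn = 10
ORDER  BY 1;
```

The first query returns two rows ('a' with 10, 'b' with NULL); the second returns only the row for 'a'.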
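
To illustrate why the alias rn fits better than rk, here is a small sketch of how row_number() and rank() differ on ties. The dates and counts are invented for the example and are not taken from SentEmails.

```sql
-- Hypothetical per-date counts; the first two dates tie with cnt = 3.
SELECT sentdate, cnt,
       row_number() OVER (ORDER BY cnt DESC, sentdate) AS rn,  -- 1, 2, 3: every row gets its own number
       rank()       OVER (ORDER BY cnt DESC)           AS rk   -- 1, 1, 3: ties share a rank, then a gap
FROM  (VALUES (date '2015-01-01', 3)
            , (date '2015-01-02', 3)
            , (date '2015-01-03', 1)) AS v(sentdate, cnt)
ORDER  BY rn;
```

With rank(), a filter such as rk = 2 would match no row here, while rk = 1 would match two. With row_number() plus a tiebreaker, each position matches exactly one row per partition.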
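
If the two reasons for a NULL need to be told apart, one possible approach (a suggestion, not part of the query above) is to also select a flag that tests a column of the joined alias which cannot be NULL in a matched row. The sketch below uses a made-up inline cte and positions 1 and 2 instead of 5 and 10 to keep it short; has_second and second_date are hypothetical names.

```sql
WITH cte(emailaddress, sentdate, rn) AS (
   VALUES ('a', date '2015-01-05', 1)
        , ('a', NULL,              2)   -- a real row whose SentDate is NULL
        , ('b', date '2015-01-07', 1)   -- 'b' has no second row at all
   )
SELECT e1.emailaddress,
       e2.sentdate       AS second_date,
       e2.rn IS NOT NULL AS has_second  -- true: row exists; false: no such row
FROM   cte AS e1
LEFT   JOIN cte AS e2 ON e2.emailaddress = e1.emailaddress
                     AND e2.rn = 2
WHERE  e1.rn = 1
ORDER  BY 1;
```

Both result rows show NULL for second_date, but has_second is true for 'a' (the row exists and its SentDate is NULL) and false for 'b' (there is no second row).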