Limit number of rows per group from join (NOT to 1 row)

Question

Given these tables:

TABLE Stores (
 store_id INT,
 store_name VARCHAR,
 etc
);

TABLE Employees (
 employee_id INT,
 store_id INT,
 employee_name VARCHAR,
 currently_employed BOOLEAN,
 etc
);

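I want to list the 15 longest-employed employees for each store (say, the 15 with the lowest employee_id), or ALL employees of a store if fewer than 15 have currently_employed='t'. I want to do it with a join clause.

I have found a lot of examples of people doing this for only 1 row, usually a min or max (the single longest-employed employee), but I basically want to combine an ORDER BY and a LIMIT inside the join. Some of those examples can be found here: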

Limit results from joined table to one row
MySQL returning 1 image for each product

I have also found decent examples for doing this store by store (which I can't do, as I have about 5000 stores):

Get top n records for each group of grouped results

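I have also seen that you can use TOP instead of ORDER BY and LIMIT, but not in PostgreSQL.

I reckon that a join clause between the two tables is not the only way (or even necessarily the best way) to do this, if it is possible to just work by distinct store_id inside the employees table, so I am open to other approaches. I can always join afterwards.

As I am very new to SQL, I would appreciate any theory background or additional explanation that helps me understand the principles at work.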

Answer 1

The classic way of doing this is with a window function, such as rank:

SELECT employee_name, store_name
FROM   (SELECT employee_name, store_name, 
        RANK() OVER (PARTITION BY store_name ORDER BY employee_id ASC) AS rk
        FROM   employees e
        JOIN   stores s ON e.store_id = s.store_id) t
WHERE  rk <= 15

Answer 2

`row_number()`

The general solution to get the top n rows per group is the window function row_number():

SELECT *
FROM  (
   SELECT *, row_number() OVER (PARTITION BY store_id ORDER BY employee_id) AS rn
   FROM   employees
   WHERE  currently_employed
   ) e
JOIN   stores s USING (store_id)
WHERE  rn <= 15
ORDER  BY store_id, e.rn;

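PARTITION BY should use store_id, which is guaranteed to be unique (as opposed to store_name).
Identify the rows in employees first, then join to stores; that is cheaper.
To get 15 rows, use row_number(), not rank(), which would be the wrong tool here: rank() gives tied rows the same number, so it can return more than 15 rows per store. As long as employee_id is unique, the difference does not show.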
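
To make the difference between the two functions concrete, here is a minimal sketch on made-up data. The column name hire_year and the three values are hypothetical and exist only for illustration; they are not part of the tables in the question.

```sql
-- Hypothetical sample: two rows tie on hire_year.
SELECT hire_year,
       rank()       OVER (ORDER BY hire_year) AS rk,  -- ties share a rank: 1, 1, 3
       row_number() OVER (ORDER BY hire_year) AS rn   -- always distinct: 1, 2, 3
FROM  (VALUES (2001), (2001), (2003)) AS t(hire_year);
```

With a filter such as rk <= 1 both tied rows would pass, while rn <= 1 keeps exactly one of them (which one is arbitrary unless the ORDER BY is made unambiguous).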

`LATERAL`

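An alternative for Postgres 9.3+ that typically performs better in combination with a matching index, especially when retrieving a small selection from a big table.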

SELECT s.store_name, e.*
FROM   stores s
, LATERAL (
   SELECT *  -- or just needed columns
   FROM   employees
   WHERE  store_id = s.store_id
   AND    currently_employed
   ORDER  BY employee_id
   LIMIT  15
   ) e
-- WHERE ... possibly select only a few stores
ORDER  BY s.store_name, e.store_id, e.employee_id

The perfect index would be a partial multicolumn index like this:

CREATE INDEX ON employees (store_id, employee_id) WHERE  currently_employed

Details depend on information missing from the question. Related example:

Create unique constraint with null columns
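
One way to check whether the planner actually picks up such an index is to look at the plan of the inner query for a single store. This is only a sketch: the literal store_id = 1 is a hypothetical value, and on a small table Postgres may reasonably prefer a sequential scan regardless of the index.

```sql
-- Show the plan for the per-store subquery used inside LATERAL.
-- store_id = 1 is a placeholder; substitute an id that exists in your data.
EXPLAIN
SELECT *
FROM   employees
WHERE  store_id = 1
AND    currently_employed
ORDER  BY employee_id
LIMIT  15;
```

If the index is used, the plan should show an index scan on employees rather than a sort over a sequential scan.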

Both versions exclude stores without current employees. There are ways around this if you need it.
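
One such way, sketched here as a variant of the LATERAL query rather than as the only option, is to switch to LEFT JOIN LATERAL ... ON true. Stores with no matching employees then still appear once, with NULL in all employee columns.

```sql
SELECT s.store_name, e.*
FROM   stores s
LEFT   JOIN LATERAL (
   SELECT *  -- or just needed columns
   FROM   employees
   WHERE  store_id = s.store_id
   AND    currently_employed
   ORDER  BY employee_id
   LIMIT  15
   ) e ON true  -- keep every store, even if the subquery returns no rows
ORDER  BY s.store_name, e.employee_id;
```

The comma form (, LATERAL) behaves like a CROSS JOIN and drops stores whose subquery is empty, which is why the explicit LEFT JOIN is needed here.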
