How to select the TOP 1 record per group (Partition)

I have a table named tblRoutes that contains a unique list of from-to routes (f = from; t = to):

| fCity  | fState | tCity  | tState |
|========|========|========|========|
|New York|   NY   | Miami  |   CA   |
|Houston |   TX   |New York|   NY   |
...

Then I have a table named tblCarrierRates that lists the tiers and rates that carriers offer for traveling certain routes:

| fCity  | fState | tCity  | tState | Tier | Rate | CarrID |  CarrName   |
|========|========|========|========|======|======|========|=============|
|New York|   NY   | Miami  |   CA   |   2  |  .99 |  ABCD  | Abracadabra |
|New York|   NY   | Miami  |   CA   |   1  |  .00 |  BUMP  | Bumpy Rides |
|Houston |   TX   |New York|   NY   |   2  |  .00 |  SLOW  |Slow Carriers|
|Houston |   TX   |New York|   NY   |   2  |  .01 |  ABCD  | Abracadabra |
...

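For each unique route listed in tblRoutes, I am looking for the 1 "best" offer in tblCarrierRates.

The criterion for "the best" is the lowest Tier, followed by the lowest Rate.

The result needs to return all of the fields shown in tblCarrierRates, so based on the 2 routes shown in tblRoutes, the desired result is: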

| fCity  | fState | tCity  | tState | Tier | Rate | CarrID |  CarrName   |
|========|========|========|========|======|======|========|=============|
|New York|   NY   | Miami  |   CA   |   1  |  .00 |  BUMP  | Bumpy Rides |
|Houston |   TX   |New York|   NY   |   2  |  .00 |  SLOW  |Slow Carriers|

The approach I have in mind is to sort ascending by Tier, then by Rate, and then somehow match the TOP 1 record for each unique combination of fCity, fState, tCity and tState:

SELECT A.fCity, A.fState, A.tCity, A.tState, Q.Tier, Q.Rate, Q.CarrID, Q.CarrName
FROM tblRoutes As A LEFT JOIN 
    (SELECT TOP 1 B.CarrID, B.CarrName, B.fCity, B.fState, B.tCity, B.tState, B.Rate, B.Tier
    FROM tblCarrierRates As B
    ORDER BY tblCarrierRates.Tier ASC, tblCarrierRates.Rate ASC) As Q
ON (A.tState = Q.tState) AND (A.tCity = Q.tCity) AND (A.fState = Q.fState) AND (A.fCity = Q.fCity);

The query doesn't fail, but as you might have guessed, the subquery (Q) I wrote returns only one record overall, not 1 record per route in tblRoutes, so the final result is:

| fCity  | fState | tCity  | tState | Tier | Rate | CarrID |  CarrName   |
|========|========|========|========|======|======|========|=============|
|New York|   NY   | Miami  |   CA   |   1  |  .00 |  BUMP  | Bumpy Rides |
|Houston |   TX   |New York|   NY   |      |      |        |             |

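...As you can see, Houston to New York has no match, because my subquery returned only 1 result instead of 1 result per route.

How can I achieve the result I want?

Your inner query needs to be grouped by city and state. That will produce 1 row per city/state combination, allowing the outer join to join on those fields.

Debug your inner query independently until you see the results you want the outer query to work with. Take out the TOP 1 first so you can see that the sorting and grouping work correctly. I would also explicitly put ASC or DESC in your inner query so that others know which direction you expect the TOP to work in.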
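As a rough sketch of the grouping idea (not tested in Access; it uses Python's built-in sqlite3 module as a stand-in, with the sample rows from the question and the Rate values read exactly as they appear there, which is an assumption), the inner queries below group by route to find the lowest Tier, then the lowest Rate within that Tier. The outer query then joins back on those fields to pick up CarrID and CarrName:

```python
import sqlite3

# In-memory SQLite database standing in for the Access tables (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblCarrierRates (fCity TEXT, fState TEXT, tCity TEXT, tState TEXT,
                              Tier INTEGER, Rate REAL, CarrID TEXT, CarrName TEXT);
INSERT INTO tblCarrierRates VALUES
  ('New York','NY','Miami','CA',2,0.99,'ABCD','Abracadabra'),
  ('New York','NY','Miami','CA',1,0.00,'BUMP','Bumpy Rides'),
  ('Houston','TX','New York','NY',2,0.00,'SLOW','Slow Carriers'),
  ('Houston','TX','New York','NY',2,0.01,'ABCD','Abracadabra');
""")

# Innermost query: lowest Tier per route.
# Middle query: lowest Rate within that Tier, still one row per route.
# Outer query: join back to fetch the carrier columns for the winning row.
rows = conn.execute("""
    SELECT r.fCity, r.fState, r.tCity, r.tState, r.Tier, r.Rate, r.CarrID, r.CarrName
    FROM tblCarrierRates AS r
    JOIN (
        SELECT c.fCity, c.fState, c.tCity, c.tState, c.Tier, MIN(c.Rate) AS Rate
        FROM tblCarrierRates AS c
        JOIN (
            SELECT fCity, fState, tCity, tState, MIN(Tier) AS Tier
            FROM tblCarrierRates
            GROUP BY fCity, fState, tCity, tState
        ) AS t
          ON c.fCity = t.fCity AND c.fState = t.fState
         AND c.tCity = t.tCity AND c.tState = t.tState AND c.Tier = t.Tier
        GROUP BY c.fCity, c.fState, c.tCity, c.tState, c.Tier
    ) AS best
      ON r.fCity = best.fCity AND r.fState = best.fState
     AND r.tCity = best.tCity AND r.tState = best.tState
     AND r.Tier = best.Tier AND r.Rate = best.Rate
    ORDER BY r.fCity
""").fetchall()

for row in rows:
    print(row)
```

Running each nested SELECT on its own first is a convenient way to do the independent debugging described above. If two carriers share the same lowest Tier and Rate on a route, this sketch would return both of them.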

You can try the following query:

SELECT fCity, fState, tCity, tState, MIN(Tier), MIN(Rate), CarrID, CarrName
FROM tblCarrierRates
GROUP BY fCity, fState, tCity, tState, CarrID, CarrName;
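It may be worth checking this query against the sample rows before relying on it. The sketch below runs it unchanged in SQLite through Python's sqlite3 module (a stand-in for Access, with the Rate values read as they appear in the question; both are assumptions). Because CarrID and CarrName are part of the GROUP BY, each route and carrier pair forms its own group:

```python
import sqlite3

# In-memory SQLite database standing in for the Access table (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblCarrierRates (fCity TEXT, fState TEXT, tCity TEXT, tState TEXT,
                              Tier INTEGER, Rate REAL, CarrID TEXT, CarrName TEXT);
INSERT INTO tblCarrierRates VALUES
  ('New York','NY','Miami','CA',2,0.99,'ABCD','Abracadabra'),
  ('New York','NY','Miami','CA',1,0.00,'BUMP','Bumpy Rides'),
  ('Houston','TX','New York','NY',2,0.00,'SLOW','Slow Carriers'),
  ('Houston','TX','New York','NY',2,0.01,'ABCD','Abracadabra');
""")

# The query from the answer, unchanged.
rows = conn.execute("""
    SELECT fCity, fState, tCity, tState, MIN(Tier), MIN(Rate), CarrID, CarrName
    FROM tblCarrierRates
    GROUP BY fCity, fState, tCity, tState, CarrID, CarrName
""").fetchall()

# One row per route-and-carrier group, not one row per route.
print(len(rows))
for row in rows:
    print(row)
```

On this data the result has four rows, so an extra step is still needed to keep only the best carrier per route.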

I believe you are looking for the equivalent of the SQL Server and Oracle analytic/window functions, such as ROW_NUMBER() OVER (PARTITION BY .. ORDER BY ..).
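For comparison, here is a minimal sketch of what that window-function form could look like on an engine that supports it. It uses SQLite (3.25 or later) through Python's sqlite3 module purely as an illustration, with the question's sample rows and the Rate values read as shown there (assumptions); it is not Access syntax:

```python
import sqlite3

# In-memory SQLite database; window functions need SQLite 3.25+ (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblCarrierRates (fCity TEXT, fState TEXT, tCity TEXT, tState TEXT,
                              Tier INTEGER, Rate REAL, CarrID TEXT, CarrName TEXT);
INSERT INTO tblCarrierRates VALUES
  ('New York','NY','Miami','CA',2,0.99,'ABCD','Abracadabra'),
  ('New York','NY','Miami','CA',1,0.00,'BUMP','Bumpy Rides'),
  ('Houston','TX','New York','NY',2,0.00,'SLOW','Slow Carriers'),
  ('Houston','TX','New York','NY',2,0.01,'ABCD','Abracadabra');
""")

# Number the rows inside each route (the partition), best offer first,
# then keep only row number 1 of every partition.
rows = conn.execute("""
    SELECT fCity, fState, tCity, tState, Tier, Rate, CarrID, CarrName
    FROM (
        SELECT *, ROW_NUMBER() OVER (
                   PARTITION BY fCity, fState, tCity, tState
                   ORDER BY Tier ASC, Rate ASC) AS rn
        FROM tblCarrierRates
    ) AS ranked
    WHERE rn = 1
    ORDER BY fCity
""").fetchall()

for row in rows:
    print(row)
```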

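Although this isn't directly available in MS Access, I believe the row-numbering functionality can be simulated there with a correlated subquery. The subquery looks only at rows in the same "partition" (defined by the join filter) and ranks each row by counting how many rows in that partition come before it, i.e. sort 'below' it under the ordering criteria: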

SELECT A.fCity, A.fState, A.tCity, A.tState, Q.Tier, Q.Rate, Q.CarrID, Q.CarrName, TheRank
FROM tblRoutes As A LEFT JOIN 
    (
      SELECT B.CarrID, B.CarrName, B.fCity, B.fState, B.tCity, B.tState, B.Rate, B.Tier, 
      (
        SELECT COUNT(*) + 1
        FROM  tblCarrierRates rnk 
        -- Partition Simulation (JOIN)
        WHERE B.fCity = rnk.fCity AND B.fState = rnk.fState 
              AND B.tCity = rnk.tCity AND B.tState = rnk.tState 
              -- ORDER BY Simulation
              AND (rnk.Tier < B.Tier OR 
                 (rnk.Tier = B.Tier AND rnk.Rate < B.Rate))) AS TheRank
      FROM tblCarrierRates As B) As Q
ON (A.tState = Q.tState) AND (A.tCity = Q.tCity) 
    AND (A.fState = Q.fState) AND (A.fCity = Q.fCity)
-- Now, you just want the top rank in each partition.
WHERE TheRank = 1;

Be mindful of performance: the subquery is executed for every row. Also, if there is a tie, both rows will be returned.
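To illustrate the tie behaviour, the sketch below adds a hypothetical third carrier ('FAST', not in the original data) that matches SLOW on both Tier and Rate. It runs the same counting technique in SQLite through Python's sqlite3 module (a stand-in for Access, which is an assumption), first as written and then with one possible tie-breaker on CarrID:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblCarrierRates (fCity TEXT, fState TEXT, tCity TEXT, tState TEXT,
                              Tier INTEGER, Rate REAL, CarrID TEXT, CarrName TEXT);
INSERT INTO tblCarrierRates VALUES
  ('Houston','TX','New York','NY',2,0.00,'SLOW','Slow Carriers'),
  ('Houston','TX','New York','NY',2,0.01,'ABCD','Abracadabra'),
  -- hypothetical carrier that ties with SLOW on Tier and Rate
  ('Houston','TX','New York','NY',2,0.00,'FAST','Fast Freight');
""")

# Same ranking subquery as in the answer; the placeholder lets us
# optionally append one more ordering condition.
RANKED = """
SELECT Q.CarrID FROM (
  SELECT B.*, (
    SELECT COUNT(*) + 1 FROM tblCarrierRates rnk
    WHERE B.fCity = rnk.fCity AND B.fState = rnk.fState
      AND B.tCity = rnk.tCity AND B.tState = rnk.tState
      AND (rnk.Tier < B.Tier
           OR (rnk.Tier = B.Tier AND rnk.Rate < B.Rate)
           {tie_breaker})) AS TheRank
  FROM tblCarrierRates AS B) AS Q
WHERE Q.TheRank = 1
ORDER BY Q.CarrID
"""

# Without a tie-breaker both tied carriers get rank 1.
tied = [r[0] for r in conn.execute(RANKED.format(tie_breaker=""))]

# With CarrID as a final ordering column only one row per route has rank 1.
breaker = "OR (rnk.Tier = B.Tier AND rnk.Rate = B.Rate AND rnk.CarrID < B.CarrID)"
unique = [r[0] for r in conn.execute(RANKED.format(tie_breaker=breaker))]

print(tied, unique)
```

Breaking ties on CarrID is only one option; any column (or combination) that is unique within a route would do.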

The +1 makes the numbering start at 1 in each partition (the first row in a partition has zero rows before it).
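To see the numbering itself, it can help to drop the final filter and look at TheRank for every row. A small sketch, again using SQLite through Python's sqlite3 module as a stand-in for Access and the question's sample rows with the Rate values read as shown (assumptions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblCarrierRates (fCity TEXT, fState TEXT, tCity TEXT, tState TEXT,
                              Tier INTEGER, Rate REAL, CarrID TEXT, CarrName TEXT);
INSERT INTO tblCarrierRates VALUES
  ('New York','NY','Miami','CA',2,0.99,'ABCD','Abracadabra'),
  ('New York','NY','Miami','CA',1,0.00,'BUMP','Bumpy Rides'),
  ('Houston','TX','New York','NY',2,0.00,'SLOW','Slow Carriers'),
  ('Houston','TX','New York','NY',2,0.01,'ABCD','Abracadabra');
""")

# For each row B, count the rows of the same route that sort before it
# (lower Tier, or same Tier and lower Rate), then add 1.
rows = conn.execute("""
    SELECT B.fCity, B.CarrID, (
        SELECT COUNT(*) + 1
        FROM tblCarrierRates rnk
        WHERE B.fCity = rnk.fCity AND B.fState = rnk.fState
          AND B.tCity = rnk.tCity AND B.tState = rnk.tState
          AND (rnk.Tier < B.Tier
               OR (rnk.Tier = B.Tier AND rnk.Rate < B.Rate))) AS TheRank
    FROM tblCarrierRates AS B
    ORDER BY B.fCity, TheRank
""").fetchall()

for row in rows:
    print(row)
```

Each route gets its own sequence starting at 1, which is why filtering on TheRank = 1 keeps the best row per route.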

Edit: the same query with the comments removed:

SELECT A.fCity, A.fState, A.tCity, A.tState, Q.Tier, Q.Rate, Q.CarrID, Q.CarrName, TheRank
FROM tblRoutes As A LEFT JOIN 
    (
      SELECT B.CarrID, B.CarrName, B.fCity, B.fState, B.tCity, B.tState, B.Rate, B.Tier, 
      (
        SELECT COUNT(*) + 1
        FROM  tblCarrierRates rnk 
        WHERE B.fCity = rnk.fCity AND B.fState = rnk.fState 
              AND B.tCity = rnk.tCity AND B.tState = rnk.tState 
              AND (rnk.Tier < B.Tier OR 
                 (rnk.Tier = B.Tier AND rnk.Rate < B.Rate))) AS TheRank
      FROM tblCarrierRates As B) As Q
ON (A.tState = Q.tState) AND (A.tCity = Q.tCity) 
    AND (A.fState = Q.fState) AND (A.fCity = Q.fCity)
WHERE TheRank = 1