SQL 服务器 - 按固定值对可能性组合进行分组

SQL Server - Grouping Combination of possibilities by fixed value

我必须创建包含固定物品的最便宜的购物篮。

例如,篮子里有 (5) 件商品

1 和 4 = (1 * 50) + (1 * 100) = 150

2 和 3 = (1 * 60) + (1 * 80) = 140 -- 这是我的家伙

2 和 2 和 1 = (1 * 60) + (1 * 60) + (1 * 50) = 170

3 和 3 = (1 * 80) + (1 * 80) = 160 **** 这 6 项但总项可以超过最小项。重要的是总成本... ....

这也适用于篮子中可能包含的任意数量的物品。还有很多商店,每个商店都有不同的包装,可能包括几件商品。

SQL如何处理这个问题?

更新

这是示例数据生成代码。递归 CTE 解决方案更昂贵。我每次应该在 600-700 家商店的 500-600 毫秒内完成这项工作。这是一个包搜索引擎。使用“#temp”tables 或“UNUION”手动创建场景比递归 CTE 便宜 15-20 倍。

连接 ItemPackageId 也非常昂贵。在选择最便宜的包裹并加入源后,我可以找到所需的包裹 ID 或物品 table。

我期待一个神奇的解决方案,它可以超快并获得正确的选项。 每个商店只需要最便宜的篮子。手动场景创建速度非常快,但有时无法正确选择最便宜的篮子。

                CREATE TABLE #storePackages(
                StoreId int not null,
                PackageId int not null,
                ItemType int not null, -- there are tree item type 0 is normal item, 1 is item has discount 2 is free item
                ItemCount int not null,
                ItemPrice decimal(18,8) not null,
                MaxItemQouta int not null, -- in generaly a package can have between 1 and 6 qouata but in rare can up to 20-25
                MaxFullQouta int not null -- sometimes a package can have additional free or discount item qouta. MaxFullQouta will always greater then MaxItemQouta
            )

            declare @totalStores int
            set @totalStores = (SELECT TOP 1 n = number FROM master..[spt_values] WHERE number BETWEEN 200 AND 400 ORDER BY NEWID())

            declare @storeId int;
            declare @packageId  int;
            declare @maxPackageForStore int;
            declare @itemMinPrice decimal(18,8);
            set @storeId = 1;
            set @packageId = 1

            while(@storeId <= @totalStores)
                BEGIN
                    set @maxPackageForStore = (SELECT TOP 1 n = number FROM master..[spt_values] WHERE number BETWEEN 2 AND 6 ORDER BY NEWID())
                    set @itemMinPrice = (SELECT TOP 1 n = number FROM master..[spt_values] WHERE number BETWEEN 40 AND 100 ORDER BY NEWID())
                        BEGIN
                            INSERT INTO #storePackages
                            SELECT DISTINCT 
                             StoreId = @storeId
                            ,PackageId = CAST(@packageId + number AS int) 
                            ,ItemType = 0
                            ,ItemCount = number
                            ,ItemPrice = @itemMinPrice + (10 * (SELECT TOP 1 n = number FROM master..[spt_values] WHERE number BETWEEN pkgNo.number AND pkgNo.number + 2  ORDER BY NEWID()))
                            ,MaxItemQouta = @maxPackageForStore
                            ,MaxFullQouta =  @maxPackageForStore + (CASE WHEN number > 1 AND number < 4 THEN 1 ELSE 0 END)
                            FROM master..[spt_values] pkgNo
                            WHERE number BETWEEN 1 AND @maxPackageForStore
                            UNION ALL
                            SELECT DISTINCT 
                             StoreId = @storeId
                            ,PackageId = CAST(@packageId + number AS int) 
                            ,ItemType = 1
                            ,ItemCount = 1
                            ,ItemPrice = (@itemMinPrice / 2) + (10 * (SELECT TOP 1 n = number FROM master..[spt_values] WHERE number BETWEEN pkgNo.number AND pkgNo.number + 2  ORDER BY NEWID()))
                            ,MaxItemQouta = @maxPackageForStore
                            ,MaxFullQouta =  @maxPackageForStore  + (SELECT TOP 1 n = number FROM master..[spt_values] WHERE number BETWEEN 0 AND 2 ORDER BY NEWID())
                            FROM master..[spt_values] pkgNo
                            WHERE number BETWEEN 2 AND (CASE WHEN @maxPackageForStore > 4 THEN 4 ELSE @maxPackageForStore END)


                        set @packageId = @packageId + @maxPackageForStore;
                        END
                set @storeId =@storeId + 1;
                END

            SELECT * FROM #storePackages
            drop table #storePackages

我的解决方案

首先,我感谢所有试图帮助我的人。然而,所有建议的解决方案都基于 CTE。正如我之前所说,当考虑成百上千的商店时,递归 CTE 会导致性能问题。一次也请求多个包。这意味着,我要求可以包含多个篮子。一件是 5 件,一件是 3 件,一件是 7 件...

最后一个解决方案

首先,我根据项目大小在 table 中生成所有可能的场景...通过这种方式,我可以选择消除不需要的场景。

CREATE TABLE ItemScenarios(
   Item int,
   ScenarioId int,
   CalculatedItem int  --this will be joined with Store Item
)

然后我生成了从 2 项到 25 项的所有可能场景并插入到 ItemScenarios table。场景可以通过使用 WHILE 或递归 CTE 生成一次。这种方式的好处是场景只生成一次。

结果如下。

Item          |   ScenarioId       |     CalculatedItem
--------------------------------------------------------
2                   1                     2
2                   2                     3
2                   3                     1
2                   3                     1
3                   4                     5
3                   5                     4
3                   6                     3
3                   7                     2
3                   7                     2
3                   8                     2
3                   8                     1
3                   9                     1
3                   9                     1
3                   9                     1
....
.....
......
25                  993                   10

通过这种方式,我可以限制场景大小、最大不同商店、最大不同包裹等。

此外,我还可以排除一些在数学上不可能的场景,这些场景比其他场景便宜。例如对于 4 项请求,某些场景

场景一:2+2

场景二:2+1+1

场景三:1+1+1+1

在这些场景中;情景 2 不可能是最便宜的篮子。因为,

如果场景2 < 场景3 --> 场景1会更低场景 2.因为降低成本的东西是2件商品的价格,而**场景1*有双2件商品

另外如果场景2 < 场景1 --> 场景3会更低然后 情景 2

现在,如果我删除像场景 2 这样的场景,我将获得一些性能优势。

现在我可以在商店中选择最便宜的商品价格

DECLARE @requestedItems int;
SET @requestedItems = 5;

CREATE TABLE #JoinedPackageItemWithScenarios(
   StoreId int not null,
   PackageId int not null,
   ItemCount int not null,
   ItemPrice decimal(18,8) 
   ScenarioId int not null,
)
INSERT INTO #JoinedPackageItemWithScenarios
 SELECT
    SPM.StoreId  
   ,SPM.PackageId 
   ,SPM.ItemCount 
   ,SPM.ItemPrice 
   ,SPM.ScenarioId
   FROM (
      SELECT 
            SP.StoreId  
           ,SP.PackageId 
           ,SP.ItemCount 
           ,SP.ItemPrice 
           ,SC.ScenarioId
           ,RowNumber = ROW_NUMBER() OVER (PARTITION BY SP.StoreId,SC.ScenarioId,SP.ItemCount ORDER BY SP.ItemPrice) 
      FROM ItemScenarios SC
      LEFT JOIN StorePackages AS SP ON SP.ItemCount = SC.CalculatedItem
      WHERE SC.Item = @requestedItems
 ) SPM
 WHERE SPM.RowNumber = 1

-- NOW I HAVE CHEAPEST PRICE FOR EACH ITEM, I CAN CREATE BASKET

 CREATE TABLE #selectedScenarios(
   StoreId int not null,
   ScenarioId int not null,
   TotalItem int not null,
   TotalCost decimal(18,8) 
)
 INSERT INTO #selectedScenarios
 SELECT 
      StoreId
     ,ScenarioId
     ,TotalItem 
     ,TotalCost 
  FROM (
     SELECT 
           StoreId
          ,ScenarioId
          --,PackageIds = dbo.GROUP_CONCAT(CAST(PackageId AS nvarchar(20))) -- CONCATENING PackageId decreasing performance here. We can joing seleceted scenarios with #JoinedPackageItemWithScenarios after selection complated.
          ,TotalItem = SUM(ItemCount)
          ,TotalCost = SUM(ItemPrice)
          ,RowNumber = ROW_NUMBER() OVER (PARTITION BY StoreId ORDER BY SUM(ItemPrice))
        FROM #JoinedPackageItemWithScenarios JPS
        GROUP BY StoreId,ScenarioId
        HAVING(SUM(ItemCount) >= @requestedItems)
     ) SLECTED
     WHERE RowNumber = 1

  -- NOW WE CAN POPULATE PackageIds if needed

  SELECT 
      SS.StoreId
     ,SS.ScenarioId
     ,TotalItem = MAX(SS.TotalItem)
     ,TotalCost = MAX(SS.TotalCost)
     ,PackageIds = dbo.GROUP_CONCAT(CAST(JPS.PackageId AS nvarchar(20)))
    FROM #selectedScenarios SS
    JOIN #JoinedPackageItemWithScenarios AS JPS ON JPS.StoreId = SS.StoreId AND JPS.ScenarioId = SS.ScenarioId
    GROUP BY SS.StoreId,SS.ScenarioId

求和

在我的测试中,这种方式至少比递归 CTE 快 10 倍,尤其是当商店和请求的商品数量增加时。它还得到 100% 正确的结果。因为当商店和请求的项目数量增加时,递归 CTE 尝试了数百万个不需要的 JOIN。

假设您想要比较所有可能的商品排列,直到购物篮中的商品总数超过您的购物篮总数,如下所示的内容可以满足您的要求。

DECLARE @N INT = 1;

DECLARE @myTable TABLE (storeID INT DEFAULT(1), packageID INT IDENTITY(1, 1), item INT, cost INT);
INSERT @myTable (item, cost) VALUES (1, 50), (2, 60), (3, 80), (4, 100), (5, 169), (5, 165), (4, 101), (2, 61);

WITH CTE1 AS (
    SELECT item, cost
    FROM (
        SELECT item, cost, ROW_NUMBER() OVER (PARTITION BY item ORDER BY cost) RN
        FROM @myTable) T
    WHERE RN = 1)
, CTE2 AS (
    SELECT CAST('items'+CAST(C1.item AS VARCHAR(10)) AS VARCHAR(4000)) items, C1.cost totalCost, C1.item totalItems
    FROM CTE1 C1
    UNION ALL
    SELECT CAST(C2.items + ' + items' + CAST(C1.item AS VARCHAR(10)) AS VARCHAR(4000)), C1.cost + C2.totalCost, C1.item + C2.totalItems
    FROM CTE2 C2
    CROSS JOIN CTE1 C1
    WHERE C2.totalItems < @N)
SELECT TOP 1 *
FROM CTE2
WHERE totalItems >= @N
ORDER BY totalCost, totalItems DESC;

编辑处理@Matt 提到的问题。

如果需要组合,则需要递归 CTE。防止无限递归是一个挑战。这是一种方法:

with cte as (
      select cast(packageid as nvarchar(4000)) as packs, item, cost
      from t
      union all
      select concat(cte.packs, ',', t.packageid), cte.item + t.item, cte.cost + t.cost
      from cte join
           t
           on cte.item + t.item < 10  -- some "reasonable" stop condition
     )
select top 1 cte.*
from cte
where cte.item >= 5
order by cost desc;

我不是 100% 确定 SQL 服务器会接受加入条件,但这应该可行。

IF OBJECT_ID('tempdb..#TestResults') IS NOT NULL
    BEGIN
        DROP TABLE #TestResults
    END

DECLARE @MinItemCount INT = 5

;WITH cteMaxCostToConsider AS (
    SELECT
       StoreId
       ,CASE
          WHEN (SUM(ItemCount) >= @MinItemCount) AND
          SUM(ItemPrice) < MIN(((@MinItemCount / ItemCount) + IIF((@MinItemCount % ItemCount) > 0, 1,0)) * ItemPrice) THEN SUM(ItemPrice)
          ELSE MIN(((@MinItemCount / ItemCount) + IIF((@MinItemCount % ItemCount) > 0, 1,0)) * ItemPrice)
       END AS MaxCostToConsider
    FROM
       storePackages
    GROUP BY
       StoreId
)

, cteRecursive AS (
    SELECT
      StoreId
      ,'<PackageId>' + CAST(PackageId AS VARCHAR(MAX)) + '</PackageId>' AS PackageIds
      ,ItemCount AS CombinedItemCount
      ,CAST(ItemPrice AS decimal(18,8)) AS CombinedCost
    FROM
       storePackages

    UNION ALL

    SELECT
      r.StoreId
      ,r.PackageIds + '<PackageId>' + CAST(t.PackageId AS VARCHAR(MAX)) + '</PackageId>'
      ,r.CombinedItemCount + t.ItemCount
      ,CAST(r.CombinedCost + t.ItemPrice AS decimal(18,8))
    FROM
       cteRecursive r
       INNER JOIN storePackages t
       ON r.StoreId = t.StoreId
       INNER JOIN cteMaxCostToConsider m
       ON r.StoreId = m.StoreId
      AND r.CombinedCost + t.ItemPrice <= m.MaxCostToConsider
)

, cteCombinedCostRowNum AS (
    SELECT
       StoreId
       ,CAST(PackageIds AS XML) AS PackageIds
       ,CombinedCost
       ,CombinedItemCount
       ,DENSE_RANK() OVER (PARTITION BY StoreId ORDER BY CombinedCost) AS CombinedCostRowNum
       ,ROW_NUMBER() OVER (PARTITION BY StoreId ORDER BY CombinedCost) AS PseudoCartId
    FROM
       cteRecursive
    WHERE
       CombinedItemCount >= @MinItemCount
)

SELECT DISTINCT
    c.StoreId
    ,x.PackageIds
    ,c.CombinedItemCount
    ,c.CombinedCost
INTO #TestResults
FROM
    cteCombinedCostRowNum  c
    CROSS APPLY (
       SELECT( STUFF ( (
          SELECT ',' + PackageId
          FROM
             (SELECT T.N.value('.','VARCHAR(100)') as PackageId FROM c.PackageIds.nodes('PackageId') as T(N)) p
          ORDER BY
             PackageId
          FOR XML PATH(''), TYPE ).value('.','NVARCHAR(MAX)'), 1, 1, '')
    ) as PackageIds
    ) x
WHERE
    CombinedCostRowNum = 1


SELECT *
FROM
    #TestResults

大约需要 1000-2000 毫秒,具体取决于测试数据中必须考虑的组合(例如,有时您的脚本会生成或多或少的数据)。

这个答案无疑看起来比 Gordon 或 ZLK 复杂一点,但它处理了 Ties、重复值、1 个符合标准的包裹和其他一些东西。然而,主要区别实际上是在最后一个查询中,我采用在递归查询期间构建的 XML 将其拆分,然后按顺序重新组合,以便您可以使用 DISTINCT 并获得唯一的配对,例如package 2 + package 3 = 140 & package 3 + package 2 = 140 将是所有查询中的前 2 个结果,因此使用 XML 拆分然后重新组合允许它成为一行。但是假设您还有另一行,例如 (1,5,2,60),它有 2 个项目并且成本为 60,此查询也将 return 该组合。

您可以在答案之间挑选,并使用他们的方法获得组合,使用我的方法获得最终结果等....但要解释我的查询过程。

cteMaxCostToConsider - 这只是一种获取包含递归查询的成本的方法,以便必须考虑更少的记录。它的作用是确定所有包裹一起的成本,或者如果您购买所有相同包裹以满足最小数量的成本。

cteRecursive - 这类似于 ZLK 的答案,也有点像 Gordon 的答案,但它所做的是退出并继续添加项目和项目组合,直到达到 MaxCostToConsider。如果我限制查看商品数量,它可能会错过 7 件商品比 5 件商品便宜的情况,因此通过约束确定的组合成本,它限制了递归并表现得更好。

cteCombinedCostRowNum - 这只是找到最低的组合成本和至少最小的项目数。

最后的查询有点棘手,但交叉应用将递归 cte 中的 XML 字符串构建拆分为不同的行,重新排序这些行,然后再次连接它们,以便反向组合,例如Package 2 & Package 3 reverse Package 3 & Package 2变成同一条记录然后调用distinct.

这比 SELECT top N 更灵活。要查看差异,请将以下测试用例一次添加到您的测试数据 1: (StoreId, PackageId, Item, Cost)

  • (1,5,2,60)
  • (1,6,1,1),(1,7,1,1)
  • (1,8,50,1)

已编辑。 以上将为您提供具有最低综合成本的商店的每种组合。您注意到的错误是由于 cteMaxCostToConsider。我使用的是 SUM(ItemPrice),但有时与之相关的 SUM(ItemCount) 中没有足够的项目,因此无法将其考虑用于 MaxCostToConsider。我修改了案例陈述以更正该问题。

我还进行了修改以使用您提供的数据示例。请注意,您应该将其中的 PackageId 更改为 IDENTITY 列,因为我使用您使用的方法在商店中获取了重复的 PackageIds。

这是你的脚本的修改版本,看看我在说什么:

IF OBJECT_ID('storePackages') IS NOT NULL
    BEGIN
        DROP TABLE storePackages
    END

CREATE TABLE storePackages(
    StoreId int not null,
    PackageId int not null IDENTITY(1,1),
    ItemType int not null, -- there are tree item type 0 is normal item, 1 is item has discount 2 is free item
    ItemCount int not null,
    ItemPrice decimal(18,8) not null,
    MaxItemQouta int not null, -- in generaly a package can have between 1 and 6 qouata but in rare can up to 20-25
    MaxFullQouta int not null -- sometimes a package can have additional free or discount item qouta. MaxFullQouta will always greater then MaxItemQouta
)

declare @totalStores int
set @totalStores = (SELECT TOP 1 n = number FROM master..[spt_values] WHERE number BETWEEN 200 AND 400 ORDER BY NEWID())

declare @storeId int;
declare @packageId  int;
declare @maxPackageForStore int;
declare @itemMinPrice decimal(18,8);
set @storeId = 1;
set @packageId = 1

while(@storeId <= @totalStores)
    BEGIN
        set @maxPackageForStore = (SELECT TOP 1 n = number FROM master..[spt_values] WHERE number BETWEEN 2 AND 6 ORDER BY NEWID())
        set @itemMinPrice = (SELECT TOP 1 n = number FROM master..[spt_values] WHERE number BETWEEN 40 AND 100 ORDER BY NEWID())
            BEGIN
                INSERT INTO storePackages (StoreId, ItemType, ItemCount, ItemPrice, MaxFullQouta, MaxItemQouta)
                SELECT DISTINCT 
                    StoreId = @storeId
                --,PackageId = CAST(@packageId + number AS int) 
                ,ItemType = 0
                ,ItemCount = number
                ,ItemPrice = @itemMinPrice + (10 * (SELECT TOP 1 n = number FROM master..[spt_values] WHERE number BETWEEN pkgNo.number AND pkgNo.number + 2  ORDER BY NEWID()))
                ,MaxItemQouta = @maxPackageForStore
                ,MaxFullQouta =  @maxPackageForStore + (CASE WHEN number > 1 AND number < 4 THEN 1 ELSE 0 END)
                FROM master..[spt_values] pkgNo
                WHERE number BETWEEN 1 AND @maxPackageForStore
                UNION ALL
                SELECT DISTINCT 
                    StoreId = @storeId
                --,PackageId = CAST(@packageId + number AS int) 
                ,ItemType = 1
                ,ItemCount = 1
                ,ItemPrice = (@itemMinPrice / 2) + (10 * (SELECT TOP 1 n = number FROM master..[spt_values] WHERE number BETWEEN pkgNo.number AND pkgNo.number + 2  ORDER BY NEWID()))
                ,MaxItemQouta = @maxPackageForStore
                ,MaxFullQouta =  @maxPackageForStore  + (SELECT TOP 1 n = number FROM master..[spt_values] WHERE number BETWEEN 0 AND 2 ORDER BY NEWID())
                FROM master..[spt_values] pkgNo
                WHERE number BETWEEN 2 AND (CASE WHEN @maxPackageForStore > 4 THEN 4 ELSE @maxPackageForStore END)


            --set @packageId = @packageId + @maxPackageForStore;
            END
    set @storeId =@storeId + 1;
    END

SELECT * FROM storePackages
--drop table #storePackages

没有 PackageIds 简单的 StoreId 和最低的组合成本 - ~200-300MS 取决于数据 接下来,如果您不关心那里有什么包,并且您只希望每个商店有 1 行,您可以执行以下操作:

IF OBJECT_ID('tempdb..#TestResults') IS NOT NULL
    BEGIN
        DROP TABLE #TestResults
    END

DECLARE @MinItemCount INT = 5

;WITH cteMaxCostToConsider AS (
    SELECT
       StoreId
       ,CASE
          WHEN (SUM(ItemCount) >= @MinItemCount) AND
          SUM(ItemPrice) < MIN(((@MinItemCount / ItemCount) + IIF((@MinItemCount % ItemCount) > 0, 1,0)) * ItemPrice) THEN SUM(ItemPrice)
          ELSE MIN(((@MinItemCount / ItemCount) + IIF((@MinItemCount % ItemCount) > 0, 1,0)) * ItemPrice)
       END AS MaxCostToConsider
    FROM
       storePackages
    GROUP BY
       StoreId
)

, cteRecursive AS (
    SELECT
      StoreId
      ,ItemCount AS CombinedItemCount
      ,CAST(ItemPrice AS decimal(18,8)) AS CombinedCost
    FROM
       storePackages

    UNION ALL

    SELECT
      r.StoreId
      ,r.CombinedItemCount + t.ItemCount
      ,CAST(r.CombinedCost + t.ItemPrice AS decimal(18,8))
    FROM
       cteRecursive r
       INNER JOIN storePackages t
       ON r.StoreId = t.StoreId
       INNER JOIN cteMaxCostToConsider m
       ON r.StoreId = m.StoreId
      AND r.CombinedCost + t.ItemPrice <= m.MaxCostToConsider
)

SELECT
    StoreId
    ,MIN(CombinedCost) as CombinedCost
    INTO #TestResults
FROM
    cteRecursive
WHERE
    CombinedItemCount >= @MinItemCount
GROUP BY
    StoreId

SELECT *
FROM
    #TestResults

WITH PackageIds 每个 StoreId 只有 1 条记录 - 根据测试 data/combinations 考虑~600-1300MS 变化很大 或者,如果您仍然想要包裹 ID,但您不关心您选择的组合,并且您只想要 1 条记录,那么您可以这样做:

IF OBJECT_ID('tempdb..#TestResults') IS NOT NULL
    BEGIN
        DROP TABLE #TestResults
    END

DECLARE @MinItemCount INT = 5

;WITH cteMaxCostToConsider AS (
    SELECT
       StoreId
       ,CASE
          WHEN (SUM(ItemCount) >= @MinItemCount) AND
          SUM(ItemPrice) < MIN(((@MinItemCount / ItemCount) + IIF((@MinItemCount % ItemCount) > 0, 1,0)) * ItemPrice) THEN SUM(ItemPrice)
          ELSE MIN(((@MinItemCount / ItemCount) + IIF((@MinItemCount % ItemCount) > 0, 1,0)) * ItemPrice)
       END AS MaxCostToConsider
    FROM
       storePackages
    GROUP BY
       StoreId
)

, cteRecursive AS (
    SELECT
      StoreId
      ,CAST(PackageId AS VARCHAR(MAX)) AS PackageIds
      ,ItemCount AS CombinedItemCount
      ,CAST(ItemPrice AS decimal(18,8)) AS CombinedCost
    FROM
       storePackages

    UNION ALL

    SELECT
      r.StoreId
      ,r.PackageIds + ',' + CAST(t.PackageId AS VARCHAR(MAX))
      ,r.CombinedItemCount + t.ItemCount
      ,CAST(r.CombinedCost + t.ItemPrice AS decimal(18,8))
    FROM
       cteRecursive r
       INNER JOIN storePackages t
       ON r.StoreId = t.StoreId
       INNER JOIN cteMaxCostToConsider m
       ON r.StoreId = m.StoreId
      AND r.CombinedCost + t.ItemPrice <= m.MaxCostToConsider
)

, cteCombinedCostRowNum AS (
    SELECT
       StoreId
       ,PackageIds
       ,CombinedCost
       ,CombinedItemCount
       ,ROW_NUMBER() OVER (PARTITION BY StoreId ORDER BY CombinedCost) AS RowNumber
    FROM
       cteRecursive
    WHERE
       CombinedItemCount >= @MinItemCount
)

SELECT DISTINCT
    c.StoreId
    ,c.PackageIds
    ,c.CombinedItemCount
    ,c.CombinedCost
INTO #TestResults
FROM
    cteCombinedCostRowNum  c
WHERE
    RowNumber = 1


SELECT *
FROM
    #TestResults

请注意,所有基准测试都是在一台使用了 4 年的英特尔 i7-3520M CPU 2.9 GHz、8 GB RAM 和三星 500 GB EVO SSD 笔记本电脑上完成的。因此,如果您 运行 在资源适当的服务器上执行此操作,我希望速度会呈指数级增长。毫无疑问,在 storePackages 上添加索引也会加快答案。

首先我们应该找到所有组合,然后select寻找价值最小的组合

DECLARE @Table as TABLE (StoreId INT, PackageId INT, Item INT, Cost INT)
INSERT INTO @Table VALUES (1,1,1,50),(1,2,2,60),(1,3,3,80),(1,4,4,100)

DECLARE @MinItemCount INT = 5;

WITH cteCombinationTable AS (
    SELECT  cast(PackageId as NVARCHAR(4000)) as Package, Item, Cost
    FROM @Table
    UNION ALL
    SELECT CONCAT(o.Package,',',c.PackageId), c.Item + o.Item, c.Cost + o.Cost FROM @Table as c join cteCombinationTable as o on CONCAT(o.Package,',',c.PackageId) <> Package
    where o.Item < @MinItemCount
)

select top 1 * 
from cteCombinationTable
where item >= @MinItemCount
order by cast(cost as decimal)/@MinItemCount

我的解决方案

首先,我感谢所有试图帮助我的人。然而,所有建议的解决方案都基于 CTE。正如我之前所说,当考虑成百上千的商店时,递归 CTE 会导致性能问题。一次也请求多个包。这意味着,一个请求可以包含多个篮子。一件是 5 件,一件是 3 件,一件是 7 件...

最后一个解决方案

首先,我根据项目大小在 table 中生成所有可能的场景...通过这种方式,我可以选择消除不需要的场景。

CREATE TABLE ItemScenarios(
   Item int,
   ScenarioId int,
   CalculatedItem int  --this will be joined with Store Item
)

然后我生成了从 2 项到 25 项的所有可能场景并插入到 ItemScenarios table。场景可以通过使用 WHILE 或递归 CTE 生成一次。这种方式的好处是场景只生成一次。

结果如下。

Item          |   ScenarioId       |     CalculatedItem
--------------------------------------------------------
2                   1                     2
2                   2                     3
2                   3                     1
2                   3                     1
3                   4                     5
3                   5                     4
3                   6                     3
3                   7                     2
3                   7                     2
3                   8                     2
3                   8                     1
3                   9                     1
3                   9                     1
3                   9                     1
....
.....
......
25                  993                   10

通过这种方式,我可以限制场景大小、最大不同商店、最大不同包裹等。

此外,我还可以排除一些在数学上不可能的场景,这些场景比其他场景便宜。例如对于 4 项请求,某些场景

场景一:2+2

场景二:2+1+1

场景三:1+1+1+1

在这些场景中;情景 2 不可能是最便宜的篮子。因为,

如果场景2 < 场景3 --> 场景1会更低场景 2.因为降低成本的东西是2件商品的价格,而**场景1*有双2件商品

另外如果场景2 < 场景1 --> 场景3会更低然后 情景 2

现在,如果我删除像场景 2 这样的场景,我将获得一些性能优势。

现在我可以在商店中选择最便宜的商品价格

DECLARE @requestedItems int;
SET @requestedItems = 5;

CREATE TABLE #JoinedPackageItemWithScenarios(
   StoreId int not null,
   PackageId int not null,
   ItemCount int not null,
   ItemPrice decimal(18,8) 
   ScenarioId int not null,
)
INSERT INTO #JoinedPackageItemWithScenarios
 SELECT
    SPM.StoreId  
   ,SPM.PackageId 
   ,SPM.ItemCount 
   ,SPM.ItemPrice 
   ,SPM.ScenarioId
   FROM (
      SELECT 
            SP.StoreId  
           ,SP.PackageId 
           ,SP.ItemCount 
           ,SP.ItemPrice 
           ,SC.ScenarioId
           ,RowNumber = ROW_NUMBER() OVER (PARTITION BY SP.StoreId,SC.ScenarioId,SP.ItemCount ORDER BY SP.ItemPrice) 
      FROM ItemScenarios SC
      LEFT JOIN StorePackages AS SP ON SP.ItemCount = SC.CalculatedItem
      WHERE SC.Item = @requestedItems
 ) SPM
 WHERE SPM.RowNumber = 1

-- NOW I HAVE CHEAPEST PRICE FOR EACH ITEM, I CAN CREATE BASKET

 CREATE TABLE #selectedScenarios(
   StoreId int not null,
   ScenarioId int not null,
   TotalItem int not null,
   TotalCost decimal(18,8) 
)
 INSERT INTO #selectedScenarios
 SELECT 
      StoreId
     ,ScenarioId
     ,TotalItem 
     ,TotalCost 
  FROM (
     SELECT 
           StoreId
          ,ScenarioId
          --,PackageIds = dbo.GROUP_CONCAT(CAST(PackageId AS nvarchar(20))) -- CONCATENING PackageId decreasing performance here. We can joing seleceted scenarios with #JoinedPackageItemWithScenarios after selection complated.
          ,TotalItem = SUM(ItemCount)
          ,TotalCost = SUM(ItemPrice)
          ,RowNumber = ROW_NUMBER() OVER (PARTITION BY StoreId ORDER BY SUM(ItemPrice))
        FROM #JoinedPackageItemWithScenarios JPS
        GROUP BY StoreId,ScenarioId
        HAVING(SUM(ItemCount) >= @requestedItems)
     ) SLECTED
     WHERE RowNumber = 1

  -- NOW WE CAN POPULATE PackageIds if needed

  SELECT 
      SS.StoreId
     ,SS.ScenarioId
     ,TotalItem = MAX(SS.TotalItem)
     ,TotalCost = MAX(SS.TotalCost)
     ,PackageIds = dbo.GROUP_CONCAT(CAST(JPS.PackageId AS nvarchar(20)))
    FROM #selectedScenarios SS
    JOIN #JoinedPackageItemWithScenarios AS JPS ON JPS.StoreId = SS.StoreId AND JPS.ScenarioId = SS.ScenarioId
    GROUP BY SS.StoreId,SS.ScenarioId

求和

在我的测试中,这种方式至少比递归 CTE 快 10 倍,尤其是当商店和请求的商品数量增加时。它还得到 100% 正确的结果。因为当商店和请求的项目数量增加时,递归 CTE 尝试了数百万个不需要的 JOIN。