SQL 服务器 - 按固定值对可能性组合进行分组
SQL Server - Grouping Combination of possibilities by fixed value
我必须创建包含固定物品的最便宜的购物篮。
例如,篮子里有 (5) 件商品
1 和 4 = (1 * 50) + (1 * 100) = 150
2 和 3 = (1 * 60) + (1 * 80) = 140 -- 这是我的家伙
2 和 2 和 1 = (1 * 60) + (1 * 60) + (1 * 50) = 170
3 和 3 = (1 * 80) + (1 * 80) = 160 **** 这 6 项但总项可以超过最小项。重要的是总成本...
....
这也适用于篮子中可能包含的任意数量的物品。还有很多商店,每个商店都有不同的包装,可能包括几件商品。
SQL如何处理这个问题?
更新
这是示例数据生成代码。递归 CTE 解决方案更昂贵。我每次应该在 600-700 家商店的 500-600 毫秒内完成这项工作。这是一个包搜索引擎。使用“#temp”tables 或“UNUION”手动创建场景比递归 CTE 便宜 15-20 倍。
连接 Item
或 PackageId
也非常昂贵。在选择最便宜的包裹并加入源后,我可以找到所需的包裹 ID 或物品 table。
我期待一个神奇的解决方案,它可以超快并获得正确的选项。
每个商店只需要最便宜的篮子。手动场景创建速度非常快,但有时无法正确选择最便宜的篮子。
CREATE TABLE #storePackages(
StoreId int not null,
PackageId int not null,
ItemType int not null, -- there are tree item type 0 is normal item, 1 is item has discount 2 is free item
ItemCount int not null,
ItemPrice decimal(18,8) not null,
MaxItemQouta int not null, -- in generaly a package can have between 1 and 6 qouata but in rare can up to 20-25
MaxFullQouta int not null -- sometimes a package can have additional free or discount item qouta. MaxFullQouta will always greater then MaxItemQouta
)
declare @totalStores int
set @totalStores = (SELECT TOP 1 n = number FROM master..[spt_values] WHERE number BETWEEN 200 AND 400 ORDER BY NEWID())
declare @storeId int;
declare @packageId int;
declare @maxPackageForStore int;
declare @itemMinPrice decimal(18,8);
set @storeId = 1;
set @packageId = 1
while(@storeId <= @totalStores)
BEGIN
set @maxPackageForStore = (SELECT TOP 1 n = number FROM master..[spt_values] WHERE number BETWEEN 2 AND 6 ORDER BY NEWID())
set @itemMinPrice = (SELECT TOP 1 n = number FROM master..[spt_values] WHERE number BETWEEN 40 AND 100 ORDER BY NEWID())
BEGIN
INSERT INTO #storePackages
SELECT DISTINCT
StoreId = @storeId
,PackageId = CAST(@packageId + number AS int)
,ItemType = 0
,ItemCount = number
,ItemPrice = @itemMinPrice + (10 * (SELECT TOP 1 n = number FROM master..[spt_values] WHERE number BETWEEN pkgNo.number AND pkgNo.number + 2 ORDER BY NEWID()))
,MaxItemQouta = @maxPackageForStore
,MaxFullQouta = @maxPackageForStore + (CASE WHEN number > 1 AND number < 4 THEN 1 ELSE 0 END)
FROM master..[spt_values] pkgNo
WHERE number BETWEEN 1 AND @maxPackageForStore
UNION ALL
SELECT DISTINCT
StoreId = @storeId
,PackageId = CAST(@packageId + number AS int)
,ItemType = 1
,ItemCount = 1
,ItemPrice = (@itemMinPrice / 2) + (10 * (SELECT TOP 1 n = number FROM master..[spt_values] WHERE number BETWEEN pkgNo.number AND pkgNo.number + 2 ORDER BY NEWID()))
,MaxItemQouta = @maxPackageForStore
,MaxFullQouta = @maxPackageForStore + (SELECT TOP 1 n = number FROM master..[spt_values] WHERE number BETWEEN 0 AND 2 ORDER BY NEWID())
FROM master..[spt_values] pkgNo
WHERE number BETWEEN 2 AND (CASE WHEN @maxPackageForStore > 4 THEN 4 ELSE @maxPackageForStore END)
set @packageId = @packageId + @maxPackageForStore;
END
set @storeId =@storeId + 1;
END
SELECT * FROM #storePackages
drop table #storePackages
我的解决方案
首先,我感谢所有试图帮助我的人。然而,所有建议的解决方案都基于 CTE。正如我之前所说,当考虑成百上千的商店时,递归 CTE 会导致性能问题。一次也请求多个包。这意味着,我要求可以包含多个篮子。一件是 5 件,一件是 3 件,一件是 7 件...
最后一个解决方案
首先,我根据项目大小在 table 中生成所有可能的场景...通过这种方式,我可以选择消除不需要的场景。
CREATE TABLE ItemScenarios(
Item int,
ScenarioId int,
CalculatedItem int --this will be joined with Store Item
)
然后我生成了从 2 项到 25 项的所有可能场景并插入到 ItemScenarios
table。场景可以通过使用 WHILE 或递归 CTE 生成一次。这种方式的好处是场景只生成一次。
结果如下。
Item | ScenarioId | CalculatedItem
--------------------------------------------------------
2 1 2
2 2 3
2 3 1
2 3 1
3 4 5
3 5 4
3 6 3
3 7 2
3 7 2
3 8 2
3 8 1
3 9 1
3 9 1
3 9 1
....
.....
......
25 993 10
通过这种方式,我可以限制场景大小、最大不同商店、最大不同包裹等。
此外,我还可以排除一些在数学上不可能的场景,这些场景比其他场景便宜。例如对于 4 项请求,某些场景
场景一:2+2
场景二:2+1+1
场景三:1+1+1+1
在这些场景中;情景 2 不可能是最便宜的篮子。因为,
如果场景2 < 场景3 --> 场景1会更低场景 2.因为降低成本的东西是2件商品的价格,而**场景1*有双2件商品
另外如果场景2 < 场景1 --> 场景3会更低然后 情景 2
现在,如果我删除像场景 2 这样的场景,我将获得一些性能优势。
现在我可以在商店中选择最便宜的商品价格
DECLARE @requestedItems int;
SET @requestedItems = 5;
CREATE TABLE #JoinedPackageItemWithScenarios(
StoreId int not null,
PackageId int not null,
ItemCount int not null,
ItemPrice decimal(18,8)
ScenarioId int not null,
)
INSERT INTO #JoinedPackageItemWithScenarios
SELECT
SPM.StoreId
,SPM.PackageId
,SPM.ItemCount
,SPM.ItemPrice
,SPM.ScenarioId
FROM (
SELECT
SP.StoreId
,SP.PackageId
,SP.ItemCount
,SP.ItemPrice
,SC.ScenarioId
,RowNumber = ROW_NUMBER() OVER (PARTITION BY SP.StoreId,SC.ScenarioId,SP.ItemCount ORDER BY SP.ItemPrice)
FROM ItemScenarios SC
LEFT JOIN StorePackages AS SP ON SP.ItemCount = SC.CalculatedItem
WHERE SC.Item = @requestedItems
) SPM
WHERE SPM.RowNumber = 1
-- NOW I HAVE CHEAPEST PRICE FOR EACH ITEM, I CAN CREATE BASKET
CREATE TABLE #selectedScenarios(
StoreId int not null,
ScenarioId int not null,
TotalItem int not null,
TotalCost decimal(18,8)
)
INSERT INTO #selectedScenarios
SELECT
StoreId
,ScenarioId
,TotalItem
,TotalCost
FROM (
SELECT
StoreId
,ScenarioId
--,PackageIds = dbo.GROUP_CONCAT(CAST(PackageId AS nvarchar(20))) -- CONCATENING PackageId decreasing performance here. We can joing seleceted scenarios with #JoinedPackageItemWithScenarios after selection complated.
,TotalItem = SUM(ItemCount)
,TotalCost = SUM(ItemPrice)
,RowNumber = ROW_NUMBER() OVER (PARTITION BY StoreId ORDER BY SUM(ItemPrice))
FROM #JoinedPackageItemWithScenarios JPS
GROUP BY StoreId,ScenarioId
HAVING(SUM(ItemCount) >= @requestedItems)
) SLECTED
WHERE RowNumber = 1
-- NOW WE CAN POPULATE PackageIds if needed
SELECT
SS.StoreId
,SS.ScenarioId
,TotalItem = MAX(SS.TotalItem)
,TotalCost = MAX(SS.TotalCost)
,PackageIds = dbo.GROUP_CONCAT(CAST(JPS.PackageId AS nvarchar(20)))
FROM #selectedScenarios SS
JOIN #JoinedPackageItemWithScenarios AS JPS ON JPS.StoreId = SS.StoreId AND JPS.ScenarioId = SS.ScenarioId
GROUP BY SS.StoreId,SS.ScenarioId
求和
在我的测试中,这种方式至少比递归 CTE 快 10 倍,尤其是当商店和请求的商品数量增加时。它还得到 100% 正确的结果。因为当商店和请求的项目数量增加时,递归 CTE 尝试了数百万个不需要的 JOIN。
假设您想要比较所有可能的商品排列,直到购物篮中的商品总数超过您的购物篮总数,如下所示的内容可以满足您的要求。
DECLARE @N INT = 1;
DECLARE @myTable TABLE (storeID INT DEFAULT(1), packageID INT IDENTITY(1, 1), item INT, cost INT);
INSERT @myTable (item, cost) VALUES (1, 50), (2, 60), (3, 80), (4, 100), (5, 169), (5, 165), (4, 101), (2, 61);
WITH CTE1 AS (
SELECT item, cost
FROM (
SELECT item, cost, ROW_NUMBER() OVER (PARTITION BY item ORDER BY cost) RN
FROM @myTable) T
WHERE RN = 1)
, CTE2 AS (
SELECT CAST('items'+CAST(C1.item AS VARCHAR(10)) AS VARCHAR(4000)) items, C1.cost totalCost, C1.item totalItems
FROM CTE1 C1
UNION ALL
SELECT CAST(C2.items + ' + items' + CAST(C1.item AS VARCHAR(10)) AS VARCHAR(4000)), C1.cost + C2.totalCost, C1.item + C2.totalItems
FROM CTE2 C2
CROSS JOIN CTE1 C1
WHERE C2.totalItems < @N)
SELECT TOP 1 *
FROM CTE2
WHERE totalItems >= @N
ORDER BY totalCost, totalItems DESC;
编辑处理@Matt 提到的问题。
如果需要组合,则需要递归 CTE。防止无限递归是一个挑战。这是一种方法:
with cte as (
select cast(packageid as nvarchar(4000)) as packs, item, cost
from t
union all
select concat(cte.packs, ',', t.packageid), cte.item + t.item, cte.cost + t.cost
from cte join
t
on cte.item + t.item < 10 -- some "reasonable" stop condition
)
select top 1 cte.*
from cte
where cte.item >= 5
order by cost desc;
我不是 100% 确定 SQL 服务器会接受加入条件,但这应该可行。
IF OBJECT_ID('tempdb..#TestResults') IS NOT NULL
BEGIN
DROP TABLE #TestResults
END
DECLARE @MinItemCount INT = 5
;WITH cteMaxCostToConsider AS (
SELECT
StoreId
,CASE
WHEN (SUM(ItemCount) >= @MinItemCount) AND
SUM(ItemPrice) < MIN(((@MinItemCount / ItemCount) + IIF((@MinItemCount % ItemCount) > 0, 1,0)) * ItemPrice) THEN SUM(ItemPrice)
ELSE MIN(((@MinItemCount / ItemCount) + IIF((@MinItemCount % ItemCount) > 0, 1,0)) * ItemPrice)
END AS MaxCostToConsider
FROM
storePackages
GROUP BY
StoreId
)
, cteRecursive AS (
SELECT
StoreId
,'<PackageId>' + CAST(PackageId AS VARCHAR(MAX)) + '</PackageId>' AS PackageIds
,ItemCount AS CombinedItemCount
,CAST(ItemPrice AS decimal(18,8)) AS CombinedCost
FROM
storePackages
UNION ALL
SELECT
r.StoreId
,r.PackageIds + '<PackageId>' + CAST(t.PackageId AS VARCHAR(MAX)) + '</PackageId>'
,r.CombinedItemCount + t.ItemCount
,CAST(r.CombinedCost + t.ItemPrice AS decimal(18,8))
FROM
cteRecursive r
INNER JOIN storePackages t
ON r.StoreId = t.StoreId
INNER JOIN cteMaxCostToConsider m
ON r.StoreId = m.StoreId
AND r.CombinedCost + t.ItemPrice <= m.MaxCostToConsider
)
, cteCombinedCostRowNum AS (
SELECT
StoreId
,CAST(PackageIds AS XML) AS PackageIds
,CombinedCost
,CombinedItemCount
,DENSE_RANK() OVER (PARTITION BY StoreId ORDER BY CombinedCost) AS CombinedCostRowNum
,ROW_NUMBER() OVER (PARTITION BY StoreId ORDER BY CombinedCost) AS PseudoCartId
FROM
cteRecursive
WHERE
CombinedItemCount >= @MinItemCount
)
SELECT DISTINCT
c.StoreId
,x.PackageIds
,c.CombinedItemCount
,c.CombinedCost
INTO #TestResults
FROM
cteCombinedCostRowNum c
CROSS APPLY (
SELECT( STUFF ( (
SELECT ',' + PackageId
FROM
(SELECT T.N.value('.','VARCHAR(100)') as PackageId FROM c.PackageIds.nodes('PackageId') as T(N)) p
ORDER BY
PackageId
FOR XML PATH(''), TYPE ).value('.','NVARCHAR(MAX)'), 1, 1, '')
) as PackageIds
) x
WHERE
CombinedCostRowNum = 1
SELECT *
FROM
#TestResults
大约需要 1000-2000 毫秒,具体取决于测试数据中必须考虑的组合(例如,有时您的脚本会生成或多或少的数据)。
这个答案无疑看起来比 Gordon 或 ZLK 复杂一点,但它处理了 Ties、重复值、1 个符合标准的包裹和其他一些东西。然而,主要区别实际上是在最后一个查询中,我采用在递归查询期间构建的 XML 将其拆分,然后按顺序重新组合,以便您可以使用 DISTINCT 并获得唯一的配对,例如package 2 + package 3 = 140 & package 3 + package 2 = 140 将是所有查询中的前 2 个结果,因此使用 XML 拆分然后重新组合允许它成为一行。但是假设您还有另一行,例如 (1,5,2,60),它有 2 个项目并且成本为 60,此查询也将 return 该组合。
您可以在答案之间挑选,并使用他们的方法获得组合,使用我的方法获得最终结果等....但要解释我的查询过程。
cteMaxCostToConsider
- 这只是一种获取包含递归查询的成本的方法,以便必须考虑更少的记录。它的作用是确定所有包裹一起的成本,或者如果您购买所有相同包裹以满足最小数量的成本。
cteRecursive
- 这类似于 ZLK 的答案,也有点像 Gordon 的答案,但它所做的是退出并继续添加项目和项目组合,直到达到 MaxCostToConsider。如果我限制查看商品数量,它可能会错过 7 件商品比 5 件商品便宜的情况,因此通过约束确定的组合成本,它限制了递归并表现得更好。
cteCombinedCostRowNum
- 这只是找到最低的组合成本和至少最小的项目数。
最后的查询有点棘手,但交叉应用将递归 cte 中的 XML 字符串构建拆分为不同的行,重新排序这些行,然后再次连接它们,以便反向组合,例如Package 2 & Package 3 reverse Package 3 & Package 2变成同一条记录然后调用distinct.
这比 SELECT top N 更灵活。要查看差异,请将以下测试用例一次添加到您的测试数据 1:
(StoreId, PackageId, Item, Cost)
- (1,5,2,60)
- (1,6,1,1),(1,7,1,1)
- (1,8,50,1)
已编辑。 以上将为您提供具有最低综合成本的商店的每种组合。您注意到的错误是由于 cteMaxCostToConsider
。我使用的是 SUM(ItemPrice)
,但有时与之相关的 SUM(ItemCount)
中没有足够的项目,因此无法将其考虑用于 MaxCostToConsider。我修改了案例陈述以更正该问题。
我还进行了修改以使用您提供的数据示例。请注意,您应该将其中的 PackageId 更改为 IDENTITY 列,因为我使用您使用的方法在商店中获取了重复的 PackageIds。
这是你的脚本的修改版本,看看我在说什么:
IF OBJECT_ID('storePackages') IS NOT NULL
BEGIN
DROP TABLE storePackages
END
CREATE TABLE storePackages(
StoreId int not null,
PackageId int not null IDENTITY(1,1),
ItemType int not null, -- there are tree item type 0 is normal item, 1 is item has discount 2 is free item
ItemCount int not null,
ItemPrice decimal(18,8) not null,
MaxItemQouta int not null, -- in generaly a package can have between 1 and 6 qouata but in rare can up to 20-25
MaxFullQouta int not null -- sometimes a package can have additional free or discount item qouta. MaxFullQouta will always greater then MaxItemQouta
)
declare @totalStores int
set @totalStores = (SELECT TOP 1 n = number FROM master..[spt_values] WHERE number BETWEEN 200 AND 400 ORDER BY NEWID())
declare @storeId int;
declare @packageId int;
declare @maxPackageForStore int;
declare @itemMinPrice decimal(18,8);
set @storeId = 1;
set @packageId = 1
while(@storeId <= @totalStores)
BEGIN
set @maxPackageForStore = (SELECT TOP 1 n = number FROM master..[spt_values] WHERE number BETWEEN 2 AND 6 ORDER BY NEWID())
set @itemMinPrice = (SELECT TOP 1 n = number FROM master..[spt_values] WHERE number BETWEEN 40 AND 100 ORDER BY NEWID())
BEGIN
INSERT INTO storePackages (StoreId, ItemType, ItemCount, ItemPrice, MaxFullQouta, MaxItemQouta)
SELECT DISTINCT
StoreId = @storeId
--,PackageId = CAST(@packageId + number AS int)
,ItemType = 0
,ItemCount = number
,ItemPrice = @itemMinPrice + (10 * (SELECT TOP 1 n = number FROM master..[spt_values] WHERE number BETWEEN pkgNo.number AND pkgNo.number + 2 ORDER BY NEWID()))
,MaxItemQouta = @maxPackageForStore
,MaxFullQouta = @maxPackageForStore + (CASE WHEN number > 1 AND number < 4 THEN 1 ELSE 0 END)
FROM master..[spt_values] pkgNo
WHERE number BETWEEN 1 AND @maxPackageForStore
UNION ALL
SELECT DISTINCT
StoreId = @storeId
--,PackageId = CAST(@packageId + number AS int)
,ItemType = 1
,ItemCount = 1
,ItemPrice = (@itemMinPrice / 2) + (10 * (SELECT TOP 1 n = number FROM master..[spt_values] WHERE number BETWEEN pkgNo.number AND pkgNo.number + 2 ORDER BY NEWID()))
,MaxItemQouta = @maxPackageForStore
,MaxFullQouta = @maxPackageForStore + (SELECT TOP 1 n = number FROM master..[spt_values] WHERE number BETWEEN 0 AND 2 ORDER BY NEWID())
FROM master..[spt_values] pkgNo
WHERE number BETWEEN 2 AND (CASE WHEN @maxPackageForStore > 4 THEN 4 ELSE @maxPackageForStore END)
--set @packageId = @packageId + @maxPackageForStore;
END
set @storeId =@storeId + 1;
END
SELECT * FROM storePackages
--drop table #storePackages
没有 PackageIds 简单的 StoreId 和最低的组合成本 - ~200-300MS 取决于数据
接下来,如果您不关心那里有什么包,并且您只希望每个商店有 1 行,您可以执行以下操作:
IF OBJECT_ID('tempdb..#TestResults') IS NOT NULL
BEGIN
DROP TABLE #TestResults
END
DECLARE @MinItemCount INT = 5
;WITH cteMaxCostToConsider AS (
SELECT
StoreId
,CASE
WHEN (SUM(ItemCount) >= @MinItemCount) AND
SUM(ItemPrice) < MIN(((@MinItemCount / ItemCount) + IIF((@MinItemCount % ItemCount) > 0, 1,0)) * ItemPrice) THEN SUM(ItemPrice)
ELSE MIN(((@MinItemCount / ItemCount) + IIF((@MinItemCount % ItemCount) > 0, 1,0)) * ItemPrice)
END AS MaxCostToConsider
FROM
storePackages
GROUP BY
StoreId
)
, cteRecursive AS (
SELECT
StoreId
,ItemCount AS CombinedItemCount
,CAST(ItemPrice AS decimal(18,8)) AS CombinedCost
FROM
storePackages
UNION ALL
SELECT
r.StoreId
,r.CombinedItemCount + t.ItemCount
,CAST(r.CombinedCost + t.ItemPrice AS decimal(18,8))
FROM
cteRecursive r
INNER JOIN storePackages t
ON r.StoreId = t.StoreId
INNER JOIN cteMaxCostToConsider m
ON r.StoreId = m.StoreId
AND r.CombinedCost + t.ItemPrice <= m.MaxCostToConsider
)
SELECT
StoreId
,MIN(CombinedCost) as CombinedCost
INTO #TestResults
FROM
cteRecursive
WHERE
CombinedItemCount >= @MinItemCount
GROUP BY
StoreId
SELECT *
FROM
#TestResults
WITH PackageIds 每个 StoreId 只有 1 条记录 - 根据测试 data/combinations 考虑~600-1300MS 变化很大
或者,如果您仍然想要包裹 ID,但您不关心您选择的组合,并且您只想要 1 条记录,那么您可以这样做:
IF OBJECT_ID('tempdb..#TestResults') IS NOT NULL
BEGIN
DROP TABLE #TestResults
END
DECLARE @MinItemCount INT = 5
;WITH cteMaxCostToConsider AS (
SELECT
StoreId
,CASE
WHEN (SUM(ItemCount) >= @MinItemCount) AND
SUM(ItemPrice) < MIN(((@MinItemCount / ItemCount) + IIF((@MinItemCount % ItemCount) > 0, 1,0)) * ItemPrice) THEN SUM(ItemPrice)
ELSE MIN(((@MinItemCount / ItemCount) + IIF((@MinItemCount % ItemCount) > 0, 1,0)) * ItemPrice)
END AS MaxCostToConsider
FROM
storePackages
GROUP BY
StoreId
)
, cteRecursive AS (
SELECT
StoreId
,CAST(PackageId AS VARCHAR(MAX)) AS PackageIds
,ItemCount AS CombinedItemCount
,CAST(ItemPrice AS decimal(18,8)) AS CombinedCost
FROM
storePackages
UNION ALL
SELECT
r.StoreId
,r.PackageIds + ',' + CAST(t.PackageId AS VARCHAR(MAX))
,r.CombinedItemCount + t.ItemCount
,CAST(r.CombinedCost + t.ItemPrice AS decimal(18,8))
FROM
cteRecursive r
INNER JOIN storePackages t
ON r.StoreId = t.StoreId
INNER JOIN cteMaxCostToConsider m
ON r.StoreId = m.StoreId
AND r.CombinedCost + t.ItemPrice <= m.MaxCostToConsider
)
, cteCombinedCostRowNum AS (
SELECT
StoreId
,PackageIds
,CombinedCost
,CombinedItemCount
,ROW_NUMBER() OVER (PARTITION BY StoreId ORDER BY CombinedCost) AS RowNumber
FROM
cteRecursive
WHERE
CombinedItemCount >= @MinItemCount
)
SELECT DISTINCT
c.StoreId
,c.PackageIds
,c.CombinedItemCount
,c.CombinedCost
INTO #TestResults
FROM
cteCombinedCostRowNum c
WHERE
RowNumber = 1
SELECT *
FROM
#TestResults
请注意,所有基准测试都是在一台使用了 4 年的英特尔 i7-3520M CPU 2.9 GHz、8 GB RAM 和三星 500 GB EVO SSD 笔记本电脑上完成的。因此,如果您 运行 在资源适当的服务器上执行此操作,我希望速度会呈指数级增长。毫无疑问,在 storePackages 上添加索引也会加快答案。
首先我们应该找到所有组合,然后select寻找价值最小的组合
DECLARE @Table as TABLE (StoreId INT, PackageId INT, Item INT, Cost INT)
INSERT INTO @Table VALUES (1,1,1,50),(1,2,2,60),(1,3,3,80),(1,4,4,100)
DECLARE @MinItemCount INT = 5;
WITH cteCombinationTable AS (
SELECT cast(PackageId as NVARCHAR(4000)) as Package, Item, Cost
FROM @Table
UNION ALL
SELECT CONCAT(o.Package,',',c.PackageId), c.Item + o.Item, c.Cost + o.Cost FROM @Table as c join cteCombinationTable as o on CONCAT(o.Package,',',c.PackageId) <> Package
where o.Item < @MinItemCount
)
select top 1 *
from cteCombinationTable
where item >= @MinItemCount
order by cast(cost as decimal)/@MinItemCount
我的解决方案
首先,我感谢所有试图帮助我的人。然而,所有建议的解决方案都基于 CTE。正如我之前所说,当考虑成百上千的商店时,递归 CTE 会导致性能问题。一次也请求多个包。这意味着,一个请求可以包含多个篮子。一件是 5 件,一件是 3 件,一件是 7 件...
最后一个解决方案
首先,我根据项目大小在 table 中生成所有可能的场景...通过这种方式,我可以选择消除不需要的场景。
CREATE TABLE ItemScenarios(
Item int,
ScenarioId int,
CalculatedItem int --this will be joined with Store Item
)
然后我生成了从 2 项到 25 项的所有可能场景并插入到 ItemScenarios
table。场景可以通过使用 WHILE 或递归 CTE 生成一次。这种方式的好处是场景只生成一次。
结果如下。
Item | ScenarioId | CalculatedItem
--------------------------------------------------------
2 1 2
2 2 3
2 3 1
2 3 1
3 4 5
3 5 4
3 6 3
3 7 2
3 7 2
3 8 2
3 8 1
3 9 1
3 9 1
3 9 1
....
.....
......
25 993 10
通过这种方式,我可以限制场景大小、最大不同商店、最大不同包裹等。
此外,我还可以排除一些在数学上不可能的场景,这些场景比其他场景便宜。例如对于 4 项请求,某些场景
场景一:2+2
场景二:2+1+1
场景三:1+1+1+1
在这些场景中;情景 2 不可能是最便宜的篮子。因为,
如果场景2 < 场景3 --> 场景1会更低场景 2.因为降低成本的东西是2件商品的价格,而**场景1*有双2件商品
另外如果场景2 < 场景1 --> 场景3会更低然后 情景 2
现在,如果我删除像场景 2 这样的场景,我将获得一些性能优势。
现在我可以在商店中选择最便宜的商品价格
DECLARE @requestedItems int;
SET @requestedItems = 5;
CREATE TABLE #JoinedPackageItemWithScenarios(
StoreId int not null,
PackageId int not null,
ItemCount int not null,
ItemPrice decimal(18,8)
ScenarioId int not null,
)
INSERT INTO #JoinedPackageItemWithScenarios
SELECT
SPM.StoreId
,SPM.PackageId
,SPM.ItemCount
,SPM.ItemPrice
,SPM.ScenarioId
FROM (
SELECT
SP.StoreId
,SP.PackageId
,SP.ItemCount
,SP.ItemPrice
,SC.ScenarioId
,RowNumber = ROW_NUMBER() OVER (PARTITION BY SP.StoreId,SC.ScenarioId,SP.ItemCount ORDER BY SP.ItemPrice)
FROM ItemScenarios SC
LEFT JOIN StorePackages AS SP ON SP.ItemCount = SC.CalculatedItem
WHERE SC.Item = @requestedItems
) SPM
WHERE SPM.RowNumber = 1
-- NOW I HAVE CHEAPEST PRICE FOR EACH ITEM, I CAN CREATE BASKET
CREATE TABLE #selectedScenarios(
StoreId int not null,
ScenarioId int not null,
TotalItem int not null,
TotalCost decimal(18,8)
)
INSERT INTO #selectedScenarios
SELECT
StoreId
,ScenarioId
,TotalItem
,TotalCost
FROM (
SELECT
StoreId
,ScenarioId
--,PackageIds = dbo.GROUP_CONCAT(CAST(PackageId AS nvarchar(20))) -- CONCATENING PackageId decreasing performance here. We can joing seleceted scenarios with #JoinedPackageItemWithScenarios after selection complated.
,TotalItem = SUM(ItemCount)
,TotalCost = SUM(ItemPrice)
,RowNumber = ROW_NUMBER() OVER (PARTITION BY StoreId ORDER BY SUM(ItemPrice))
FROM #JoinedPackageItemWithScenarios JPS
GROUP BY StoreId,ScenarioId
HAVING(SUM(ItemCount) >= @requestedItems)
) SLECTED
WHERE RowNumber = 1
-- NOW WE CAN POPULATE PackageIds if needed
SELECT
SS.StoreId
,SS.ScenarioId
,TotalItem = MAX(SS.TotalItem)
,TotalCost = MAX(SS.TotalCost)
,PackageIds = dbo.GROUP_CONCAT(CAST(JPS.PackageId AS nvarchar(20)))
FROM #selectedScenarios SS
JOIN #JoinedPackageItemWithScenarios AS JPS ON JPS.StoreId = SS.StoreId AND JPS.ScenarioId = SS.ScenarioId
GROUP BY SS.StoreId,SS.ScenarioId
求和
在我的测试中,这种方式至少比递归 CTE 快 10 倍,尤其是当商店和请求的商品数量增加时。它还得到 100% 正确的结果。因为当商店和请求的项目数量增加时,递归 CTE 尝试了数百万个不需要的 JOIN。
我必须创建包含固定物品的最便宜的购物篮。
例如,篮子里有 (5) 件商品
1 和 4 = (1 * 50) + (1 * 100) = 150
2 和 3 = (1 * 60) + (1 * 80) = 140 -- 这是我的家伙
2 和 2 和 1 = (1 * 60) + (1 * 60) + (1 * 50) = 170
3 和 3 = (1 * 80) + (1 * 80) = 160 **** 这 6 项但总项可以超过最小项。重要的是总成本... ....
这也适用于篮子中可能包含的任意数量的物品。还有很多商店,每个商店都有不同的包装,可能包括几件商品。
SQL如何处理这个问题?
更新
这是示例数据生成代码。递归 CTE 解决方案更昂贵。我每次应该在 600-700 家商店的 500-600 毫秒内完成这项工作。这是一个包搜索引擎。使用“#temp”tables 或“UNUION”手动创建场景比递归 CTE 便宜 15-20 倍。
连接 Item
或 PackageId
也非常昂贵。在选择最便宜的包裹并加入源后,我可以找到所需的包裹 ID 或物品 table。
我期待一个神奇的解决方案,它可以超快并获得正确的选项。 每个商店只需要最便宜的篮子。手动场景创建速度非常快,但有时无法正确选择最便宜的篮子。
CREATE TABLE #storePackages(
StoreId int not null,
PackageId int not null,
ItemType int not null, -- there are tree item type 0 is normal item, 1 is item has discount 2 is free item
ItemCount int not null,
ItemPrice decimal(18,8) not null,
MaxItemQouta int not null, -- in generaly a package can have between 1 and 6 qouata but in rare can up to 20-25
MaxFullQouta int not null -- sometimes a package can have additional free or discount item qouta. MaxFullQouta will always greater then MaxItemQouta
)
declare @totalStores int
set @totalStores = (SELECT TOP 1 n = number FROM master..[spt_values] WHERE number BETWEEN 200 AND 400 ORDER BY NEWID())
declare @storeId int;
declare @packageId int;
declare @maxPackageForStore int;
declare @itemMinPrice decimal(18,8);
set @storeId = 1;
set @packageId = 1
while(@storeId <= @totalStores)
BEGIN
set @maxPackageForStore = (SELECT TOP 1 n = number FROM master..[spt_values] WHERE number BETWEEN 2 AND 6 ORDER BY NEWID())
set @itemMinPrice = (SELECT TOP 1 n = number FROM master..[spt_values] WHERE number BETWEEN 40 AND 100 ORDER BY NEWID())
BEGIN
INSERT INTO #storePackages
SELECT DISTINCT
StoreId = @storeId
,PackageId = CAST(@packageId + number AS int)
,ItemType = 0
,ItemCount = number
,ItemPrice = @itemMinPrice + (10 * (SELECT TOP 1 n = number FROM master..[spt_values] WHERE number BETWEEN pkgNo.number AND pkgNo.number + 2 ORDER BY NEWID()))
,MaxItemQouta = @maxPackageForStore
,MaxFullQouta = @maxPackageForStore + (CASE WHEN number > 1 AND number < 4 THEN 1 ELSE 0 END)
FROM master..[spt_values] pkgNo
WHERE number BETWEEN 1 AND @maxPackageForStore
UNION ALL
SELECT DISTINCT
StoreId = @storeId
,PackageId = CAST(@packageId + number AS int)
,ItemType = 1
,ItemCount = 1
,ItemPrice = (@itemMinPrice / 2) + (10 * (SELECT TOP 1 n = number FROM master..[spt_values] WHERE number BETWEEN pkgNo.number AND pkgNo.number + 2 ORDER BY NEWID()))
,MaxItemQouta = @maxPackageForStore
,MaxFullQouta = @maxPackageForStore + (SELECT TOP 1 n = number FROM master..[spt_values] WHERE number BETWEEN 0 AND 2 ORDER BY NEWID())
FROM master..[spt_values] pkgNo
WHERE number BETWEEN 2 AND (CASE WHEN @maxPackageForStore > 4 THEN 4 ELSE @maxPackageForStore END)
set @packageId = @packageId + @maxPackageForStore;
END
set @storeId =@storeId + 1;
END
SELECT * FROM #storePackages
drop table #storePackages
我的解决方案
首先,我感谢所有试图帮助我的人。然而,所有建议的解决方案都基于 CTE。正如我之前所说,当考虑成百上千的商店时,递归 CTE 会导致性能问题。一次也请求多个包。这意味着,我要求可以包含多个篮子。一件是 5 件,一件是 3 件,一件是 7 件...
最后一个解决方案
首先,我根据项目大小在 table 中生成所有可能的场景...通过这种方式,我可以选择消除不需要的场景。
CREATE TABLE ItemScenarios(
Item int,
ScenarioId int,
CalculatedItem int --this will be joined with Store Item
)
然后我生成了从 2 项到 25 项的所有可能场景并插入到 ItemScenarios
table。场景可以通过使用 WHILE 或递归 CTE 生成一次。这种方式的好处是场景只生成一次。
结果如下。
Item | ScenarioId | CalculatedItem
--------------------------------------------------------
2 1 2
2 2 3
2 3 1
2 3 1
3 4 5
3 5 4
3 6 3
3 7 2
3 7 2
3 8 2
3 8 1
3 9 1
3 9 1
3 9 1
....
.....
......
25 993 10
通过这种方式,我可以限制场景大小、最大不同商店、最大不同包裹等。
此外,我还可以排除一些在数学上不可能的场景,这些场景比其他场景便宜。例如对于 4 项请求,某些场景
场景一:2+2
场景二:2+1+1
场景三:1+1+1+1
在这些场景中;情景 2 不可能是最便宜的篮子。因为,
如果场景2 < 场景3 --> 场景1会更低场景 2.因为降低成本的东西是2件商品的价格,而**场景1*有双2件商品
另外如果场景2 < 场景1 --> 场景3会更低然后 情景 2
现在,如果我删除像场景 2 这样的场景,我将获得一些性能优势。
现在我可以在商店中选择最便宜的商品价格
DECLARE @requestedItems int;
SET @requestedItems = 5;
CREATE TABLE #JoinedPackageItemWithScenarios(
StoreId int not null,
PackageId int not null,
ItemCount int not null,
ItemPrice decimal(18,8)
ScenarioId int not null,
)
INSERT INTO #JoinedPackageItemWithScenarios
SELECT
SPM.StoreId
,SPM.PackageId
,SPM.ItemCount
,SPM.ItemPrice
,SPM.ScenarioId
FROM (
SELECT
SP.StoreId
,SP.PackageId
,SP.ItemCount
,SP.ItemPrice
,SC.ScenarioId
,RowNumber = ROW_NUMBER() OVER (PARTITION BY SP.StoreId,SC.ScenarioId,SP.ItemCount ORDER BY SP.ItemPrice)
FROM ItemScenarios SC
LEFT JOIN StorePackages AS SP ON SP.ItemCount = SC.CalculatedItem
WHERE SC.Item = @requestedItems
) SPM
WHERE SPM.RowNumber = 1
-- NOW I HAVE CHEAPEST PRICE FOR EACH ITEM, I CAN CREATE BASKET
CREATE TABLE #selectedScenarios(
StoreId int not null,
ScenarioId int not null,
TotalItem int not null,
TotalCost decimal(18,8)
)
INSERT INTO #selectedScenarios
SELECT
StoreId
,ScenarioId
,TotalItem
,TotalCost
FROM (
SELECT
StoreId
,ScenarioId
--,PackageIds = dbo.GROUP_CONCAT(CAST(PackageId AS nvarchar(20))) -- CONCATENING PackageId decreasing performance here. We can joing seleceted scenarios with #JoinedPackageItemWithScenarios after selection complated.
,TotalItem = SUM(ItemCount)
,TotalCost = SUM(ItemPrice)
,RowNumber = ROW_NUMBER() OVER (PARTITION BY StoreId ORDER BY SUM(ItemPrice))
FROM #JoinedPackageItemWithScenarios JPS
GROUP BY StoreId,ScenarioId
HAVING(SUM(ItemCount) >= @requestedItems)
) SLECTED
WHERE RowNumber = 1
-- NOW WE CAN POPULATE PackageIds if needed
SELECT
SS.StoreId
,SS.ScenarioId
,TotalItem = MAX(SS.TotalItem)
,TotalCost = MAX(SS.TotalCost)
,PackageIds = dbo.GROUP_CONCAT(CAST(JPS.PackageId AS nvarchar(20)))
FROM #selectedScenarios SS
JOIN #JoinedPackageItemWithScenarios AS JPS ON JPS.StoreId = SS.StoreId AND JPS.ScenarioId = SS.ScenarioId
GROUP BY SS.StoreId,SS.ScenarioId
求和
在我的测试中,这种方式至少比递归 CTE 快 10 倍,尤其是当商店和请求的商品数量增加时。它还得到 100% 正确的结果。因为当商店和请求的项目数量增加时,递归 CTE 尝试了数百万个不需要的 JOIN。
假设您想要比较所有可能的商品排列,直到购物篮中的商品总数超过您的购物篮总数,如下所示的内容可以满足您的要求。
DECLARE @N INT = 1;
DECLARE @myTable TABLE (storeID INT DEFAULT(1), packageID INT IDENTITY(1, 1), item INT, cost INT);
INSERT @myTable (item, cost) VALUES (1, 50), (2, 60), (3, 80), (4, 100), (5, 169), (5, 165), (4, 101), (2, 61);
WITH CTE1 AS (
SELECT item, cost
FROM (
SELECT item, cost, ROW_NUMBER() OVER (PARTITION BY item ORDER BY cost) RN
FROM @myTable) T
WHERE RN = 1)
, CTE2 AS (
SELECT CAST('items'+CAST(C1.item AS VARCHAR(10)) AS VARCHAR(4000)) items, C1.cost totalCost, C1.item totalItems
FROM CTE1 C1
UNION ALL
SELECT CAST(C2.items + ' + items' + CAST(C1.item AS VARCHAR(10)) AS VARCHAR(4000)), C1.cost + C2.totalCost, C1.item + C2.totalItems
FROM CTE2 C2
CROSS JOIN CTE1 C1
WHERE C2.totalItems < @N)
SELECT TOP 1 *
FROM CTE2
WHERE totalItems >= @N
ORDER BY totalCost, totalItems DESC;
编辑处理@Matt 提到的问题。
如果需要组合,则需要递归 CTE。防止无限递归是一个挑战。这是一种方法:
with cte as (
select cast(packageid as nvarchar(4000)) as packs, item, cost
from t
union all
select concat(cte.packs, ',', t.packageid), cte.item + t.item, cte.cost + t.cost
from cte join
t
on cte.item + t.item < 10 -- some "reasonable" stop condition
)
select top 1 cte.*
from cte
where cte.item >= 5
order by cost desc;
我不是 100% 确定 SQL 服务器会接受加入条件,但这应该可行。
IF OBJECT_ID('tempdb..#TestResults') IS NOT NULL
BEGIN
DROP TABLE #TestResults
END
DECLARE @MinItemCount INT = 5
;WITH cteMaxCostToConsider AS (
SELECT
StoreId
,CASE
WHEN (SUM(ItemCount) >= @MinItemCount) AND
SUM(ItemPrice) < MIN(((@MinItemCount / ItemCount) + IIF((@MinItemCount % ItemCount) > 0, 1,0)) * ItemPrice) THEN SUM(ItemPrice)
ELSE MIN(((@MinItemCount / ItemCount) + IIF((@MinItemCount % ItemCount) > 0, 1,0)) * ItemPrice)
END AS MaxCostToConsider
FROM
storePackages
GROUP BY
StoreId
)
, cteRecursive AS (
SELECT
StoreId
,'<PackageId>' + CAST(PackageId AS VARCHAR(MAX)) + '</PackageId>' AS PackageIds
,ItemCount AS CombinedItemCount
,CAST(ItemPrice AS decimal(18,8)) AS CombinedCost
FROM
storePackages
UNION ALL
SELECT
r.StoreId
,r.PackageIds + '<PackageId>' + CAST(t.PackageId AS VARCHAR(MAX)) + '</PackageId>'
,r.CombinedItemCount + t.ItemCount
,CAST(r.CombinedCost + t.ItemPrice AS decimal(18,8))
FROM
cteRecursive r
INNER JOIN storePackages t
ON r.StoreId = t.StoreId
INNER JOIN cteMaxCostToConsider m
ON r.StoreId = m.StoreId
AND r.CombinedCost + t.ItemPrice <= m.MaxCostToConsider
)
, cteCombinedCostRowNum AS (
SELECT
StoreId
,CAST(PackageIds AS XML) AS PackageIds
,CombinedCost
,CombinedItemCount
,DENSE_RANK() OVER (PARTITION BY StoreId ORDER BY CombinedCost) AS CombinedCostRowNum
,ROW_NUMBER() OVER (PARTITION BY StoreId ORDER BY CombinedCost) AS PseudoCartId
FROM
cteRecursive
WHERE
CombinedItemCount >= @MinItemCount
)
SELECT DISTINCT
c.StoreId
,x.PackageIds
,c.CombinedItemCount
,c.CombinedCost
INTO #TestResults
FROM
cteCombinedCostRowNum c
CROSS APPLY (
SELECT( STUFF ( (
SELECT ',' + PackageId
FROM
(SELECT T.N.value('.','VARCHAR(100)') as PackageId FROM c.PackageIds.nodes('PackageId') as T(N)) p
ORDER BY
PackageId
FOR XML PATH(''), TYPE ).value('.','NVARCHAR(MAX)'), 1, 1, '')
) as PackageIds
) x
WHERE
CombinedCostRowNum = 1
SELECT *
FROM
#TestResults
大约需要 1000-2000 毫秒,具体取决于测试数据中必须考虑的组合(例如,有时您的脚本会生成或多或少的数据)。
这个答案无疑看起来比 Gordon 或 ZLK 复杂一点,但它处理了 Ties、重复值、1 个符合标准的包裹和其他一些东西。然而,主要区别实际上是在最后一个查询中,我采用在递归查询期间构建的 XML 将其拆分,然后按顺序重新组合,以便您可以使用 DISTINCT 并获得唯一的配对,例如package 2 + package 3 = 140 & package 3 + package 2 = 140 将是所有查询中的前 2 个结果,因此使用 XML 拆分然后重新组合允许它成为一行。但是假设您还有另一行,例如 (1,5,2,60),它有 2 个项目并且成本为 60,此查询也将 return 该组合。
您可以在答案之间挑选,并使用他们的方法获得组合,使用我的方法获得最终结果等....但要解释我的查询过程。
cteMaxCostToConsider
- 这只是一种获取包含递归查询的成本的方法,以便必须考虑更少的记录。它的作用是确定所有包裹一起的成本,或者如果您购买所有相同包裹以满足最小数量的成本。
cteRecursive
- 这类似于 ZLK 的答案,也有点像 Gordon 的答案,但它所做的是退出并继续添加项目和项目组合,直到达到 MaxCostToConsider。如果我限制查看商品数量,它可能会错过 7 件商品比 5 件商品便宜的情况,因此通过约束确定的组合成本,它限制了递归并表现得更好。
cteCombinedCostRowNum
- 这只是找到最低的组合成本和至少最小的项目数。
最后的查询有点棘手,但交叉应用将递归 cte 中的 XML 字符串构建拆分为不同的行,重新排序这些行,然后再次连接它们,以便反向组合,例如Package 2 & Package 3 reverse Package 3 & Package 2变成同一条记录然后调用distinct.
这比 SELECT top N 更灵活。要查看差异,请将以下测试用例一次添加到您的测试数据 1: (StoreId, PackageId, Item, Cost)
- (1,5,2,60)
- (1,6,1,1),(1,7,1,1)
- (1,8,50,1)
已编辑。 以上将为您提供具有最低综合成本的商店的每种组合。您注意到的错误是由于 cteMaxCostToConsider
。我使用的是 SUM(ItemPrice)
,但有时与之相关的 SUM(ItemCount)
中没有足够的项目,因此无法将其考虑用于 MaxCostToConsider。我修改了案例陈述以更正该问题。
我还进行了修改以使用您提供的数据示例。请注意,您应该将其中的 PackageId 更改为 IDENTITY 列,因为我使用您使用的方法在商店中获取了重复的 PackageIds。
这是你的脚本的修改版本,看看我在说什么:
IF OBJECT_ID('storePackages') IS NOT NULL
BEGIN
DROP TABLE storePackages
END
CREATE TABLE storePackages(
StoreId int not null,
PackageId int not null IDENTITY(1,1),
ItemType int not null, -- there are tree item type 0 is normal item, 1 is item has discount 2 is free item
ItemCount int not null,
ItemPrice decimal(18,8) not null,
MaxItemQouta int not null, -- in generaly a package can have between 1 and 6 qouata but in rare can up to 20-25
MaxFullQouta int not null -- sometimes a package can have additional free or discount item qouta. MaxFullQouta will always greater then MaxItemQouta
)
declare @totalStores int
set @totalStores = (SELECT TOP 1 n = number FROM master..[spt_values] WHERE number BETWEEN 200 AND 400 ORDER BY NEWID())
declare @storeId int;
declare @packageId int;
declare @maxPackageForStore int;
declare @itemMinPrice decimal(18,8);
set @storeId = 1;
set @packageId = 1
while(@storeId <= @totalStores)
BEGIN
set @maxPackageForStore = (SELECT TOP 1 n = number FROM master..[spt_values] WHERE number BETWEEN 2 AND 6 ORDER BY NEWID())
set @itemMinPrice = (SELECT TOP 1 n = number FROM master..[spt_values] WHERE number BETWEEN 40 AND 100 ORDER BY NEWID())
BEGIN
INSERT INTO storePackages (StoreId, ItemType, ItemCount, ItemPrice, MaxFullQouta, MaxItemQouta)
SELECT DISTINCT
StoreId = @storeId
--,PackageId = CAST(@packageId + number AS int)
,ItemType = 0
,ItemCount = number
,ItemPrice = @itemMinPrice + (10 * (SELECT TOP 1 n = number FROM master..[spt_values] WHERE number BETWEEN pkgNo.number AND pkgNo.number + 2 ORDER BY NEWID()))
,MaxItemQouta = @maxPackageForStore
,MaxFullQouta = @maxPackageForStore + (CASE WHEN number > 1 AND number < 4 THEN 1 ELSE 0 END)
FROM master..[spt_values] pkgNo
WHERE number BETWEEN 1 AND @maxPackageForStore
UNION ALL
SELECT DISTINCT
StoreId = @storeId
--,PackageId = CAST(@packageId + number AS int)
,ItemType = 1
,ItemCount = 1
,ItemPrice = (@itemMinPrice / 2) + (10 * (SELECT TOP 1 n = number FROM master..[spt_values] WHERE number BETWEEN pkgNo.number AND pkgNo.number + 2 ORDER BY NEWID()))
,MaxItemQouta = @maxPackageForStore
,MaxFullQouta = @maxPackageForStore + (SELECT TOP 1 n = number FROM master..[spt_values] WHERE number BETWEEN 0 AND 2 ORDER BY NEWID())
FROM master..[spt_values] pkgNo
WHERE number BETWEEN 2 AND (CASE WHEN @maxPackageForStore > 4 THEN 4 ELSE @maxPackageForStore END)
--set @packageId = @packageId + @maxPackageForStore;
END
set @storeId =@storeId + 1;
END
SELECT * FROM storePackages
--drop table #storePackages
没有 PackageIds 简单的 StoreId 和最低的组合成本 - ~200-300MS 取决于数据 接下来,如果您不关心那里有什么包,并且您只希望每个商店有 1 行,您可以执行以下操作:
IF OBJECT_ID('tempdb..#TestResults') IS NOT NULL
BEGIN
DROP TABLE #TestResults
END
DECLARE @MinItemCount INT = 5
;WITH cteMaxCostToConsider AS (
SELECT
StoreId
,CASE
WHEN (SUM(ItemCount) >= @MinItemCount) AND
SUM(ItemPrice) < MIN(((@MinItemCount / ItemCount) + IIF((@MinItemCount % ItemCount) > 0, 1,0)) * ItemPrice) THEN SUM(ItemPrice)
ELSE MIN(((@MinItemCount / ItemCount) + IIF((@MinItemCount % ItemCount) > 0, 1,0)) * ItemPrice)
END AS MaxCostToConsider
FROM
storePackages
GROUP BY
StoreId
)
, cteRecursive AS (
SELECT
StoreId
,ItemCount AS CombinedItemCount
,CAST(ItemPrice AS decimal(18,8)) AS CombinedCost
FROM
storePackages
UNION ALL
SELECT
r.StoreId
,r.CombinedItemCount + t.ItemCount
,CAST(r.CombinedCost + t.ItemPrice AS decimal(18,8))
FROM
cteRecursive r
INNER JOIN storePackages t
ON r.StoreId = t.StoreId
INNER JOIN cteMaxCostToConsider m
ON r.StoreId = m.StoreId
AND r.CombinedCost + t.ItemPrice <= m.MaxCostToConsider
)
SELECT
StoreId
,MIN(CombinedCost) as CombinedCost
INTO #TestResults
FROM
cteRecursive
WHERE
CombinedItemCount >= @MinItemCount
GROUP BY
StoreId
SELECT *
FROM
#TestResults
WITH PackageIds 每个 StoreId 只有 1 条记录 - 根据测试 data/combinations 考虑~600-1300MS 变化很大 或者,如果您仍然想要包裹 ID,但您不关心您选择的组合,并且您只想要 1 条记录,那么您可以这样做:
IF OBJECT_ID('tempdb..#TestResults') IS NOT NULL
BEGIN
DROP TABLE #TestResults
END
DECLARE @MinItemCount INT = 5
;WITH cteMaxCostToConsider AS (
SELECT
StoreId
,CASE
WHEN (SUM(ItemCount) >= @MinItemCount) AND
SUM(ItemPrice) < MIN(((@MinItemCount / ItemCount) + IIF((@MinItemCount % ItemCount) > 0, 1,0)) * ItemPrice) THEN SUM(ItemPrice)
ELSE MIN(((@MinItemCount / ItemCount) + IIF((@MinItemCount % ItemCount) > 0, 1,0)) * ItemPrice)
END AS MaxCostToConsider
FROM
storePackages
GROUP BY
StoreId
)
, cteRecursive AS (
SELECT
StoreId
,CAST(PackageId AS VARCHAR(MAX)) AS PackageIds
,ItemCount AS CombinedItemCount
,CAST(ItemPrice AS decimal(18,8)) AS CombinedCost
FROM
storePackages
UNION ALL
SELECT
r.StoreId
,r.PackageIds + ',' + CAST(t.PackageId AS VARCHAR(MAX))
,r.CombinedItemCount + t.ItemCount
,CAST(r.CombinedCost + t.ItemPrice AS decimal(18,8))
FROM
cteRecursive r
INNER JOIN storePackages t
ON r.StoreId = t.StoreId
INNER JOIN cteMaxCostToConsider m
ON r.StoreId = m.StoreId
AND r.CombinedCost + t.ItemPrice <= m.MaxCostToConsider
)
, cteCombinedCostRowNum AS (
SELECT
StoreId
,PackageIds
,CombinedCost
,CombinedItemCount
,ROW_NUMBER() OVER (PARTITION BY StoreId ORDER BY CombinedCost) AS RowNumber
FROM
cteRecursive
WHERE
CombinedItemCount >= @MinItemCount
)
SELECT DISTINCT
c.StoreId
,c.PackageIds
,c.CombinedItemCount
,c.CombinedCost
INTO #TestResults
FROM
cteCombinedCostRowNum c
WHERE
RowNumber = 1
SELECT *
FROM
#TestResults
请注意,所有基准测试都是在一台使用了 4 年的英特尔 i7-3520M CPU 2.9 GHz、8 GB RAM 和三星 500 GB EVO SSD 笔记本电脑上完成的。因此,如果您 运行 在资源适当的服务器上执行此操作,我希望速度会呈指数级增长。毫无疑问,在 storePackages 上添加索引也会加快答案。
首先我们应该找到所有组合,然后select寻找价值最小的组合
DECLARE @Table as TABLE (StoreId INT, PackageId INT, Item INT, Cost INT)
INSERT INTO @Table VALUES (1,1,1,50),(1,2,2,60),(1,3,3,80),(1,4,4,100)
DECLARE @MinItemCount INT = 5;
WITH cteCombinationTable AS (
SELECT cast(PackageId as NVARCHAR(4000)) as Package, Item, Cost
FROM @Table
UNION ALL
SELECT CONCAT(o.Package,',',c.PackageId), c.Item + o.Item, c.Cost + o.Cost FROM @Table as c join cteCombinationTable as o on CONCAT(o.Package,',',c.PackageId) <> Package
where o.Item < @MinItemCount
)
select top 1 *
from cteCombinationTable
where item >= @MinItemCount
order by cast(cost as decimal)/@MinItemCount
我的解决方案
首先,我感谢所有试图帮助我的人。然而,所有建议的解决方案都基于 CTE。正如我之前所说,当考虑成百上千的商店时,递归 CTE 会导致性能问题。一次也请求多个包。这意味着,一个请求可以包含多个篮子。一件是 5 件,一件是 3 件,一件是 7 件...
最后一个解决方案
首先,我根据项目大小在 table 中生成所有可能的场景...通过这种方式,我可以选择消除不需要的场景。
CREATE TABLE ItemScenarios(
Item int,
ScenarioId int,
CalculatedItem int --this will be joined with Store Item
)
然后我生成了从 2 项到 25 项的所有可能场景并插入到 ItemScenarios
table。场景可以通过使用 WHILE 或递归 CTE 生成一次。这种方式的好处是场景只生成一次。
结果如下。
Item | ScenarioId | CalculatedItem
--------------------------------------------------------
2 1 2
2 2 3
2 3 1
2 3 1
3 4 5
3 5 4
3 6 3
3 7 2
3 7 2
3 8 2
3 8 1
3 9 1
3 9 1
3 9 1
....
.....
......
25 993 10
通过这种方式,我可以限制场景大小、最大不同商店、最大不同包裹等。
此外,我还可以排除一些在数学上不可能的场景,这些场景比其他场景便宜。例如对于 4 项请求,某些场景
场景一:2+2
场景二:2+1+1
场景三:1+1+1+1
在这些场景中;情景 2 不可能是最便宜的篮子。因为,
如果场景2 < 场景3 --> 场景1会更低场景 2.因为降低成本的东西是2件商品的价格,而**场景1*有双2件商品
另外如果场景2 < 场景1 --> 场景3会更低然后 情景 2
现在,如果我删除像场景 2 这样的场景,我将获得一些性能优势。
现在我可以在商店中选择最便宜的商品价格
DECLARE @requestedItems int;
SET @requestedItems = 5;
CREATE TABLE #JoinedPackageItemWithScenarios(
StoreId int not null,
PackageId int not null,
ItemCount int not null,
ItemPrice decimal(18,8)
ScenarioId int not null,
)
INSERT INTO #JoinedPackageItemWithScenarios
SELECT
SPM.StoreId
,SPM.PackageId
,SPM.ItemCount
,SPM.ItemPrice
,SPM.ScenarioId
FROM (
SELECT
SP.StoreId
,SP.PackageId
,SP.ItemCount
,SP.ItemPrice
,SC.ScenarioId
,RowNumber = ROW_NUMBER() OVER (PARTITION BY SP.StoreId,SC.ScenarioId,SP.ItemCount ORDER BY SP.ItemPrice)
FROM ItemScenarios SC
LEFT JOIN StorePackages AS SP ON SP.ItemCount = SC.CalculatedItem
WHERE SC.Item = @requestedItems
) SPM
WHERE SPM.RowNumber = 1
-- NOW I HAVE CHEAPEST PRICE FOR EACH ITEM, I CAN CREATE BASKET
CREATE TABLE #selectedScenarios(
StoreId int not null,
ScenarioId int not null,
TotalItem int not null,
TotalCost decimal(18,8)
)
INSERT INTO #selectedScenarios
SELECT
StoreId
,ScenarioId
,TotalItem
,TotalCost
FROM (
SELECT
StoreId
,ScenarioId
--,PackageIds = dbo.GROUP_CONCAT(CAST(PackageId AS nvarchar(20))) -- CONCATENING PackageId decreasing performance here. We can joing seleceted scenarios with #JoinedPackageItemWithScenarios after selection complated.
,TotalItem = SUM(ItemCount)
,TotalCost = SUM(ItemPrice)
,RowNumber = ROW_NUMBER() OVER (PARTITION BY StoreId ORDER BY SUM(ItemPrice))
FROM #JoinedPackageItemWithScenarios JPS
GROUP BY StoreId,ScenarioId
HAVING(SUM(ItemCount) >= @requestedItems)
) SLECTED
WHERE RowNumber = 1
-- NOW WE CAN POPULATE PackageIds if needed
SELECT
SS.StoreId
,SS.ScenarioId
,TotalItem = MAX(SS.TotalItem)
,TotalCost = MAX(SS.TotalCost)
,PackageIds = dbo.GROUP_CONCAT(CAST(JPS.PackageId AS nvarchar(20)))
FROM #selectedScenarios SS
JOIN #JoinedPackageItemWithScenarios AS JPS ON JPS.StoreId = SS.StoreId AND JPS.ScenarioId = SS.ScenarioId
GROUP BY SS.StoreId,SS.ScenarioId
求和
在我的测试中,这种方式至少比递归 CTE 快 10 倍,尤其是当商店和请求的商品数量增加时。它还得到 100% 正确的结果。因为当商店和请求的项目数量增加时,递归 CTE 尝试了数百万个不需要的 JOIN。