Improve the speed of this string_agg?

I have data with the following shape:

BOM -- 500 rows, 4 cols
PartProject -- 2.6 million rows, 4 cols
Project -- 1000 rows, 5 cols
Part -- 200k rows, 18 cols

However, when I try to run the string_agg, the query takes more than 10 minutes for 500 rows. How can I improve this query? (The data is not available.)

select
    BOM.*,
    childParentPartProjectName
into #tt2 -- tt for some testing
from #tt1 AS BOM -- tt for some testing
-- cross applys for string agg many to one
CROSS APPLY (
    SELECT childParentPartProjectName = STRING_AGG(PROJECT_childParentPart.NAME, ', ') WITHIN GROUP (ORDER BY PROJECT_childParentPart.NAME)
    FROM (
        SELECT DISTINCT PROJECT3.NAME
        FROM [dbo].[Project] PROJECT3
        LEFT JOIN [dbo].[Part] P3 on P3.ITEM_NUMBER = BOM.childParentPart
        LEFT JOIN [dbo].[PartProject] PP3 on PP3.SOURCE_ID = P3.ID
        WHERE PP3.RELATED_ID = PROJECT3.ID and P3.CURRENT = 1
    ) PROJECT_childParentPart
) PROJECT3

Your subquery (the one nested inside the subquery) has a code "smell": it was written with a clear intention, but it isn't correct.

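Firstly, you have 2 LEFT JOINs in the subquery, but the WHERE clause requires both of the tables aliased as P3 and PP3 to have a non-NULL value (P3.CURRENT = 1 and PP3.RELATED_ID = PROJECT3.ID); that is impossible if no related row was found. This means the JOINs are implicit INNER JOINs.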
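To see why the LEFT JOINs collapse into INNER JOINs, here is a minimal sketch using made-up table variables (@Part and @PartProject are illustrative stand-ins with invented values, not your real tables). A part with no related row comes back from the LEFT JOIN with NULLs in the PP columns, and any WHERE predicate that compares those columns then discards it:

```sql
-- Hypothetical sample data, for illustration only
DECLARE @Part TABLE (ID int, ITEM_NUMBER varchar(10));
DECLARE @PartProject TABLE (SOURCE_ID int, RELATED_ID int);

INSERT INTO @Part VALUES (1, 'A'), (2, 'B');
INSERT INTO @PartProject VALUES (1, 100); -- part 2 has no related row

-- LEFT JOIN alone: part 2 is kept, with NULL in the PP column
SELECT P.ID, PP.RELATED_ID
FROM @Part P
     LEFT JOIN @PartProject PP ON PP.SOURCE_ID = P.ID;

-- A WHERE on the outer-joined table cannot be true for NULL,
-- so part 2 is filtered out again ...
SELECT P.ID, PP.RELATED_ID
FROM @Part P
     LEFT JOIN @PartProject PP ON PP.SOURCE_ID = P.ID
WHERE PP.RELATED_ID = 100;

-- ... which returns the same rows as the INNER JOIN
SELECT P.ID, PP.RELATED_ID
FROM @Part P
     INNER JOIN @PartProject PP ON PP.SOURCE_ID = P.ID
WHERE PP.RELATED_ID = 100;
```

If the outer-join behaviour were really wanted, the predicate would have to move from the WHERE into the ON clause.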

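Next, you have a DISTINCT against a single column when SELECTing from multiple tables; this seems wrong. DISTINCT is very expensive, and the fact that you are using it implies either that NAME isn't unique or that you are getting duplicate rows because of your implicit INNER JOINs. I would guess it's the latter. As such, you should most likely be using an EXISTS rather than the LEFT JOINs (which are really INNER JOINs).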
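As a small illustration of the difference (again with hypothetical table variables and made-up values), joining a project to several matching rows repeats its name, so DISTINCT has to remove the repeats afterwards, whereas EXISTS only tests whether a match exists and never produces the duplicates in the first place:

```sql
-- Hypothetical sample data, for illustration only
DECLARE @Project TABLE (ID int, NAME varchar(20));
DECLARE @PartProject TABLE (SOURCE_ID int, RELATED_ID int);

INSERT INTO @Project VALUES (100, 'Alpha'), (200, 'Beta');
INSERT INTO @PartProject VALUES (1, 100), (2, 100), (3, 100); -- three parts on project Alpha

-- JOIN: 'Alpha' is produced three times, so DISTINCT is needed to collapse it
SELECT DISTINCT Prj.NAME
FROM @Project Prj
     JOIN @PartProject PP ON PP.RELATED_ID = Prj.ID;

-- EXISTS: each project is considered once; there are no duplicates to remove
SELECT Prj.NAME
FROM @Project Prj
WHERE EXISTS (SELECT 1
              FROM @PartProject PP
              WHERE PP.RELATED_ID = Prj.ID);
```

Both queries return the single row 'Alpha', but only the first one has to build and then de-duplicate the repeated rows.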

The following is a guess, but I suspect it will be more performant.

SELECT BOM.*, --Replace this with an explicit list of the columns you need
       SA.childParentPartProjectName
INTO #tt2
FROM #tt1 BOM
     CROSS APPLY (SELECT STRING_AGG(Prj.NAME, ', ') WITHIN GROUP (ORDER BY Prj.NAME) AS childParentPartProjectName
                  FROM dbo.Project Prj --Don't use an alias that is longer than the object name
                  WHERE EXISTS (SELECT 1
                                FROM dbo.Part P
                                     JOIN dbo.PartProject PP ON P.ID = PP.SOURCE_ID
                                WHERE PP.Related_ID = Prj.ID
                                  AND P.ITEM_NUMBER = BOM.childParentPart
                                  AND P.Current = 1)) SA;