Find rolling line of best fit in SQL

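I'm trying to find a rolling line of best fit for a set of data, taking five points at a time ordered by x. In other words, each row should get the slope and y-intercept of the line fitted through that row and the four rows before it.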

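Here are my target values, from an Excel sheet and a plot. The slope and yIntercept values are correct according to pen-and-paper and online linear-regression calculations: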

...and here is my current SQL:

WITH dataset AS (
    SELECT 1 AS x, 9 AS y UNION ALL
    SELECT 2 AS x, 7 AS y UNION ALL
    SELECT 3 AS x, 5 AS y UNION ALL
    SELECT 4 AS x, 3 AS y UNION ALL
    SELECT 5 AS x, 1 AS y UNION ALL
    SELECT 6 AS x, 1 AS y UNION ALL
    SELECT 7 AS x, 1 AS y UNION ALL
    SELECT 8 AS x, 1 AS y UNION ALL
    SELECT 9 AS x, 1 AS y
),
rollingAverages AS (
    SELECT
        dataset.*,
        AVG(dataset.x * 1.00) OVER (ORDER BY x ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) AS [xMean],
        AVG(dataset.y * 1.00) OVER (ORDER BY x ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) AS [yMean],
        SUM(1) OVER (ORDER BY x ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) AS [yCount]
    FROM dataset
),
mValue AS (
    SELECT
        *,
        CASE WHEN yCount < 5 THEN NULL ELSE x - yCount + 1 END AS xStart,
        CASE WHEN yCount < 5 THEN NULL ELSE x END AS xEnd,
        CASE
            WHEN yCount < 5 THEN NULL
            WHEN SUM((x - xMean) * (x - xMean)) OVER (ORDER BY x ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) = 0
            THEN 0
            ELSE
            SUM((x - xMean) * (y - yMean)) OVER (ORDER BY x ROWS BETWEEN 4 PRECEDING AND CURRENT ROW)
            / SUM((x - xMean) * (x - xMean)) OVER (ORDER BY x ROWS BETWEEN 4 PRECEDING AND CURRENT ROW)
        END AS slope
    FROM rollingAverages
),
-- This is the y intercept at the start of the range, i.e. 40 trading days before "today"
yIntercept AS (
    SELECT
        *,
        yMean - slope * xMean AS yIntercept
    FROM mValue
),
channelNowMidpoint AS (
    SELECT
        *
    FROM yIntercept
)

SELECT *
FROM channelNowMidpoint
ORDER BY x

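I'm not getting the correct values for slope and yIntercept. I think it's because the line-of-best-fit algorithm I'm using assumes a single, fixed set of values, so by the time I get to the CTE named mValue, the xMean and yMean values I've calculated have lost their context. For reference, you can find the line-of-best-fit algorithm using the "least squares" method here.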
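For reference, this is the ordinary least-squares line fitted to a single window. The symbols W_t (the current row t plus the four rows before it in x order), m_t and b_t are notation introduced for this sketch, not taken from the question. The detail that matters is that the same window means, x̄_t and ȳ_t, appear in every term of both sums:

```latex
\bar{x}_t = \frac{1}{5}\sum_{i \in W_t} x_i, \qquad
\bar{y}_t = \frac{1}{5}\sum_{i \in W_t} y_i

m_t = \frac{\sum_{i \in W_t} (x_i - \bar{x}_t)(y_i - \bar{y}_t)}
           {\sum_{i \in W_t} (x_i - \bar{x}_t)^2},
\qquad
b_t = \bar{y}_t - m_t \, \bar{x}_t
```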

Below are the values I get when I run this SQL in SSMS:

As you can see, the slope and yIntercept are correct where x = 5, but not after that. I'm not sure where I'm going wrong or how to get my target values.

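OK, I figured it out. The problem with using a window function here is that, for each row inside the window, the expression `(x - xMean)` picks up that row's own xMean and yMean, rather than the values belonging to the current row the window is being computed for.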
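Written out as formulas (a sketch; W_t is the current row t plus the four rows before it, and x̄_i, ȳ_i are the xMean and yMean values stored on row i), the windowed SUM in the original mValue CTE evaluates each term with that row's own averages, while least squares needs the current row's averages in every term:

```latex
% What SUM(...) OVER (ORDER BY x ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) computed:
\tilde{m}_t = \frac{\sum_{i \in W_t} (x_i - \bar{x}_i)(y_i - \bar{y}_i)}
                   {\sum_{i \in W_t} (x_i - \bar{x}_i)^2}

% What the least-squares slope requires:
m_t = \frac{\sum_{i \in W_t} (x_i - \bar{x}_t)(y_i - \bar{y}_t)}
           {\sum_{i \in W_t} (x_i - \bar{x}_t)^2}
```

The two agree at x = 5 only because the first five points lie exactly on y = 11 - 2x: for collinear points every partial-window mean also lies on that line, so y_i - ȳ_i = -2(x_i - x̄_i) term by term and the ratio is still -2. From x = 6 on, the window contains points off that line and the per-row means no longer cancel.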

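To fix this, the mValue CTE needs to join back to the dataset CTE to get the individual x and y values, and stop using window functions, so that xMean and yMean stay fixed across each row's window: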

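mValue AS (
    -- Join each row (ra) to the dataset rows (ds) in its 5-row window, so every
    -- term below uses that window's own ra.xMean and ra.yMean.
    -- Note: "ra.x - ds.x BETWEEN 0 AND 4" assumes x is a gap-free integer sequence.
    SELECT
        ra.*,
        CASE WHEN ra.yCount < 5 THEN NULL ELSE ra.x - ra.yCount + 1 END AS xStart,
        CASE WHEN ra.yCount < 5 THEN NULL ELSE ra.x END AS xEnd,
        CASE
            WHEN ra.yCount < 5 THEN NULL
            WHEN SUM((ds.x - ra.xMean) * (ds.x - ra.xMean)) = 0
            THEN 0
            ELSE
            SUM((ds.x - ra.xMean) * (ds.y - ra.yMean)) / SUM((ds.x - ra.xMean) * (ds.x - ra.xMean))
        END AS slope
    FROM rollingAverages AS ra
    INNER JOIN dataset AS ds
        ON ra.x - ds.x BETWEEN 0 AND 4
    -- Group back to one row per ra row; all selected ra columns are listed here.
    GROUP BY
        ra.x,
        ra.y,
        ra.xMean,
        ra.yMean,
        ra.yCount
),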

Results:
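A possible alternative, sketched here rather than taken from the answer: expanding the centered sums gives slope = (n·Σxy - Σx·Σy) / (n·Σx² - (Σx)²), and every one of those sums is an ordinary windowed SUM, so the slope can be computed without a self-join. Because the window is defined by ROWS rather than by x values, it also does not depend on x being gap-free. The CTE and column names (windowSums, fit, n, sx, sy, sxx, sxy) are hypothetical; the query is written for SQL Server and should also run on other engines that support window frames, but treat it as an untested sketch.

```sql
-- Sketch: rolling least-squares slope and intercept from window sums only.
-- slope = (n*sxy - sx*sy) / (n*sxx - sx*sx); intercept = yMean - slope*xMean.
-- Multiplying by 1.0 keeps the arithmetic out of integer division.
WITH dataset AS (
    SELECT 1 AS x, 9 AS y UNION ALL
    SELECT 2 AS x, 7 AS y UNION ALL
    SELECT 3 AS x, 5 AS y UNION ALL
    SELECT 4 AS x, 3 AS y UNION ALL
    SELECT 5 AS x, 1 AS y UNION ALL
    SELECT 6 AS x, 1 AS y UNION ALL
    SELECT 7 AS x, 1 AS y UNION ALL
    SELECT 8 AS x, 1 AS y UNION ALL
    SELECT 9 AS x, 1 AS y
),
windowSums AS (
    -- Each aggregate covers the current row and the 4 preceding rows ordered by x.
    SELECT
        x,
        y,
        COUNT(*)           OVER (ORDER BY x ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) AS n,
        SUM(x * 1.0)       OVER (ORDER BY x ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) AS sx,
        SUM(y * 1.0)       OVER (ORDER BY x ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) AS sy,
        SUM((x * 1.0) * x) OVER (ORDER BY x ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) AS sxx,
        SUM((x * 1.0) * y) OVER (ORDER BY x ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) AS sxy
    FROM dataset
),
fit AS (
    SELECT
        x,
        y,
        CASE
            WHEN n < 5 THEN NULL                   -- window not yet full
            WHEN n * sxx - sx * sx = 0 THEN 0      -- all x equal; mirrors the question's choice of 0
            ELSE (n * sxy - sx * sy) / (n * sxx - sx * sx)
        END AS slope,
        sx / n AS xMean,
        sy / n AS yMean
    FROM windowSums
)
SELECT
    x,
    y,
    slope,
    yMean - slope * xMean AS yIntercept  -- NULL while the window is incomplete
FROM fit
ORDER BY x;
```

The trade-off is numerical: when x values are large relative to their spread (date serial numbers, for example), n·Σx² and (Σx)² become large, nearly equal numbers and the subtraction can lose precision. Centering x first, or keeping the join-based version, avoids that.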