Find rolling line of best fit in SQL
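I'm trying to find the rolling line of best fit for a set of data, taking the points in groups of five at a time, ordered by x value. In other words:
- Rows 1-4 have no values, because we don't yet have 5 values in total
- For row 5, get the slope and yIntercept of rows 1-5
- For row 6, get the slope and yIntercept of rows 2-6
- For row 7, get the slope and yIntercept of rows 3-7
- For row 8, get the slope and yIntercept of rows 4-8
- For row 9, get the slope and yIntercept of rows 5-9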
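Here are my target values, in an Excel sheet and a plot. The slope and yIntercept values are correct according to pen-and-paper and online linear regression calculations:
...and here is my SQL so far: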
WITH dataset AS (
    SELECT 1 AS x, 9 AS y UNION ALL
    SELECT 2 AS x, 7 AS y UNION ALL
    SELECT 3 AS x, 5 AS y UNION ALL
    SELECT 4 AS x, 3 AS y UNION ALL
    SELECT 5 AS x, 1 AS y UNION ALL
    SELECT 6 AS x, 1 AS y UNION ALL
    SELECT 7 AS x, 1 AS y UNION ALL
    SELECT 8 AS x, 1 AS y UNION ALL
    SELECT 9 AS x, 1 AS y
),
rollingAverages AS (
    SELECT
        dataset.*,
        AVG(dataset.x * 1.00) OVER (ORDER BY x ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) AS [xMean],
        AVG(dataset.y * 1.00) OVER (ORDER BY x ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) AS [yMean],
        SUM(1) OVER (ORDER BY x ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) AS [yCount]
    FROM dataset
),
mValue AS (
    SELECT
        *,
        CASE WHEN yCount < 5 THEN NULL ELSE x - yCount + 1 END AS xStart,
        CASE WHEN yCount < 5 THEN NULL ELSE x END AS xEnd,
        CASE
            WHEN yCount < 5 THEN NULL
            WHEN SUM((x - xMean) * (x - xMean)) OVER (ORDER BY x ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) = 0
                THEN 0
            ELSE
                SUM((x - xMean) * (y - yMean)) OVER (ORDER BY x ROWS BETWEEN 4 PRECEDING AND CURRENT ROW)
                / SUM((x - xMean) * (x - xMean)) OVER (ORDER BY x ROWS BETWEEN 4 PRECEDING AND CURRENT ROW)
        END AS slope
    FROM rollingAverages
),
-- This is the y intercept at the start of the range, i.e. 40 trading days before "today"
yIntercept AS (
    SELECT
        *,
        yMean - slope * xMean AS yIntercept
    FROM mValue
),
channelNowMidpoint AS (
    SELECT
        *
    FROM yIntercept
)
SELECT *
FROM channelNowMidpoint
ORDER BY x
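I'm not getting the correct values for slope or yIntercept. I think this is because the line-of-best-fit algorithm I'm using requires an unbounded set of values, so by the time I reach the CTE named mValue, the xMean and yMean calculations have lost their context. For reference, the line-of-best-fit algorithm I'm using is the "least squares" method.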
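As a point of reference, the least-squares slope and intercept for one window of n points are usually written as below. The symbols m, b, \bar{x} and \bar{y} are notation introduced for this sketch (they correspond to slope, yIntercept, xMean and yMean in the SQL), and the worked line uses rows 2-6 of the sample data, where \bar{x} = 4 and \bar{y} = 3.4.

```latex
\begin{aligned}
m &= \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
b = \bar{y} - m\,\bar{x} \\[6pt]
\text{rows 2--6:}\quad
m &= \frac{(-2)(3.6) + (-1)(1.6) + (0)(-0.4) + (1)(-2.4) + (2)(-2.4)}{4 + 1 + 0 + 1 + 4}
   = \frac{-16}{10} = -1.6 \\
b &= 3.4 - (-1.6)(4) = 9.8
\end{aligned}
```

Note that every term in both sums uses the same \bar{x} and \bar{y}, namely the means of the window being fitted.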
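The values I get when I run this SQL in SSMS are shown below:
As you can see, where x = 5, slope and yIntercept are correct, but after that they are not. I'm not sure where I've gone wrong or how to get my target values.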
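OK, I figured it out. The problem with using window functions here is that each row summed inside the window contributes its own xMean and yMean (the values on the row being processed), rather than the xMean and yMean of the window's leading row, i.e. the row the slope is being calculated for.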
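To make the mismatch visible, here is a small diagnostic sketch in T-SQL for the window that ends at x = 6. The aliases currentRow and contributing and the column names ownXMean and windowXMean are made up for illustration, and the dataset is rebuilt with a VALUES list purely to keep the example short.

```sql
-- Diagnostic sketch: for the window ending at x = 6, compare the xMean that
-- each contributing row carries with the xMean of the window itself.
WITH dataset AS (
    SELECT x, y
    FROM (VALUES (1, 9), (2, 7), (3, 5), (4, 3), (5, 1),
                 (6, 1), (7, 1), (8, 1), (9, 1)) AS v (x, y)
),
rollingAverages AS (
    SELECT
        x,
        y,
        AVG(x * 1.00) OVER (ORDER BY x ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) AS xMean
    FROM dataset
)
SELECT
    contributing.x,
    contributing.xMean AS ownXMean,    -- the value the windowed SUM picks up
    currentRow.xMean   AS windowXMean  -- the value least squares needs
FROM rollingAverages AS currentRow
INNER JOIN rollingAverages AS contributing
    ON currentRow.x - contributing.x BETWEEN 0 AND 4
WHERE currentRow.x = 6
ORDER BY contributing.x;
```

For rows x = 2 through 6, ownXMean works out to 1.5, 2, 2.5, 3 and 4, while windowXMean is 4 on every row. Only the last row of the window carries the right mean, so the windowed SUM of (x - xMean) terms mixes deviations from five different means.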
To fix this, the mValue CTE needs to join back to the dataset CTE to get the x and y values, and stop using window functions, so that the xMean and yMean values are fixed for each output row:
mValue AS (
    SELECT
        ra.*,
        CASE WHEN ra.yCount < 5 THEN NULL ELSE ra.x - ra.yCount + 1 END AS xStart,
        CASE WHEN ra.yCount < 5 THEN NULL ELSE ra.x END AS xEnd,
        CASE
            WHEN ra.yCount < 5 THEN NULL
            WHEN SUM((ds.x - ra.xMean) * (ds.x - ra.xMean)) = 0
                THEN 0
            ELSE
                SUM((ds.x - ra.xMean) * (ds.y - ra.yMean)) / SUM((ds.x - ra.xMean) * (ds.x - ra.xMean))
        END AS slope
    FROM rollingAverages AS ra
    INNER JOIN dataset AS ds
        ON ra.x - ds.x BETWEEN 0 AND 4
    GROUP BY
        ra.x,
        ra.y,
        ra.xMean,
        ra.yMean,
        ra.yCount
),
Result:
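Separately from the join-based fix, the same slope can be written in terms of raw sums, slope = (n Σxy - Σx Σy) / (n Σx² - (Σx)²), none of which depend on a per-row mean, so window functions can be kept. The following is a minimal T-SQL sketch of that alternative, not the approach used in the answer. The CTE names windowSums and slopes and the columns n, sumX, sumY, sumXY and sumXX are assumptions introduced here, and the CAST to float is a choice made to avoid integer division.

```sql
-- Alternative sketch: rolling least squares from windowed raw sums.
WITH dataset AS (
    SELECT x, y
    FROM (VALUES (1, 9), (2, 7), (3, 5), (4, 3), (5, 1),
                 (6, 1), (7, 1), (8, 1), (9, 1)) AS v (x, y)
),
windowSums AS (
    SELECT
        x,
        y,
        COUNT(*)                    OVER (ORDER BY x ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) AS n,
        SUM(CAST(x AS float))       OVER (ORDER BY x ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) AS sumX,
        SUM(CAST(y AS float))       OVER (ORDER BY x ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) AS sumY,
        SUM(CAST(x AS float) * y)   OVER (ORDER BY x ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) AS sumXY,
        SUM(CAST(x AS float) * x)   OVER (ORDER BY x ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) AS sumXX
    FROM dataset
),
slopes AS (
    SELECT
        x,
        y,
        n,
        sumX,
        sumY,
        CASE
            WHEN n < 5 THEN NULL                          -- not enough points yet
            WHEN n * sumXX - sumX * sumX = 0 THEN 0       -- all x equal in the window
            ELSE (n * sumXY - sumX * sumY) / (n * sumXX - sumX * sumX)
        END AS slope
    FROM windowSums
)
SELECT
    x,
    y,
    slope,
    (sumY - slope * sumX) / n AS yIntercept   -- same as yMean - slope * xMean
FROM slopes
ORDER BY x;
```

Worked by hand on the sample data, the formula gives slopes of -2, -1.6, -1, -0.4 and 0 and intercepts of 11, 9.8, 7.2, 3.8 and 1 for x = 5 through 9, with NULL for rows 1-4. One trade-off to keep in mind: the raw-sum form subtracts two large, nearly equal numbers when x values are big, so the mean-centred version from the join-based fix may be numerically safer on real data.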