Moving average in SQLite
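I would like to calculate a moving average over the data in an SQLite table. I found several methods for MySQL, but could not find an efficient one for SQLite.
In SQL, I think it should be done like this (however, I am not able to try it...):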
SELECT date, value,
avg(value) OVER (ORDER BY date ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING) as MovingAverageWindow7
FROM t ORDER BY date;
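However, I see two drawbacks:
- This does not seem to work in SQLite
- If the data is not continuous (some dates are missing) in the preceding/following rows, it computes the moving average over a window that is wider than I actually want, because the window is based only on the surrounding rows. A condition on the date should therefore be added.
Indeed, I would like it to compute, for each date, the average of 'value' over +/-3 days (weekly moving average) or +/-15 days (monthly moving average).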
Here is a sample dataset:
CREATE TABLE t ( date DATE, value INTEGER );
INSERT INTO t (date, value) VALUES ('2018-02-01', 8);
INSERT INTO t (date, value) VALUES ('2018-02-02', 2);
INSERT INTO t (date, value) VALUES ('2018-02-05', 5);
INSERT INTO t (date, value) VALUES ('2018-02-06', 4);
INSERT INTO t (date, value) VALUES ('2018-02-07', 1);
INSERT INTO t (date, value) VALUES ('2018-02-10', 6);
INSERT INTO t (date, value) VALUES ('2018-02-11', 0);
INSERT INTO t (date, value) VALUES ('2018-02-12', 2);
INSERT INTO t (date, value) VALUES ('2018-02-13', 1);
INSERT INTO t (date, value) VALUES ('2018-02-14', 3);
INSERT INTO t (date, value) VALUES ('2018-02-15', 11);
INSERT INTO t (date, value) VALUES ('2018-02-18', 4);
INSERT INTO t (date, value) VALUES ('2018-02-20', 1);
INSERT INTO t (date, value) VALUES ('2018-02-21', 5);
INSERT INTO t (date, value) VALUES ('2018-02-28', 10);
INSERT INTO t (date, value) VALUES ('2018-03-02', 6);
INSERT INTO t (date, value) VALUES ('2018-03-03', 7);
INSERT INTO t (date, value) VALUES ('2018-03-04', 3);
INSERT INTO t (date, value) VALUES ('2018-03-08', 5);
INSERT INTO t (date, value) VALUES ('2018-03-09', 6);
INSERT INTO t (date, value) VALUES ('2018-03-15', 1);
INSERT INTO t (date, value) VALUES ('2018-03-16', 3);
INSERT INTO t (date, value) VALUES ('2018-03-25', 5);
INSERT INTO t (date, value) VALUES ('2018-03-31', 1);
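A minimal sketch, assuming Python's built-in sqlite3 module linked against SQLite 3.25.0 or later (window functions did not exist in SQLite when this question was asked). It runs the ROWS-based query on the sample dataset above and reproduces the second drawback: the frame is counted in rows, not in calendar days.

```python
import sqlite3

# The sample dataset from the question.
SAMPLE = [
    ("2018-02-01", 8), ("2018-02-02", 2), ("2018-02-05", 5), ("2018-02-06", 4),
    ("2018-02-07", 1), ("2018-02-10", 6), ("2018-02-11", 0), ("2018-02-12", 2),
    ("2018-02-13", 1), ("2018-02-14", 3), ("2018-02-15", 11), ("2018-02-18", 4),
    ("2018-02-20", 1), ("2018-02-21", 5), ("2018-02-28", 10), ("2018-03-02", 6),
    ("2018-03-03", 7), ("2018-03-04", 3), ("2018-03-08", 5), ("2018-03-09", 6),
    ("2018-03-15", 1), ("2018-03-16", 3), ("2018-03-25", 5), ("2018-03-31", 1),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (date DATE, value INTEGER)")
conn.executemany("INSERT INTO t (date, value) VALUES (?, ?)", SAMPLE)

# ROWS frame: 3 rows before and 3 rows after, ignoring gaps in the calendar.
rows_ma = dict(conn.execute("""
    SELECT date,
           avg(value) OVER (ORDER BY date
                            ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING)
    FROM t ORDER BY date
""").fetchall())

# For 2018-03-25 the frame covers 2018-03-09 .. 2018-03-31, although no other
# row lies within +/-3 calendar days, so distant values are mixed in.
print(rows_ma["2018-03-25"])
```

A date-based +/-3-day average for 2018-03-25 would contain only that row (value 5), whereas the ROWS frame averages five rows spread over three weeks.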
I think I actually found a solution:
SELECT date, value,
(SELECT AVG(value) FROM t t2
WHERE datetime(t1.date, '-3 days') <= datetime(t2.date) AND datetime(t1.date, '+3 days') >= datetime(t2.date)
) AS MAVG
FROM t t1
GROUP BY strftime('%Y-%m-%d', date);
I don't know whether this is the most efficient way, but it seems to work.
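One possible speed-up, offered as an untested assumption rather than a measured result: wrapping t2.date in datetime() prevents SQLite from using an ordinary index on that column. If dates are always stored as 'YYYY-MM-DD' text (as in the sample), plain string comparison already orders them chronologically, so the subquery can compare the raw column against date(t1.date, ...) and an index on t(date) can narrow each lookup to a range. The index name t_date_idx and the HALF_WIDTH_DAYS parameter are illustrative.

```python
import sqlite3

# The sample dataset from the question.
SAMPLE = [
    ("2018-02-01", 8), ("2018-02-02", 2), ("2018-02-05", 5), ("2018-02-06", 4),
    ("2018-02-07", 1), ("2018-02-10", 6), ("2018-02-11", 0), ("2018-02-12", 2),
    ("2018-02-13", 1), ("2018-02-14", 3), ("2018-02-15", 11), ("2018-02-18", 4),
    ("2018-02-20", 1), ("2018-02-21", 5), ("2018-02-28", 10), ("2018-03-02", 6),
    ("2018-03-03", 7), ("2018-03-04", 3), ("2018-03-08", 5), ("2018-03-09", 6),
    ("2018-03-15", 1), ("2018-03-16", 3), ("2018-03-25", 5), ("2018-03-31", 1),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (date DATE, value INTEGER)")
conn.executemany("INSERT INTO t (date, value) VALUES (?, ?)", SAMPLE)
# Assumption: dates are 'YYYY-MM-DD' text, so an ordinary index on the raw
# column can serve the range condition below.
conn.execute("CREATE INDEX t_date_idx ON t(date)")

HALF_WIDTH_DAYS = 3  # hypothetical parameter: 3 = weekly, 15 = monthly

query = """
    SELECT t1.date, t1.value,
           (SELECT avg(t2.value) FROM t AS t2
             WHERE t2.date BETWEEN date(t1.date, ?) AND date(t1.date, ?)) AS mavg
    FROM t AS t1
    ORDER BY t1.date
"""
params = (f"-{HALF_WIDTH_DAYS} days", f"+{HALF_WIDTH_DAYS} days")
mavg = {d: m for d, _, m in conn.execute(query, params)}
```

Whether this helps on the real 20,000-row table would have to be checked there, for example by comparing timings with and without the index.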
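Edit:
Applied to a real database with 20,000 rows, computing the weekly moving average for two parameters takes about 1 minute.
I see two options:
- There is a more efficient way to compute this in SQLite
- I compute the moving average in Python after extracting the data from SQLite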
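For the second option, a minimal sketch in pure Python (3.7+ for date.fromisoformat): read the rows once, ordered by date, and slide a two-pointer window over them so that each row enters and leaves the running sum exactly once. The function name moving_average is hypothetical, and the in-memory database stands in for the real one.

```python
import sqlite3
from datetime import date, timedelta

# The sample dataset from the question (stands in for the real table).
SAMPLE = [
    ("2018-02-01", 8), ("2018-02-02", 2), ("2018-02-05", 5), ("2018-02-06", 4),
    ("2018-02-07", 1), ("2018-02-10", 6), ("2018-02-11", 0), ("2018-02-12", 2),
    ("2018-02-13", 1), ("2018-02-14", 3), ("2018-02-15", 11), ("2018-02-18", 4),
    ("2018-02-20", 1), ("2018-02-21", 5), ("2018-02-28", 10), ("2018-03-02", 6),
    ("2018-03-03", 7), ("2018-03-04", 3), ("2018-03-08", 5), ("2018-03-09", 6),
    ("2018-03-15", 1), ("2018-03-16", 3), ("2018-03-25", 5), ("2018-03-31", 1),
]


def moving_average(rows, half_width_days=3):
    """Centered moving average over +/- half_width_days calendar days.

    rows: (iso_date, value) pairs already sorted by date. The two pointers
    keep parsed[lo:hi] equal to the rows inside the current window.
    """
    parsed = [(date.fromisoformat(d), v) for d, v in rows]
    span = timedelta(days=half_width_days)
    result = []
    lo = hi = 0
    window_sum = 0
    for day, _ in parsed:
        # Grow the right edge up to day + span (inclusive).
        while hi < len(parsed) and parsed[hi][0] <= day + span:
            window_sum += parsed[hi][1]
            hi += 1
        # Shrink the left edge past anything earlier than day - span.
        while parsed[lo][0] < day - span:
            window_sum -= parsed[lo][1]
            lo += 1
        result.append((day.isoformat(), window_sum / (hi - lo)))
    return result


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (date DATE, value INTEGER)")
conn.executemany("INSERT INTO t (date, value) VALUES (?, ?)", SAMPLE)

# Pull the data out once, sorted, then compute both windows in Python.
rows = conn.execute("SELECT date, value FROM t ORDER BY date").fetchall()
weekly = dict(moving_average(rows, 3))
monthly = dict(moving_average(rows, 15))
```

Because both window edges only move forward, the pass is linear in the number of rows after the ORDER BY, instead of one range lookup per row.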
One approach is to create an intermediate table that maps each date to the groups it belongs to.
CREATE TABLE groups (date DATE, daygroup DATE);
INSERT INTO groups
SELECT date, strftime('%Y-%m-%d', datetime(date, '-1 days')) AS daygroup
FROM t;
INSERT INTO groups
SELECT date, strftime('%Y-%m-%d', datetime(date, '-2 days')) AS daygroup
FROM t;
INSERT INTO groups
SELECT date, strftime('%Y-%m-%d', datetime(date, '-3 days')) AS daygroup
FROM t;
INSERT INTO groups
SELECT date, strftime('%Y-%m-%d', datetime(date, '+1 days')) AS daygroup
FROM t;
INSERT INTO groups
SELECT date, strftime('%Y-%m-%d', datetime(date, '+2 days')) AS daygroup
FROM t;
INSERT INTO groups
SELECT date, strftime('%Y-%m-%d', datetime(date, '+3 days')) AS daygroup
FROM t;
INSERT INTO groups
SELECT date, date AS daygroup FROM t;
You then get, for example:
SELECT * FROM groups WHERE date = '2018-02-05'
date daygroup
2018-02-05 2018-02-04
2018-02-05 2018-02-03
2018-02-05 2018-02-02
2018-02-05 2018-02-06
2018-02-05 2018-02-07
2018-02-05 2018-02-08
2018-02-05 2018-02-05
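This means that '2018-02-05' belongs to the groups '2018-02-02' through '2018-02-08'. If a date belongs to a group, its value is included in that group's moving-average calculation.
With this, computing the moving average becomes simple: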
SELECT
d.date, d.value, c.ma
FROM
t AS d
INNER JOIN
(SELECT
b.daygroup,
avg(a.value) AS ma
FROM
t AS a
INNER JOIN
groups AS b
ON a.date = b.date
GROUP BY b.daygroup) AS c
ON
d.date = c.daygroup
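The seven INSERT statements differ only in the day offset, so they can be generated in a loop, which also turns the monthly (+/-15 days) variant into a one-parameter change. A minimal sketch with Python's sqlite3 module; HALF_WIDTH_DAYS is a hypothetical parameter, and date(date, '+N days') is used as a shorter equivalent of strftime('%Y-%m-%d', datetime(date, '+N days')).

```python
import sqlite3

# The sample dataset from the question.
SAMPLE = [
    ("2018-02-01", 8), ("2018-02-02", 2), ("2018-02-05", 5), ("2018-02-06", 4),
    ("2018-02-07", 1), ("2018-02-10", 6), ("2018-02-11", 0), ("2018-02-12", 2),
    ("2018-02-13", 1), ("2018-02-14", 3), ("2018-02-15", 11), ("2018-02-18", 4),
    ("2018-02-20", 1), ("2018-02-21", 5), ("2018-02-28", 10), ("2018-03-02", 6),
    ("2018-03-03", 7), ("2018-03-04", 3), ("2018-03-08", 5), ("2018-03-09", 6),
    ("2018-03-15", 1), ("2018-03-16", 3), ("2018-03-25", 5), ("2018-03-31", 1),
]
HALF_WIDTH_DAYS = 3  # hypothetical parameter: 3 = weekly, 15 = monthly

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (date DATE, value INTEGER)")
conn.executemany("INSERT INTO t (date, value) VALUES (?, ?)", SAMPLE)

# One INSERT per offset replaces the hand-written statements;
# offset 0 maps each date to its own group.
conn.execute("CREATE TABLE groups (date DATE, daygroup DATE)")
for offset in range(-HALF_WIDTH_DAYS, HALF_WIDTH_DAYS + 1):
    conn.execute(
        "INSERT INTO groups SELECT date, date(date, ?) FROM t",
        (f"{offset:+d} days",),  # '-3 days', ..., '+0 days', ..., '+3 days'
    )

# Same query as above: average each group, then keep only the groups
# named after dates that actually exist in t.
ma = dict(conn.execute("""
    SELECT d.date, c.ma
    FROM t AS d
    INNER JOIN (SELECT b.daygroup, avg(a.value) AS ma
                FROM t AS a
                INNER JOIN groups AS b ON a.date = b.date
                GROUP BY b.daygroup) AS c
        ON d.date = c.daygroup
    ORDER BY d.date
""").fetchall())

members = [r[0] for r in conn.execute(
    "SELECT daygroup FROM groups WHERE date = '2018-02-05' ORDER BY daygroup")]
group_rows = conn.execute("SELECT count(*) FROM groups").fetchone()[0]
```

The members list reproduces the seven groups shown above for '2018-02-05'.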
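Note that the intermediate table has 7 times as many rows as the original table, and its size grows in proportion to the window width. This should be acceptable unless your table is much larger.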
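For a window of +/-K days, the groups table holds one row per original row per offset, so with n rows in t (worked for the 20,000-row case mentioned in the question):

```latex
\text{rows}(\texttt{groups}) = n\,(2K+1), \qquad
K=3:\; 7 \times 20\,000 = 140\,000, \qquad
K=15:\; 31 \times 20\,000 = 620\,000
```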
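I also experimented with 20,000 rows. The insert queries took 1.5 seconds and the select query took 0.5 seconds on my laptop.
Added, perhaps better:
An alternative that does not need the intermediate table. The query below joins the table with itself, allowing an offset of up to 3 days in either direction, and then takes the average.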
SELECT
t1.date, avg(t2.value) AS MVG
FROM
t AS t1
INNER JOIN
t AS t2
ON
datetime(t1.date, '-3 days') <= datetime(t2.date)
AND
datetime(t1.date, '+3 days') >= datetime(t2.date)
GROUP BY
t1.date
;
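The self-join extends directly to the monthly window from the question by binding the day offsets as parameters. A minimal sketch with Python's sqlite3 module; the helper name self_join_ma is hypothetical.

```python
import sqlite3

# The sample dataset from the question.
SAMPLE = [
    ("2018-02-01", 8), ("2018-02-02", 2), ("2018-02-05", 5), ("2018-02-06", 4),
    ("2018-02-07", 1), ("2018-02-10", 6), ("2018-02-11", 0), ("2018-02-12", 2),
    ("2018-02-13", 1), ("2018-02-14", 3), ("2018-02-15", 11), ("2018-02-18", 4),
    ("2018-02-20", 1), ("2018-02-21", 5), ("2018-02-28", 10), ("2018-03-02", 6),
    ("2018-03-03", 7), ("2018-03-04", 3), ("2018-03-08", 5), ("2018-03-09", 6),
    ("2018-03-15", 1), ("2018-03-16", 3), ("2018-03-25", 5), ("2018-03-31", 1),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (date DATE, value INTEGER)")
conn.executemany("INSERT INTO t (date, value) VALUES (?, ?)", SAMPLE)


def self_join_ma(conn, half_width_days):
    """The self-join above, with the +/- window width bound as parameters."""
    sql = """
        SELECT t1.date, avg(t2.value) AS mvg
        FROM t AS t1
        INNER JOIN t AS t2
            ON datetime(t1.date, ?) <= datetime(t2.date)
           AND datetime(t1.date, ?) >= datetime(t2.date)
        GROUP BY t1.date
        ORDER BY t1.date
    """
    k = int(half_width_days)
    return dict(conn.execute(sql, (f"-{k} days", f"+{k} days")).fetchall())


weekly = self_join_ma(conn, 3)    # +/-3 days
monthly = self_join_ma(conn, 15)  # +/-15 days
```

An INNER JOIN is enough here because every row always matches at least itself.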
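Window functions were added in SQLite version 3.25.0 (2018-09-15). With support for numeric "N PRECEDING" / "N FOLLOWING" bounds in RANGE frames, added in version 3.28.0 (2019-04-16), you can now order by the date converted to Unix seconds and use a frame of 3 days (3 * 24 * 60 * 60 seconds) on either side: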
SELECT date, value,
avg(value) OVER (
ORDER BY CAST (strftime('%s', date) AS INT)
RANGE BETWEEN 3 * 24 * 60 * 60 PRECEDING
AND 3 * 24 * 60 * 60 FOLLOWING
) AS MovingAverageWindow7
FROM t ORDER BY date;
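An equivalent formulation, assuming SQLite 3.28.0 or later: julianday(date) returns the date as a number of days, so the RANGE bounds can be written in days rather than seconds. The half-width is formatted into the SQL as an integer; the helper name range_ma is hypothetical.

```python
import sqlite3

# The sample dataset from the question.
SAMPLE = [
    ("2018-02-01", 8), ("2018-02-02", 2), ("2018-02-05", 5), ("2018-02-06", 4),
    ("2018-02-07", 1), ("2018-02-10", 6), ("2018-02-11", 0), ("2018-02-12", 2),
    ("2018-02-13", 1), ("2018-02-14", 3), ("2018-02-15", 11), ("2018-02-18", 4),
    ("2018-02-20", 1), ("2018-02-21", 5), ("2018-02-28", 10), ("2018-03-02", 6),
    ("2018-03-03", 7), ("2018-03-04", 3), ("2018-03-08", 5), ("2018-03-09", 6),
    ("2018-03-15", 1), ("2018-03-16", 3), ("2018-03-25", 5), ("2018-03-31", 1),
]

# Numeric "N PRECEDING / N FOLLOWING" bounds in RANGE frames need SQLite >= 3.28.0.
if sqlite3.sqlite_version_info < (3, 28, 0):
    raise RuntimeError(f"SQLite {sqlite3.sqlite_version} is too old for RANGE offsets")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (date DATE, value INTEGER)")
conn.executemany("INSERT INTO t (date, value) VALUES (?, ?)", SAMPLE)


def range_ma(conn, half_width_days):
    k = int(half_width_days)  # formatted into the SQL text, so force an integer
    sql = f"""
        SELECT date,
               avg(value) OVER (
                   ORDER BY julianday(date)
                   RANGE BETWEEN {k} PRECEDING AND {k} FOLLOWING
               ) AS moving_avg
        FROM t ORDER BY date
    """
    return dict(conn.execute(sql).fetchall())


weekly = range_ma(conn, 3)    # 7-day window
monthly = range_ma(conn, 15)  # 31-day window
```

Unlike the ROWS frame in the question, the RANGE frame for 2018-03-25 contains only that row, because no other date lies within 3 days of it.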