Creating a cumulative moving average in SQLite
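I'm trying to create a cumulative moving average in SQLite.
To recap: in a cumulative moving average (CMA), the data arrives as an ordered stream, and for each data point I want the average of all the data that came before it.
My table looks like this: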
Continent,Date,Measure,Value
Antarctica,03/01/2019 12:00:00 AM,Passengers,346158
South America,03/01/2019 12:00:00 AM,Ships,6483
South America,03/01/2019 12:00:00 AM,Flights,19
Antarctica,02/01/2019 12:00:00 AM,Passengers,172163
South America,02/01/2019 12:00:00 AM,Cargo Ships,1319
Antarctica,01/01/2019 12:00:00 AM,Passengers,56810
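Previous solutions, such as … or 2, describe monthly or weekly moving averages. However, while I can maintain that average per month, what I'm trying to build is a cumulative average.
I tried this: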
SELECT T1.Date, AVG(T2.VALUE)
FROM my_table AS T1
INNER JOIN my_table AS T2
    ON datetime(T1.Date, '-1 Month') <= datetime(T2.Date)
   AND datetime(T1.Date, '+1 Month') >= datetime(T2.Date)
GROUP BY T1.date;
But since I'm using SQLite, the datetime operation produces an error: sqlite does not have operation datetime.
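For what it's worth, a stock SQLite build (one not compiled with SQLITE_OMIT_DATETIME_FUNCS) does include datetime(); what it cannot do is parse the MM/DD/YYYY hh:mm:ss AM format. SQLite's date functions return NULL for strings they don't recognize, so a join condition built on them is never true. The quick check below uses Python's bundled sqlite3 module, which is my choice for illustration and not something from the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Apply the same expression to the question's date format and to an ISO-8601 string.
us_style, iso_style = conn.execute(
    "SELECT datetime('03/01/2019 12:00:00 AM', '-1 month'), "
    "       datetime('2019-03-01 00:00:00', '-1 month')"
).fetchone()

print(us_style)   # None: SQLite cannot parse MM/DD/YYYY ... AM
print(iso_style)  # 2019-02-01 00:00:00
```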
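I even tried the simple command SELECT AVG(VALUE) FROM my_table GROUP BY MEASURE, DATE, CONTINENT, but that only averages the rows within each group instead of computing a moving average, so it didn't solve my problem.
What I'm trying to get: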
Continent,Date,Measure,Value,Average
Antarctica,03/01/2019 12:00:00 AM,Passengers,346158,114487
South America,03/01/2019 12:00:00 AM,Ships,6483,0
South America,03/01/2019 12:00:00 AM,Flights,19,0
Antarctica,02/01/2019 12:00:00 AM,Passengers,172163,56810
South America,02/01/2019 12:00:00 AM,Cargo Ships,1319,0
Antarctica,01/01/2019 12:00:00 AM,Passengers,56810,0
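The Average column is the running monthly average of total crossings into the continent, for that mode of crossing, over all previous months. So to compute the Average for the first row (i.e. the running monthly average of total passengers crossing into Antarctica over all months before March), you average the Antarctica passenger totals for February (172,163) and January (56,810) and round to the nearest integer: round(228,973 / 2) = round(114,486.5) = 114,487.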
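To pin down the definition before writing any SQL, here is a plain-Python sketch of the Average column as described above. For each (continent, measure) series in chronological order, it computes the mean of all earlier values, rounds halves up, and uses 0 when there are no earlier values. The function name is hypothetical. Python's built-in round() rounds halves to even, so round(114486.5) would give 114486. The sketch uses decimal.ROUND_HALF_UP instead to match the expected 114,487.

```python
from decimal import Decimal, ROUND_HALF_UP


def cumulative_averages(values):
    """Return, for each value, the mean of all earlier values (0 for the first).

    `values` must already be in chronological order for one (continent, measure)
    series. Means are rounded half up to the nearest integer.
    """
    averages = []
    total = 0
    for count, value in enumerate(values):
        if count == 0:
            averages.append(0)  # nothing earlier to average
        else:
            mean = Decimal(total) / count  # exact for these inputs, e.g. 114486.5
            averages.append(int(mean.quantize(Decimal(1), rounding=ROUND_HALF_UP)))
        total += value
    return averages


# Antarctica / Passengers, oldest first: January, February, March.
print(cumulative_averages([56810, 172163, 346158]))  # [0, 56810, 114487]
```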
Is there an easier way to do this?
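First, fix your timestamps so they use a format that sorts correctly, such as the ISO-8601 formats supported by the SQLite date and time functions. Instead of 03/01/2019 12:00:00 AM, use 2019-03-01 00:00:00 (or just 2019-03-01 if you only care about the date and not the time). Your CSV data then looks like this: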
Continent,Date,Measure,Value
Antarctica,2019-03-01 00:00:00,Passengers,346158
South America,2019-03-01 00:00:00,Ships,6483
South America,2019-03-01 00:00:00,Flights,19
Antarctica,2019-02-01 00:00:00,Passengers,172163
South America,2019-02-01 00:00:00,Cargo Ships,1319
Antarctica,2019-01-01 00:00:00,Passengers,56810
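If the data is already loaded in the old format, one way to rewrite it in place is to register a small Python conversion function with SQLite and run an UPDATE. This sketch assumes every timestamp follows the MM/DD/YYYY hh:mm:ss AM/PM pattern shown above and uses the crossings table name from the queries below. The helper name to_iso and the column types are my own assumptions.

```python
import sqlite3
from datetime import datetime


def to_iso(us_timestamp):
    """Convert '03/01/2019 12:00:00 AM' (MM/DD/YYYY hh:mm:ss AM/PM) to '2019-03-01 00:00:00'."""
    parsed = datetime.strptime(us_timestamp, "%m/%d/%Y %I:%M:%S %p")
    return parsed.strftime("%Y-%m-%d %H:%M:%S")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE crossings (continent TEXT, date TEXT, measure TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO crossings VALUES (?, ?, ?, ?)",
    [
        ("Antarctica", "03/01/2019 12:00:00 AM", "Passengers", 346158),
        ("South America", "03/01/2019 12:00:00 AM", "Ships", 6483),
        ("South America", "03/01/2019 12:00:00 AM", "Flights", 19),
        ("Antarctica", "02/01/2019 12:00:00 AM", "Passengers", 172163),
        ("South America", "02/01/2019 12:00:00 AM", "Cargo Ships", 1319),
        ("Antarctica", "01/01/2019 12:00:00 AM", "Passengers", 56810),
    ],
)

# Make the Python function callable from SQL, then rewrite the column in place.
conn.create_function("to_iso", 1, to_iso)
conn.execute("UPDATE crossings SET date = to_iso(date)")
conn.commit()

dates = [row[0] for row in conn.execute("SELECT DISTINCT date FROM crossings ORDER BY date")]
print(dates)  # ['2019-01-01 00:00:00', '2019-02-01 00:00:00', '2019-03-01 00:00:00']
```

Once the column holds ISO strings, plain text comparison and ORDER BY date both follow chronological order, which the queries below rely on.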
Then you can easily compute the cumulative average over the previous months with a window function (introduced in SQLite 3.25):
SELECT continent, date, measure, value,
       cast(round(ifnull(avg(value)
                           OVER (PARTITION BY continent, measure
                                 ORDER BY date
                                 ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING),
                         0),
                  0) AS INTEGER) AS Average
FROM crossings
ORDER BY date DESC, continent, measure DESC;
This gives:
Continent Date Measure Value Average
---------- ------------------- ---------- ---------- ----------
Antarctica 2019-03-01 00:00:00 Passengers 346158 114487
South Amer 2019-03-01 00:00:00 Ships 6483 0
South Amer 2019-03-01 00:00:00 Flights 19 0
Antarctica 2019-02-01 00:00:00 Passengers 172163 56810
South Amer 2019-02-01 00:00:00 Cargo Ship 1319 0
Antarctica 2019-01-01 00:00:00 Passengers 56810 0
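To try the window-function query end to end, the sketch below loads the sample rows into an in-memory database through Python's sqlite3 module and runs the query unchanged. It assumes the Python build links against SQLite 3.25 or newer (check sqlite3.sqlite_version). The column types are my assumption.

```python
import sqlite3

ROWS = [
    ("Antarctica", "2019-03-01 00:00:00", "Passengers", 346158),
    ("South America", "2019-03-01 00:00:00", "Ships", 6483),
    ("South America", "2019-03-01 00:00:00", "Flights", 19),
    ("Antarctica", "2019-02-01 00:00:00", "Passengers", 172163),
    ("South America", "2019-02-01 00:00:00", "Cargo Ships", 1319),
    ("Antarctica", "2019-01-01 00:00:00", "Passengers", 56810),
]

WINDOW_QUERY = """
SELECT continent, date, measure, value,
       cast(round(ifnull(avg(value)
                           OVER (PARTITION BY continent, measure
                                 ORDER BY date
                                 ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING),
                         0),
                  0) AS INTEGER) AS Average
FROM crossings
ORDER BY date DESC, continent, measure DESC
"""

# Window functions were added in SQLite 3.25.0.
if sqlite3.sqlite_version_info < (3, 25, 0):
    raise RuntimeError("SQLite %s has no window functions" % sqlite3.sqlite_version)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE crossings (continent TEXT, date TEXT, measure TEXT, value INTEGER)")
conn.executemany("INSERT INTO crossings VALUES (?, ?, ?, ?)", ROWS)

result = conn.execute(WINDOW_QUERY).fetchall()
for row in result:
    print(row)
```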
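If you're stuck on an older version without window function support, you can use a correlated subquery to compute the cumulative average instead: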
SELECT continent, date, measure, value,
       ifnull((SELECT cast(round(avg(c2.value), 0) AS INTEGER)
               FROM crossings AS c2
               WHERE c2.continent = c.continent
                 AND c2.measure = c.measure
                 AND c2.date < c.date),
              0) AS Average
FROM crossings AS c
ORDER BY date DESC, continent, measure DESC;
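The same sample data can be used to check that the correlated-subquery version produces the same Average column. This sketch again uses Python's sqlite3 module with assumed column types, and it does not require window-function support.

```python
import sqlite3

ROWS = [
    ("Antarctica", "2019-03-01 00:00:00", "Passengers", 346158),
    ("South America", "2019-03-01 00:00:00", "Ships", 6483),
    ("South America", "2019-03-01 00:00:00", "Flights", 19),
    ("Antarctica", "2019-02-01 00:00:00", "Passengers", 172163),
    ("South America", "2019-02-01 00:00:00", "Cargo Ships", 1319),
    ("Antarctica", "2019-01-01 00:00:00", "Passengers", 56810),
]

# For each outer row c, average every earlier row in the same (continent, measure) series.
CORRELATED_QUERY = """
SELECT continent, date, measure, value,
       ifnull((SELECT cast(round(avg(c2.value), 0) AS INTEGER)
               FROM crossings AS c2
               WHERE c2.continent = c.continent
                 AND c2.measure = c.measure
                 AND c2.date < c.date),
              0) AS Average
FROM crossings AS c
ORDER BY date DESC, continent, measure DESC
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE crossings (continent TEXT, date TEXT, measure TEXT, value INTEGER)")
conn.executemany("INSERT INTO crossings VALUES (?, ?, ?, ?)", ROWS)

result = conn.execute(CORRELATED_QUERY).fetchall()
for row in result:
    print(row)
```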
Both versions will benefit from an index on (continent, measure, date).
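Here is a sketch of adding that index and inspecting the plan for the correlated-subquery version. The index name crossings_cmd_idx is hypothetical. The exact EXPLAIN QUERY PLAN wording varies between SQLite versions, so treat the printed plan as indicative only. The thing to look for is the subquery searching c2 through the index rather than scanning the whole table for every outer row.

```python
import sqlite3

ROWS = [
    ("Antarctica", "2019-03-01 00:00:00", "Passengers", 346158),
    ("South America", "2019-03-01 00:00:00", "Ships", 6483),
    ("South America", "2019-03-01 00:00:00", "Flights", 19),
    ("Antarctica", "2019-02-01 00:00:00", "Passengers", 172163),
    ("South America", "2019-02-01 00:00:00", "Cargo Ships", 1319),
    ("Antarctica", "2019-01-01 00:00:00", "Passengers", 56810),
]

CORRELATED_QUERY = """
SELECT continent, date, measure, value,
       ifnull((SELECT cast(round(avg(c2.value), 0) AS INTEGER)
               FROM crossings AS c2
               WHERE c2.continent = c.continent
                 AND c2.measure = c.measure
                 AND c2.date < c.date),
              0) AS Average
FROM crossings AS c
ORDER BY date DESC, continent, measure DESC
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE crossings (continent TEXT, date TEXT, measure TEXT, value INTEGER)")
conn.executemany("INSERT INTO crossings VALUES (?, ?, ?, ?)", ROWS)

# Equality on the two leading columns plus a range on date matches the subquery's WHERE clause.
conn.execute("CREATE INDEX crossings_cmd_idx ON crossings (continent, measure, date)")

# Each EXPLAIN QUERY PLAN row has four columns; the last one is the human-readable detail.
plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + CORRELATED_QUERY)]
for step in plan:
    print(step)
```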