Creating cumulative moving average on SQLite

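I'm trying to create a cumulative moving average in SQLite.

To recap: in a cumulative moving average (CMA), the data arrive as an ordered stream, and I want the average of all the data points that came before the current one.

My table looks like: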

Continent,Date,Measure,Value
Antarctica,03/01/2019 12:00:00 AM,Passengers,346158
South America,03/01/2019 12:00:00 AM,Ships,6483
South America,03/01/2019 12:00:00 AM,Flights,19
Antarctica,02/01/2019 12:00:00 AM,Passengers,172163
South America,02/01/2019 12:00:00 AM,Cargo Ships,1319
Antarctica,01/01/2019 12:00:00 AM,Passengers,56810

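Previous solutions, such as … or 2, describe monthly or weekly moving averages. However, while I can maintain that average per month, I'm trying to build a cumulative average.

I tried this: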

SELECT T1.Date, AVG(T2.VALUE) from my_table AS T1 INNER JOIN my_table AS T2 ON datetime(T1.Date, '-1 Month') <= datetime(T2.Date) 
  AND datetime(T1.Date, '+1 Month') >= datetime(T2.Date) GROUP BY
  T1.date;

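But when I run it in SQLite, the datetime operations produce an error: sqlite does not have operation datetime.

I also tried the simple query SELECT AVG(VALUE) FROM my_table GROUP BY MEASURE, DATE, CONTINENT, but that only groups the rows; it does not produce a moving average, so it doesn't solve my problem.

What I want: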

Continent,Date,Measure,Value,Average
Antarctica,03/01/2019 12:00:00 AM,Passengers,346158,114487
South America,03/01/2019 12:00:00 AM,Ships,6483,0
South America,03/01/2019 12:00:00 AM,Flights,19,0
Antarctica,02/01/2019 12:00:00 AM,Passengers,172163,56810
South America,02/01/2019 12:00:00 AM,Cargo Ships,1319,0
Antarctica,01/01/2019 12:00:00 AM,Passengers,56810,0

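The Average column is the running monthly average of the total crossings into the continent, per mode of crossing (Measure), over all previous months. So to compute the average for the first row (i.e. the running monthly average of total passengers crossing into Antarctica over all months before March), you take the totals of passengers crossing into Antarctica in February (172,163) and January (56,810), average them, and round to the nearest integer: round(228,973 / 2) = 114,487.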
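The rule above can be restated as code to pin down exactly which rows feed each average. This is a minimal sketch in plain Python (no database), assuming all rows fit in memory; `cumulative_prior_average` and `round_half_up` are hypothetical helper names. It rounds halves away from zero, which is what the 114,487 figure implies (228,973 / 2 = 114,486.5); Python's built-in round() would give 114,486 instead.

```python
from collections import defaultdict
from datetime import datetime
from decimal import Decimal, ROUND_HALF_UP

FMT = "%m/%d/%Y %I:%M:%S %p"  # format of the Date column in the question

rows = [
    ("Antarctica", "03/01/2019 12:00:00 AM", "Passengers", 346158),
    ("South America", "03/01/2019 12:00:00 AM", "Ships", 6483),
    ("South America", "03/01/2019 12:00:00 AM", "Flights", 19),
    ("Antarctica", "02/01/2019 12:00:00 AM", "Passengers", 172163),
    ("South America", "02/01/2019 12:00:00 AM", "Cargo Ships", 1319),
    ("Antarctica", "01/01/2019 12:00:00 AM", "Passengers", 56810),
]


def round_half_up(x: Decimal) -> int:
    """Round to the nearest integer, sending .5 away from zero."""
    return int(x.quantize(Decimal(1), rounding=ROUND_HALF_UP))


def cumulative_prior_average(rows):
    """Map (continent, date, measure) to the rounded mean of all strictly
    earlier values for the same (continent, measure); 0 if there are none."""
    history = defaultdict(list)  # (continent, measure) -> values seen so far
    result = {}
    # Walk the rows in chronological order (parsing the dates, since the
    # original strings do not sort correctly as text).
    for continent, date, measure, value in sorted(
        rows, key=lambda r: datetime.strptime(r[1], FMT)
    ):
        prior = history[(continent, measure)]
        avg = round_half_up(Decimal(sum(prior)) / len(prior)) if prior else 0
        result[(continent, date, measure)] = avg
        prior.append(value)  # the current row only counts for later months
    return result


averages = cumulative_prior_average(rows)
print(averages[("Antarctica", "03/01/2019 12:00:00 AM", "Passengers")])  # 114487
```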

Is there a simpler way to solve this?

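First, fix your timestamps so that they use a format whose string order matches chronological order, such as the ISO-8601 format supported by the sqlite date and time functions. Instead of 03/01/2019 12:00:00 AM, use 2019-03-01 00:00:00 (or just 2019-03-01 if you only care about the date, not the time). That makes your CSV data look like: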

Continent,Date,Measure,Value
Antarctica,2019-03-01 00:00:00,Passengers,346158
South America,2019-03-01 00:00:00,Ships,6483
South America,2019-03-01 00:00:00,Flights,19
Antarctica,2019-02-01 00:00:00,Passengers,172163
South America,2019-02-01 00:00:00,Cargo Ships,1319
Antarctica,2019-01-01 00:00:00,Passengers,56810
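The conversion, and the reason it matters, can be checked with Python's built-in sqlite3 module (used here only as a convenient harness; `to_iso` is a hypothetical helper name). SQLite's datetime() does exist, but it returns NULL when it cannot parse its input, so the original MM/DD/YYYY hh:mm:ss AM strings silently break any date arithmetic, while the converted strings work.

```python
import sqlite3
from datetime import datetime


def to_iso(ts: str) -> str:
    """Convert '03/01/2019 12:00:00 AM' (MM/DD/YYYY, 12-hour clock)
    into '2019-03-01 00:00:00' (ISO-8601, 24-hour clock)."""
    parsed = datetime.strptime(ts, "%m/%d/%Y %I:%M:%S %p")
    return parsed.strftime("%Y-%m-%d %H:%M:%S")


print(to_iso("03/01/2019 12:00:00 AM"))  # 2019-03-01 00:00:00

con = sqlite3.connect(":memory:")
# Original format: SQLite cannot parse it, so datetime() yields NULL (None).
bad = con.execute(
    "SELECT datetime('03/01/2019 12:00:00 AM', '-1 month')"
).fetchone()[0]
# ISO-8601 format: the '-1 month' modifier works as expected.
good = con.execute(
    "SELECT datetime(?, '-1 month')", (to_iso("03/01/2019 12:00:00 AM"),)
).fetchone()[0]
con.close()

print(bad)   # None
print(good)  # 2019-02-01 00:00:00
```

Converting once when loading the CSV is simpler than converting inside every query, and it also makes ORDER BY date and date comparisons correct as plain text comparisons.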

Then you can use a window function (introduced in SQLite 3.25) to easily compute the cumulative average over the previous months:

SELECT continent, date, measure, value,
       cast(round(ifnull(avg(value)
                          OVER (PARTITION BY continent, measure
                                ORDER BY date
                                ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING),
                         0),
                  0) AS INTEGER) AS Average
FROM crossings
ORDER BY date DESC, continent, measure DESC;

This gives:

Continent   Date                 Measure     Value       Average   
----------  -------------------  ----------  ----------  ----------
Antarctica  2019-03-01 00:00:00  Passengers  346158      114487 
South Amer  2019-03-01 00:00:00  Ships       6483        0    
South Amer  2019-03-01 00:00:00  Flights     19          0   
Antarctica  2019-02-01 00:00:00  Passengers  172163      56810   
South Amer  2019-02-01 00:00:00  Cargo Ship  1319        0     
Antarctica  2019-01-01 00:00:00  Passengers  56810       0     
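To reproduce this output outside the sqlite3 shell, here is a sketch that runs the same window-function query through Python's built-in sqlite3 module. It assumes the table is named crossings (as in the query) and guesses the column types (TEXT for strings, INTEGER for value); adjust them to your actual schema.

```python
import sqlite3

# The ISO-8601 rows from above.
ROWS = [
    ("Antarctica", "2019-03-01 00:00:00", "Passengers", 346158),
    ("South America", "2019-03-01 00:00:00", "Ships", 6483),
    ("South America", "2019-03-01 00:00:00", "Flights", 19),
    ("Antarctica", "2019-02-01 00:00:00", "Passengers", 172163),
    ("South America", "2019-02-01 00:00:00", "Cargo Ships", 1319),
    ("Antarctica", "2019-01-01 00:00:00", "Passengers", 56810),
]

# The window frame stops at "1 PRECEDING", so the current row is excluded;
# an empty frame gives NULL, which ifnull() turns into 0.
QUERY = """
SELECT continent, date, measure, value,
       cast(round(ifnull(avg(value)
                          OVER (PARTITION BY continent, measure
                                ORDER BY date
                                ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING),
                         0),
                  0) AS INTEGER) AS Average
FROM crossings
ORDER BY date DESC, continent, measure DESC;
"""

if sqlite3.sqlite_version_info < (3, 25, 0):
    raise RuntimeError("window functions need SQLite 3.25 or later")

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE crossings (continent TEXT, date TEXT, measure TEXT, value INTEGER)"
)
con.executemany("INSERT INTO crossings VALUES (?, ?, ?, ?)", ROWS)
result = con.execute(QUERY).fetchall()
con.close()

for row in result:
    print(row)
```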

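If you are stuck on an older version without window function support, you can use a correlated subquery to compute the cumulative average instead: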

SELECT continent, date, measure, value,
       ifnull((SELECT cast(round(avg(c2.value), 0) AS INTEGER)
               FROM crossings AS c2
               WHERE c2.continent = c.continent
                 AND c2.measure = c.measure
                 AND c2.date < c.date),
              0) AS Average
FROM crossings AS c
ORDER BY date DESC, continent, measure DESC;

Both versions will benefit from an index on (continent, measure, date).
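As a sketch of that suggestion, the following creates the composite index and runs the correlated-subquery version through Python's built-in sqlite3 module. The index name crossings_cmd_idx and the column types are assumptions; the printed query plan is only for inspection, since its exact wording differs between SQLite versions.

```python
import sqlite3

ROWS = [
    ("Antarctica", "2019-03-01 00:00:00", "Passengers", 346158),
    ("South America", "2019-03-01 00:00:00", "Ships", 6483),
    ("South America", "2019-03-01 00:00:00", "Flights", 19),
    ("Antarctica", "2019-02-01 00:00:00", "Passengers", 172163),
    ("South America", "2019-02-01 00:00:00", "Cargo Ships", 1319),
    ("Antarctica", "2019-01-01 00:00:00", "Passengers", 56810),
]

# Correlated subquery: for each row c, average the strictly earlier rows
# with the same continent and measure; NULL (no earlier rows) becomes 0.
CORRELATED = """
SELECT continent, date, measure, value,
       ifnull((SELECT cast(round(avg(c2.value), 0) AS INTEGER)
               FROM crossings AS c2
               WHERE c2.continent = c.continent
                 AND c2.measure = c.measure
                 AND c2.date < c.date),
              0) AS Average
FROM crossings AS c
ORDER BY date DESC, continent, measure DESC;
"""

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE crossings (continent TEXT, date TEXT, measure TEXT, value INTEGER)"
)
con.executemany("INSERT INTO crossings VALUES (?, ?, ?, ?)", ROWS)
# Equality columns first, then the range column, so each subquery lookup can
# seek to (continent, measure) and scan only the earlier dates.
con.execute("CREATE INDEX crossings_cmd_idx ON crossings (continent, measure, date)")

for plan_row in con.execute("EXPLAIN QUERY PLAN " + CORRELATED):
    print(plan_row[-1])  # the "detail" column of the plan

result = con.execute(CORRELATED).fetchall()
con.close()

for row in result:
    print(row)
```

The column order in the index matters: continent and measure are compared with =, while date is compared with <, so date has to come last for the index to cover all three conditions.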