Calculate a weighted average for each day and id based on time intervals in PostgreSQL
I have a table in a PostgreSQL database that looks like this:
stid | e5   | e10  | diesel | date
-----+------+------+--------+------------------------
e850 | 1300 | 1400 | 1500   | 2016-05-02 05:30:01+02
e850 | 1400 | 1500 | 1700   | 2016-05-02 08:30:01+02
e850 | 1300 | 1400 | 1500   | 2016-05-02 21:00:01+02
e850 | 1200 | 1300 | 1350   | 2016-05-03 10:30:01+02
e850 | 1300 | 1400 | 1500   | 2016-05-03 21:00:01+02
954d | 1200 | 1100 | 1300   | 2016-05-02 03:30:01+02
954d | 1300 | 1100 | 1300   | 2016-05-02 15:00:01+02
954d | 1400 | 1800 | 1400   | 2016-05-02 22:30:01+02
954d | 1700 | 1900 | 1400   | 2016-05-03 09:30:01+02
954d | 1500 | 1900 | 1200   | 2016-05-03 23:30:01+02
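So I have a unique id (stid), prices (e5, e10, diesel), and a timestamp (date) that indicates when the price was introduced. Now I want to calculate the average price for each day and stid, weighted by how long each price was charged. I also only want to consider the period between 8 am and 8 pm.
To calculate the weighted average e5 price for stid e850 on 2016-05-02 between 8 am and 8 pm, I would do the following: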
(1300 * 1801 + 1400 * 41399) / 43200 = 1395.83102
1300 is the price that was set at 5:30:01 am and 1801 is the duration in
seconds between 8 am and 8:30:01 am.
1400 is the price that was set at 8:30:01 am and 41399 is the duration in
seconds between 8:30:01 am and 8 pm.
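The arithmetic above can be checked directly in PostgreSQL. This is only a minimal sketch for this one example: the literal timestamps are copied from the sample rows, and the column aliases (secs_at_1300, secs_at_1400) are names chosen for illustration.

```sql
-- Durations (in seconds) of the two price periods inside the 08:00-20:00 window
SELECT
    EXTRACT(epoch FROM timestamp '2016-05-02 08:30:01' - timestamp '2016-05-02 08:00:00') AS secs_at_1300,
    EXTRACT(epoch FROM timestamp '2016-05-02 20:00:00' - timestamp '2016-05-02 08:30:01') AS secs_at_1400;

-- Weighted average; dividing by 43200.0 instead of 43200 avoids integer division
SELECT (1300 * 1801 + 1400 * 41399) / 43200.0 AS average_e5;
```

The first query should return 1801 and 41399, which add up to the 43200 seconds between 8 am and 8 pm. The second should return about 1395.83102.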
In the end I would like a table that looks like this:
stid | date       | average_e5 | average_e10 | average_diesel
-----+------------+------------+-------------+---------------
e850 | 2016-05-02 | 1395.83102 | 1495.83102  | 1691.66204
e850 | 2016-05-03 | 1220.83565 | 1320.83565  | 1381.25347
954d | 2016-05-02 | 1241.66435 | 1100        | 1300
954d | 2016-05-03 | 1662.49306 | 1887.49769  | 1400
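Edit: Solution
The code from Vao Tsun's answer below does almost everything I need. However, when a day and id has no price before 8 am or after 8 pm, I don't get the weighted average I am looking for. I was able to solve this by creating dummy entries for exactly those cases, where there is no price before 8 am or after 8 pm.
I created a new table called mytable2 that contains the dummy entries, using the following code.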
DROP TABLE IF EXISTS mytable2;
CREATE TABLE mytable2 AS SELECT * FROM mytable;

-- Dummy entry at midnight: carries the previous price into a day whose
-- first price change happens after 8 am.
WITH c AS (
    SELECT
        *,
        LAG(date) OVER(PARTITION BY stid ORDER BY date) AS lag_date,
        LAG(e5) OVER(PARTITION BY stid ORDER BY date) AS lag_e5,
        LAG(e10) OVER(PARTITION BY stid ORDER BY date) AS lag_e10,
        LAG(diesel) OVER(PARTITION BY stid ORDER BY date) AS lag_diesel
    FROM mytable
)
INSERT INTO mytable2
SELECT
    stid,
    lag_e5 AS e5,
    lag_e10 AS e10,
    lag_diesel AS diesel,
    date_trunc('day', date) + '0 hours'::interval AS date
FROM c WHERE lag_date < date_trunc('day', date) + '0 hours'::interval
    AND date > date_trunc('day', date) + '8 hours'::interval;

-- Dummy entry at 11 pm: closes the last interval of a day whose last price
-- change happens before 8 pm and whose next change is on a later day.
WITH d AS (
    SELECT
        *,
        LEAD(date) OVER(PARTITION BY stid ORDER BY date) AS lead_date
    FROM mytable
)
INSERT INTO mytable2
SELECT
    stid,
    e5,
    e10,
    diesel,
    date_trunc('day', date) + '23 hours'::interval AS date
FROM d WHERE lead_date >= date_trunc('day', date) + '24 hours'::interval
    AND date < date_trunc('day', date) + '20 hours'::interval;
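To see which dummy rows were actually added, one option is to compare the two tables. This is a small sketch, assuming mytable2 has just been created by the statements above and that mytable contains no exact duplicate rows (EXCEPT removes duplicates, so a dummy row identical to an existing row would not show up).

```sql
-- Rows present in mytable2 but not in mytable, i.e. the dummy entries
SELECT * FROM mytable2
EXCEPT
SELECT * FROM mytable
ORDER BY stid, date;
```

With the sample data this should list two rows, both at midnight of 2016-05-03, one per stid, each carrying the last price of 2016-05-02. The second INSERT adds nothing for this data, because on both days the last price change of each stid happens after 8 pm.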
Then I can run the code from Vao Tsun's answer to get the desired weighted averages. I only changed mytable to mytable2, so that the table with the added dummy entries is used.
with a as (
    select *
        , case
            when date < date_trunc('day', date) + '8 hours'::interval then date_trunc('day', date) + '8 hours'::interval
            when date > date_trunc('day', date) + '20 hours'::interval then date_trunc('day', date) + '20 hours'::interval
            else date
          end d
        , date_trunc('day', date) dt
    from mytable2
)
, b as (
    select stid, e5, e10, diesel,date,d, dt
        , extract(epoch from lead(d) over (partition by stid,dt order by stid,d) - d) diff
    from a
)
select DISTINCT
    stid, dt,sum(e5*diff*1.0) over (partition by stid,dt)/sum(diff) over (partition by stid,dt) e5_weight_avg
from b
order by stid desc, dt;
stid | dt                  | e5_weight_avg
-----+---------------------+-----------------
e850 | 2016-05-02 00:00:00 | 1395.83101851852
e850 | 2016-05-03 00:00:00 | 1220.83564814815
954d | 2016-05-02 00:00:00 | 1241.66435185185
954d | 2016-05-03 00:00:00 | 1662.49305555556
The code can also be found on rextester.
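The query above only returns the e5 average, while the desired table also has e10 and diesel. The following is a sketch of one way to get all three columns in one pass, assuming mytable2 already contains the dummy entries. It is a variation, not the code from the answer: it uses GROUP BY instead of DISTINCT with window sums, LEAST/GREATEST instead of the CASE expression, adds date as a tie-breaker in the window ordering, and guards the division with NULLIF.

```sql
WITH a AS (
    SELECT stid, e5, e10, diesel, date,
           -- clamp every timestamp into the 08:00-20:00 window of its own day
           LEAST(GREATEST(date, date_trunc('day', date) + interval '8 hours'),
                 date_trunc('day', date) + interval '20 hours') AS d,
           date_trunc('day', date) AS dt
    FROM mytable2
),
b AS (
    SELECT stid, dt, e5, e10, diesel,
           -- seconds until the next price change within the same stid and day;
           -- ordering by date as well breaks ties between clamped timestamps
           EXTRACT(epoch FROM LEAD(d) OVER (PARTITION BY stid, dt ORDER BY d, date) - d) AS diff
    FROM a
)
SELECT stid,
       dt::date AS date,
       SUM(e5 * diff)     / NULLIF(SUM(diff), 0) AS average_e5,
       SUM(e10 * diff)    / NULLIF(SUM(diff), 0) AS average_e10,
       SUM(diesel * diff) / NULLIF(SUM(diff), 0) AS average_diesel
FROM b
GROUP BY stid, dt
ORDER BY stid DESC, dt;
```

The last row of each day has a NULL diff and is simply skipped by SUM, so it does not need special handling. NULLIF turns a zero total duration into NULL instead of raising a division-by-zero error, for example on a day where every price change happens after 8 pm.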
I used a few CTEs that are not strictly needed, to make it more readable:
t=# with a as (
    select *
        , case
            when date < date_trunc('day', date) + '8 hours'::interval then date_trunc('day', date) + '8 hours'::interval
            when date > date_trunc('day', date) + '20 hours'::interval then date_trunc('day', date) + '20 hours'::interval
            else date
          end d
        , date_trunc('day', date) dt
    from mytable
)
, b as (
    select stid, e5, e10, diesel,date,d, dt
        , extract(epoch from lead(d) over (partition by stid,dt order by stid,d) - d) diff
    from a
)
select
    stid, e5,date,d, diff,sum(e5*diff*1.0) over (partition by stid,dt)/sum(diff) over (partition by stid,dt) e5_weight_avg
from b
order by stid desc, date;
 stid | e5      | date                | d                   | diff  | e5_weight_avg
------+---------+---------------------+---------------------+-------+------------------
 e850 | 1300.00 | 2016-05-02 05:30:01 | 2016-05-02 08:00:00 |  1801 | 1395.83101851852
 e850 | 1400.00 | 2016-05-02 08:30:01 | 2016-05-02 08:30:01 | 41399 | 1395.83101851852
 e850 | 1300.00 | 2016-05-02 21:00:01 | 2016-05-02 20:00:00 |       | 1395.83101851852
 e850 | 1200.00 | 2016-05-03 10:30:01 | 2016-05-03 10:30:01 | 34199 |             1200
 e850 | 1300.00 | 2016-05-03 21:00:01 | 2016-05-03 20:00:00 |       |             1200
 954d | 1200.00 | 2016-05-02 03:30:01 | 2016-05-02 08:00:00 | 25201 | 1241.66435185185
 954d | 1300.00 | 2016-05-02 15:00:01 | 2016-05-02 15:00:01 | 17999 | 1241.66435185185
 954d | 1400.00 | 2016-05-02 22:30:01 | 2016-05-02 20:00:00 |       | 1241.66435185185
 954d | 1700.00 | 2016-05-03 09:30:01 | 2016-05-03 09:30:01 | 37799 |             1700
 954d | 1500.00 | 2016-05-03 23:30:01 | 2016-05-03 20:00:00 |       |             1700
(10 rows)
So, skipping the intermediate steps:
t=# with a as (
    select *
        , case
            when date < date_trunc('day', date) + '8 hours'::interval then date_trunc('day', date) + '8 hours'::interval
            when date > date_trunc('day', date) + '20 hours'::interval then date_trunc('day', date) + '20 hours'::interval
            else date
          end d
        , date_trunc('day', date) dt
    from mytable
)
, b as (
    select stid, e5, e10, diesel,date,d, dt
        , extract(epoch from lead(d) over (partition by stid,dt order by stid,d) - d) diff
    from a
)
select DISTINCT
    stid, dt,sum(e5*diff*1.0) over (partition by stid,dt)/sum(diff) over (partition by stid,dt) e5_weight_avg
from b
order by stid desc, dt;
 stid | dt                  | e5_weight_avg
------+---------------------+------------------
 e850 | 2016-05-02 00:00:00 | 1395.83101851852
 e850 | 2016-05-03 00:00:00 |             1200
 954d | 2016-05-02 00:00:00 | 1241.66435185185
 954d | 2016-05-03 00:00:00 |             1700
(4 rows)
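The 1200 and 1700 results for 2016-05-03 arise because no row of that day lies before 8 am, so the price carried over from the previous evening is never counted. As an alternative sketch (a different technique from the query above, and not tested against the original database), each price can be treated as valid from its own timestamp until the next change for the same stid, and that validity period can then be intersected with each day's 08:00-20:00 window. The names priced, days, overlap, valid_from, valid_to and secs are made up for this illustration.

```sql
WITH priced AS (
    -- each price is valid until the next price change of the same stid
    SELECT stid, e5,
           date AS valid_from,
           LEAD(date) OVER (PARTITION BY stid ORDER BY date) AS valid_to
    FROM mytable
),
days AS (
    -- one row per calendar day between the first and the last timestamp
    SELECT generate_series(date_trunc('day', r.lo),
                           date_trunc('day', r.hi),
                           interval '1 day') AS dt
    FROM (SELECT min(date) AS lo, max(date) AS hi FROM mytable) r
),
overlap AS (
    SELECT p.stid, d.dt, p.e5,
           -- seconds of the validity period that fall inside 08:00-20:00;
           -- LEAST ignores a NULL valid_to (the latest price of a stid)
           EXTRACT(epoch FROM
               LEAST(p.valid_to, d.dt + interval '20 hours')
             - GREATEST(p.valid_from, d.dt + interval '8 hours')) AS secs
    FROM priced p
    JOIN days d
      ON p.valid_from < d.dt + interval '20 hours'
     AND (p.valid_to IS NULL OR p.valid_to > d.dt + interval '8 hours')
)
SELECT stid, dt::date AS date,
       SUM(e5 * secs) / NULLIF(SUM(secs), 0) AS e5_weight_avg
FROM overlap
GROUP BY stid, dt
ORDER BY stid DESC, dt;
```

With the sample data this should give roughly 1220.84 for e850 and 1662.49 for 954d on 2016-05-03, while leaving the 2016-05-02 values unchanged. If the first price of a stid only appears after 8 am on its first day, the average for that day is taken over the covered part of the window only.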