Grouping by rolling date interval in Netezza
I have a table in Netezza that looks like this:
Date Stock Return
2015-01-01 A xxx
2015-01-02 A xxx
2015-01-03 A 0
2015-01-04 A 0
2015-01-05 A xxx
2015-01-06 A xxx
2015-01-07 A xxx
2015-01-08 A xxx
2015-01-09 A xxx
2015-01-10 A 0
2015-01-11 A 0
2015-01-12 A xxx
2015-01-13 A xxx
2015-01-14 A xxx
2015-01-15 A xxx
2015-01-16 A xxx
2015-01-17 A 0
2015-01-18 A 0
2015-01-19 A xxx
2015-01-20 A xxx
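The data represents stock returns for various stocks and dates. What I need to do is group the data by a given interval and by the day within that interval. A further difficulty is that the weekends (the 0s) have to be discounted (public holidays can be ignored). The start date of the first interval should be an arbitrary date.
For example, my output should look like this: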
Interval Q01 Q02 Q03 Q04 Q05
1 xxx xxx xxx xxx xxx
2 xxx xxx xxx xxx xxx
3 xxx xxx xxx xxx xxx
4 xxx xxx xxx xxx xxx
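This output would represent intervals of 5 working days in length, with the average returns as the results. In terms of the raw data above, with a start date of January 1st, the first interval includes the 1st/2nd/5th/6th/7th (the 3rd and 4th are a weekend and are ignored); Q01 would be the 1st, Q02 the 2nd, Q03 the 5th, and so on. The second interval consists of the 8th/9th/12th/13th/14th.
I tried the following, without success: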
CEIL(CAST(EXTRACT(DOY FROM DATE) AS FLOAT) / CAST (10 AS FLOAT)) AS interval
EXTRACT(DAY FROM DATE) % 10 AS DAYinInterval
I also tried using a rolling counter and a variable start date to set my DOY to zero, something like this:
CEIL(CAST(EXTRACT(DOY FROM DATE) - EXTRACT(DOY FROM 'start-date') AS FLOAT) / CAST(10 AS FLOAT)) AS Interval
The one thing that came closest to what I expected is this:
SUM(Number) OVER(PARTITION BY STOCK ORDER BY DATE ASC rows 10 preceding) AS Counter
Unfortunately, it counts from 1 to 10 and then continues with 11, where it should start over and count from 1 to 10 again.
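The wrap-around described here is plain integer arithmetic on a 1-based running count. Below is a minimal sketch in Python rather than Netezza SQL; the function name `split_counter` and the default length of 10 are illustrative assumptions, not part of any query in this thread. The same expressions could presumably be applied to a row number in SQL.

```python
def split_counter(n, length=10):
    """Map a 1-based running count to (interval, position within interval)."""
    if n < 1:
        raise ValueError("n must be a 1-based count")
    interval = (n - 1) // length + 1  # 1 for counts 1..10, 2 for counts 11..20, ...
    position = (n - 1) % length + 1   # wraps back to 1 after `length`
    return interval, position


if __name__ == "__main__":
    for n in (1, 10, 11, 20, 21):
        print(n, split_counter(n))
```

Subtracting 1 before dividing and adding it back afterwards keeps both values 1-based, so the count 11 becomes position 1 of interval 2 instead of an eleventh position.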
I would love to see how this can be implemented in an elegant way. Thanks.
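I'm not entirely sure I understand the question, but I think I might, so I'm going to take a swing at it with some windowed aggregates and subqueries.
Here is the sample data, with some random non-zero data plugged in for the weekdays.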
DATE | STOCK | RETURN
------------+-------+--------
2015-01-01 | A | 16
2015-01-02 | A | 80
2015-01-03 | A | 0
2015-01-04 | A | 0
2015-01-05 | A | 60
2015-01-06 | A | 25
2015-01-07 | A | 12
2015-01-08 | A | 1
2015-01-09 | A | 81
2015-01-10 | A | 0
2015-01-11 | A | 0
2015-01-12 | A | 35
2015-01-13 | A | 20
2015-01-14 | A | 69
2015-01-15 | A | 72
2015-01-16 | A | 89
2015-01-17 | A | 0
2015-01-18 | A | 0
2015-01-19 | A | 100
2015-01-20 | A | 67
(20 rows)
Here's how I did it, with comments inline.
select avg(return),
       date_period,
       day_period
from (
    -- use row_number to generate a sequential value for each DOW,
    -- with a WHERE to filter out the weekends
    select date,
           stock,
           return,
           date_period,
           row_number() over (partition by date_period order by date asc) day_period
    from (
        -- bin out the entries by date_period using the first_value
        -- of the entire set as the starting point, modulo 7
        select date,
               stock,
               return,
               date + (first_value(date) over (order by date asc) - date) % 7 date_period
        from stocks
        -- setting the starting period date here
        where date >= '2015-01-01'
    ) foo
    -- in Netezza, DOW 1 is Sunday and 7 is Saturday
    where extract(dow from date) not in (1, 7)
) foo
group by date_period, day_period
-- day_period is included so the rows within a period come back in order
order by date_period asc, day_period asc;
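To make the binning easier to follow, here is a rough Python sketch of the same steps: take the earliest date on or after the start date as the anchor, snap every date back to the start of its 7-day period, drop Saturdays and Sundays, number the remaining days inside each period, and average. The function name `bin_returns` and the tuple layout are assumptions made for illustration. One deliberate difference: the sketch numbers the days per stock before averaging, which is my assumption about the multi-stock case; the query above does not partition by stock.

```python
from collections import defaultdict
from datetime import date, timedelta


def bin_returns(rows, start):
    """Average returns per (period_start, weekday position).

    rows  -- iterable of (day, stock, ret) tuples
    start -- first day of the first 7-day period
    """
    kept = [row for row in rows if row[0] >= start]
    if not kept:
        return {}
    first = min(day for day, _, _ in kept)  # like first_value(date) over (order by date)

    # Bin weekday rows by stock and by the start of their 7-day period.
    per_stock_period = defaultdict(list)
    for day, stock, ret in kept:
        if day.weekday() >= 5:  # Saturday is 5, Sunday is 6
            continue
        period = day - timedelta(days=(day - first).days % 7)
        per_stock_period[(stock, period)].append((day, ret))

    # Number the weekdays inside each period (like row_number()), then average.
    grouped = defaultdict(list)
    for (stock, period), items in per_stock_period.items():
        for position, (_, ret) in enumerate(sorted(items), start=1):
            grouped[(period, position)].append(ret)
    return {key: sum(vals) / len(vals) for key, vals in sorted(grouped.items())}


if __name__ == "__main__":
    returns = [16, 80, 0, 0, 60, 25, 12, 1, 81, 0, 0, 35, 20, 69, 72, 89, 0, 0, 100, 67]
    rows = [(date(2015, 1, 1) + timedelta(days=i), "A", r) for i, r in enumerate(returns)]
    for (period, position), avg in bin_returns(rows, date(2015, 1, 1)).items():
        print(period, position, avg)
```

The sketch computes the offset as days since the anchor, which is never negative, so it does not depend on how `%` treats negative operands.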
The results:
AVG | DATE_PERIOD | DAY_PERIOD
------------+-------------+------------
16.000000 | 2015-01-01 | 1
80.000000 | 2015-01-01 | 2
60.000000 | 2015-01-01 | 3
25.000000 | 2015-01-01 | 4
12.000000 | 2015-01-01 | 5
1.000000 | 2015-01-08 | 1
81.000000 | 2015-01-08 | 2
35.000000 | 2015-01-08 | 3
20.000000 | 2015-01-08 | 4
69.000000 | 2015-01-08 | 5
72.000000 | 2015-01-15 | 1
89.000000 | 2015-01-15 | 2
100.000000 | 2015-01-15 | 3
67.000000 | 2015-01-15 | 4
(14 rows)
Changing the start date to '2015-01-03' to see whether it adjusts properly:
...
from stocks
where date >= '2015-01-03'
...
The results:
AVG | DATE_PERIOD | DAY_PERIOD
------------+-------------+------------
60.000000 | 2015-01-03 | 1
25.000000 | 2015-01-03 | 2
12.000000 | 2015-01-03 | 3
1.000000 | 2015-01-03 | 4
81.000000 | 2015-01-03 | 5
35.000000 | 2015-01-10 | 1
20.000000 | 2015-01-10 | 2
69.000000 | 2015-01-10 | 3
72.000000 | 2015-01-10 | 4
89.000000 | 2015-01-10 | 5
100.000000 | 2015-01-17 | 1
67.000000 | 2015-01-17 | 2
(12 rows)
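The question asked for one row per interval with columns Q01 to Q05, while the result above has one row per interval and day. A small sketch of that last reshaping step, done in Python on the client side, follows; `pivot_periods` is a hypothetical helper and the dictionary layout is an assumption. The sample values are copied from the second result set.

```python
def pivot_periods(averages):
    """Turn {(period, position): avg} into rows of [interval_no, Q01, Q02, ...]."""
    periods = sorted({period for period, _ in averages})
    width = max((pos for _, pos in averages), default=0)
    table = []
    for interval_no, period in enumerate(periods, start=1):
        # None marks a position with no data, e.g. an incomplete last interval.
        cells = [averages.get((period, pos)) for pos in range(1, width + 1)]
        table.append([interval_no] + cells)
    return table


if __name__ == "__main__":
    sample = {
        ("2015-01-03", 1): 60.0, ("2015-01-03", 2): 25.0, ("2015-01-03", 3): 12.0,
        ("2015-01-03", 4): 1.0, ("2015-01-03", 5): 81.0,
        ("2015-01-10", 1): 35.0, ("2015-01-10", 2): 20.0, ("2015-01-10", 3): 69.0,
        ("2015-01-10", 4): 72.0, ("2015-01-10", 5): 89.0,
        ("2015-01-17", 1): 100.0, ("2015-01-17", 2): 67.0,
    }
    print("Interval", *[f"Q{n:02d}" for n in range(1, 6)])
    for row in pivot_periods(sample):
        print(*row)
```

The last interval is incomplete in the sample data, so its missing positions come out as `None` instead of being silently dropped.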