SQL 中多行值之间的差异
Difference between the values of multiple rows in SQL
我在 SQL 中的 table 是这样的:-
RN Name value1 value2 Timestamp
1 Mark 110 210 20160119
1 Mark 106 205 20160115
1 Mark 103 201 20160112
2 Steve 120 220 20151218
2 Steve 111 210 20151210
2 Steve 104 206 20151203
期望的输出:-
RN Name value1Lag1 value1lag2 value2lag1 value2lag2
1 Mark 4 3 5 4
2 Steve 9 7 10 4
对于 RN 1,差异是从最近到第二个最近计算的,然后是从第二个最近到第三个最近计算的
value1lag1 = 110-106 =4
value1lag2 = 106-103 = 3
value2lag1 = 210-205 = 5
value2lag2 = 205-201 = 4
同样适用于其他 RN。
注意:每个 RN 有 3 行且只有 3 行。
我尝试了几种方法,从类似的帖子中寻求帮助,但没有成功。
如果只是这种情况,不是difficult.you需要2步
self join 得到minus
的结果
select t1.RN,
t1.Name,
t1.rm,
t2.value1-t1.value1 as value1,
t2.value2-t1.value2 as value2
from
(select RN,Name,value1,value2,
row_number(partition by Name order by Timestamp desc) as rm from table)t1
left join
(select RN,Name,value1,value2,
row_number(partition by Name order by Timestamp desc) as rm from table) t2
on t1.rm = t2.rm-1
where t2.RN is not null.
你将其设置为 table 比方说 table3.
2.you 旋转它
select * from (
select t3.RN, t3.Name,t3.rm,t3.value1,t3.value2 from table3 t3
)
pivot
(
max(value1)
for rm in ('1','2')
)v1
3.you 获取 2 个主元 table 将值 1 和值 2 连接起来得到结果。
但我认为可能有更好的方法,我不确定我们是否可以在旋转它时加入枢轴,所以我将在获得枢轴结果后使用连接,这将使 2 个 tables .它不好,但我能做的最好
我假设 RN 和 Name 在这里链接。有点乱,但是如果每个 RN 总是有 3 个值,并且你总是想按这个顺序检查它们,那么这样的事情应该可行。
SELECT
t1.Name
, AVG(CASE WHEN table_ranked.Rank = 1 THEN table_ranked.value1 ELSE NULL END) - AVG(CASE WHEN table_ranked.Rank = 2 THEN table_ranked.value1 ELSE NULL END) value1Lag1
, AVG(CASE WHEN table_ranked.Rank = 2 THEN table_ranked.value1 ELSE NULL END) - AVG(CASE WHEN table_ranked.Rank = 3 THEN table_ranked.value1 ELSE NULL END) value1Lag2
, AVG(CASE WHEN table_ranked.Rank = 1 THEN table_ranked.value2 ELSE NULL END) - AVG(CASE WHEN table_ranked.Rank = 2 THEN table_ranked.value2 ELSE NULL END) value2Lag1
, AVG(CASE WHEN table_ranked.Rank = 2 THEN table_ranked.value2 ELSE NULL END) - AVG(CASE WHEN table_ranked.Rank = 3 THEN table_ranked.value2 ELSE NULL END) value2Lag2
FROM table t1
INNER JOIN
(
SELECT
t1.Name
, t1.value1
, t1.value2
, COUNT(t2.TimeStamp) Rank
FROM table t1
INNER JOIN table t2
ON t2.name = t1.name
AND t1.TimeStamp <= t2.TimeStamp
GROUP BY t1.Name, t1.value1, t1.value2
) table_ranked
ON table_ranked.Name = t1.Name
GROUP BY t1.Name
-- test data
with data(rn,
name,
value1,
value2,
timestamp) as
(select 1, 'Mark', 110, 210, to_date('20160119', 'YYYYMMDD')
from dual
union all
select 1, 'Mark', 106, 205, to_date('20160115', 'YYYYMMDD')
from dual
union all
select 1, 'Mark', 103, 201, to_date('20160112', 'YYYYMMDD')
from dual
union all
select 2, 'Steve', 120, 220, to_date('20151218', 'YYYYMMDD')
from dual
union all
select 2, 'Steve', 111, 210, to_date('20151210', 'YYYYMMDD')
from dual
union all
select 2, 'Steve', 104, 206, to_date('20151203', 'YYYYMMDD') from dual),
-- first transform value1, value2 to value_id (1,2), value
data2 as
(select d.rn, d.name, 1 as val_id, d.value1 as value, d.timestamp
from data d
union all
select d.rn, d.name, 2 as val_id, d.value2 as value, d.timestamp
from data d)
select * -- find previous row P of row D, evaluate difference and build column name as desired
from (select d.rn,
d.name,
d.value - p.value as value,
'value' || d.val_id || 'Lag' || row_number() over(partition by d.rn, d.val_id order by d.timestamp desc) as col
from data2 p, data2 d
where p.rn = d.rn
and p.val_id = d.val_id
and p.timestamp =
(select max(pp.timestamp)
from data2 pp
where pp.rn = p.rn
and pp.val_id = p.val_id
and pp.timestamp < d.timestamp))
-- pivot
pivot(sum(value) for col in('value1Lag1',
'value1Lag2',
'value2Lag1',
'value2Lag2'));
这里还有其他答案,但我认为你的问题是 analytic functions, specifically LAG():
select
rn,
name,
-- calculate the differences
value1 - v1l1 value1lag1,
v1l1 - v1l2 value1lag2,
value2 - v2l1 value2lag1,
v2l1 - v2l2 value2lag2
from (
select
rn,
name,
value1,
value2,
timestamp,
-- these two are the values from the row before this one ordered by timestamp (ascending)
lag(value1) over(partition by rn, name order by timestamp asc) v1l1,
lag(value2) over(partition by rn, name order by timestamp asc) v2l1
-- these two are the values from two rows before this one ordered by timestamp (ascending)
lag(value1, 2) over(partition by rn, name order by timestamp asc) v1l2,
lag(value2, 2) over(partition by rn, name order by timestamp asc) v2l2
from (
select
1 rn, 'Mark' name, 110 value1, 210 value2, '20160119' timestamp
from dual
union all
select
1 rn, 'Mark' name, 106 value1, 205 value2, '20160115' timestamp
from dual
union all
select
1 rn, 'Mark' name, 103 value1, 201 value2, '20160112' timestamp
from dual
union all
select
2 rn, 'Steve' name, 120 value1, 220 value2, '20151218' timestamp
from dual
union all
select
2 rn, 'Steve' name, 111 value1, 210 value2, '20151210' timestamp
from dual
union all
select
2 rn, 'Steve' name, 104 value1, 206 value2, '20151203' timestamp
from dual
) data
)
where
-- return only the rows that have defined values
v1l1 is not null and
v1l2 is not null and
v2l1 is not null and
v2l1 is not null
这种方法的好处是 Oracle 会在内部进行所有必要的缓冲,避免自连接等。对于大数据集,从性能的角度来看,这可能很重要。
例如,该查询的解释计划类似于
-------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 6 | 150 | 13 (8)| 00:00:01 |
|* 1 | VIEW | | 6 | 150 | 13 (8)| 00:00:01 |
| 2 | WINDOW SORT | | 6 | 138 | 13 (8)| 00:00:01 |
| 3 | VIEW | | 6 | 138 | 12 (0)| 00:00:01 |
| 4 | UNION-ALL | | | | | |
| 5 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
| 6 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
| 7 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
| 8 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
| 9 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
| 10 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
-------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("V1L1" IS NOT NULL AND "V1L2" IS NOT NULL AND "V2L1" IS
请注意,没有连接,只有 WINDOW SORT 缓冲来自 "data source" 的必要数据(在我们的例子中,VIEW 3 是 [=22 的 UNION ALL =] ... FROM DUAL) 来划分和计算不同的滞后。
我在 SQL 中的 table 是这样的:-
RN Name value1 value2 Timestamp
1 Mark 110 210 20160119
1 Mark 106 205 20160115
1 Mark 103 201 20160112
2 Steve 120 220 20151218
2 Steve 111 210 20151210
2 Steve 104 206 20151203
期望的输出:-
RN Name value1Lag1 value1lag2 value2lag1 value2lag2
1 Mark 4 3 5 4
2 Steve 9 7 10 4
对于 RN 1,差异是从最近到第二个最近计算的,然后是从第二个最近到第三个最近计算的
value1lag1 = 110-106 =4
value1lag2 = 106-103 = 3
value2lag1 = 210-205 = 5
value2lag2 = 205-201 = 4
同样适用于其他 RN。
注意:每个 RN 有 3 行且只有 3 行。
我尝试了几种方法,从类似的帖子中寻求帮助,但没有成功。
如果只是这种情况,不是difficult.you需要2步
self join 得到minus
的结果select t1.RN, t1.Name, t1.rm, t2.value1-t1.value1 as value1, t2.value2-t1.value2 as value2 from (select RN,Name,value1,value2, row_number(partition by Name order by Timestamp desc) as rm from table)t1 left join (select RN,Name,value1,value2, row_number(partition by Name order by Timestamp desc) as rm from table) t2 on t1.rm = t2.rm-1 where t2.RN is not null.
你将其设置为 table 比方说 table3.
2.you 旋转它
select * from (
select t3.RN, t3.Name,t3.rm,t3.value1,t3.value2 from table3 t3
)
pivot
(
max(value1)
for rm in ('1','2')
)v1
3.you 获取 2 个主元 table 将值 1 和值 2 连接起来得到结果。
但我认为可能有更好的方法,我不确定我们是否可以在旋转它时加入枢轴,所以我将在获得枢轴结果后使用连接,这将使 2 个 tables .它不好,但我能做的最好
我假设 RN 和 Name 在这里链接。有点乱,但是如果每个 RN 总是有 3 个值,并且你总是想按这个顺序检查它们,那么这样的事情应该可行。
SELECT
t1.Name
, AVG(CASE WHEN table_ranked.Rank = 1 THEN table_ranked.value1 ELSE NULL END) - AVG(CASE WHEN table_ranked.Rank = 2 THEN table_ranked.value1 ELSE NULL END) value1Lag1
, AVG(CASE WHEN table_ranked.Rank = 2 THEN table_ranked.value1 ELSE NULL END) - AVG(CASE WHEN table_ranked.Rank = 3 THEN table_ranked.value1 ELSE NULL END) value1Lag2
, AVG(CASE WHEN table_ranked.Rank = 1 THEN table_ranked.value2 ELSE NULL END) - AVG(CASE WHEN table_ranked.Rank = 2 THEN table_ranked.value2 ELSE NULL END) value2Lag1
, AVG(CASE WHEN table_ranked.Rank = 2 THEN table_ranked.value2 ELSE NULL END) - AVG(CASE WHEN table_ranked.Rank = 3 THEN table_ranked.value2 ELSE NULL END) value2Lag2
FROM table t1
INNER JOIN
(
SELECT
t1.Name
, t1.value1
, t1.value2
, COUNT(t2.TimeStamp) Rank
FROM table t1
INNER JOIN table t2
ON t2.name = t1.name
AND t1.TimeStamp <= t2.TimeStamp
GROUP BY t1.Name, t1.value1, t1.value2
) table_ranked
ON table_ranked.Name = t1.Name
GROUP BY t1.Name
-- test data
with data(rn,
name,
value1,
value2,
timestamp) as
(select 1, 'Mark', 110, 210, to_date('20160119', 'YYYYMMDD')
from dual
union all
select 1, 'Mark', 106, 205, to_date('20160115', 'YYYYMMDD')
from dual
union all
select 1, 'Mark', 103, 201, to_date('20160112', 'YYYYMMDD')
from dual
union all
select 2, 'Steve', 120, 220, to_date('20151218', 'YYYYMMDD')
from dual
union all
select 2, 'Steve', 111, 210, to_date('20151210', 'YYYYMMDD')
from dual
union all
select 2, 'Steve', 104, 206, to_date('20151203', 'YYYYMMDD') from dual),
-- first transform value1, value2 to value_id (1,2), value
data2 as
(select d.rn, d.name, 1 as val_id, d.value1 as value, d.timestamp
from data d
union all
select d.rn, d.name, 2 as val_id, d.value2 as value, d.timestamp
from data d)
select * -- find previous row P of row D, evaluate difference and build column name as desired
from (select d.rn,
d.name,
d.value - p.value as value,
'value' || d.val_id || 'Lag' || row_number() over(partition by d.rn, d.val_id order by d.timestamp desc) as col
from data2 p, data2 d
where p.rn = d.rn
and p.val_id = d.val_id
and p.timestamp =
(select max(pp.timestamp)
from data2 pp
where pp.rn = p.rn
and pp.val_id = p.val_id
and pp.timestamp < d.timestamp))
-- pivot
pivot(sum(value) for col in('value1Lag1',
'value1Lag2',
'value2Lag1',
'value2Lag2'));
这里还有其他答案,但我认为你的问题是 analytic functions, specifically LAG():
select
rn,
name,
-- calculate the differences
value1 - v1l1 value1lag1,
v1l1 - v1l2 value1lag2,
value2 - v2l1 value2lag1,
v2l1 - v2l2 value2lag2
from (
select
rn,
name,
value1,
value2,
timestamp,
-- these two are the values from the row before this one ordered by timestamp (ascending)
lag(value1) over(partition by rn, name order by timestamp asc) v1l1,
lag(value2) over(partition by rn, name order by timestamp asc) v2l1
-- these two are the values from two rows before this one ordered by timestamp (ascending)
lag(value1, 2) over(partition by rn, name order by timestamp asc) v1l2,
lag(value2, 2) over(partition by rn, name order by timestamp asc) v2l2
from (
select
1 rn, 'Mark' name, 110 value1, 210 value2, '20160119' timestamp
from dual
union all
select
1 rn, 'Mark' name, 106 value1, 205 value2, '20160115' timestamp
from dual
union all
select
1 rn, 'Mark' name, 103 value1, 201 value2, '20160112' timestamp
from dual
union all
select
2 rn, 'Steve' name, 120 value1, 220 value2, '20151218' timestamp
from dual
union all
select
2 rn, 'Steve' name, 111 value1, 210 value2, '20151210' timestamp
from dual
union all
select
2 rn, 'Steve' name, 104 value1, 206 value2, '20151203' timestamp
from dual
) data
)
where
-- return only the rows that have defined values
v1l1 is not null and
v1l2 is not null and
v2l1 is not null and
v2l1 is not null
这种方法的好处是 Oracle 会在内部进行所有必要的缓冲,避免自连接等。对于大数据集,从性能的角度来看,这可能很重要。
例如,该查询的解释计划类似于
-------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 6 | 150 | 13 (8)| 00:00:01 |
|* 1 | VIEW | | 6 | 150 | 13 (8)| 00:00:01 |
| 2 | WINDOW SORT | | 6 | 138 | 13 (8)| 00:00:01 |
| 3 | VIEW | | 6 | 138 | 12 (0)| 00:00:01 |
| 4 | UNION-ALL | | | | | |
| 5 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
| 6 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
| 7 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
| 8 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
| 9 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
| 10 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
-------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("V1L1" IS NOT NULL AND "V1L2" IS NOT NULL AND "V2L1" IS
请注意,没有连接,只有 WINDOW SORT 缓冲来自 "data source" 的必要数据(在我们的例子中,VIEW 3 是 [=22 的 UNION ALL =] ... FROM DUAL) 来划分和计算不同的滞后。