SQL analogue of Pandas diff() function (1st discrete difference) [LAG function]

I'm looking for a way to write an SQL query that applies the first discrete difference to the original series. In Python this is very easy with the Pandas .diff() method:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randint(0,100,size=(10, 2)), columns=list('AB'))

df["diff_A"]=df["A"].diff()
df["diff_B"]=df["B"].diff()

print(df)

The desired output is shown in the "diff_A" and "diff_B" columns:

    A   B  diff_A  diff_B
0  36  14     NaN     NaN
1  32  13    -4.0    -1.0
2  31  87    -1.0    74.0
3  58  88    27.0     1.0
4  44  34   -14.0   -54.0
5   2  43   -42.0     9.0
6  15  94    13.0    51.0
7  46  74    31.0   -20.0
8  60   9    14.0   -65.0
9  43  57   -17.0    48.0
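To make explicit what diff() computes, and therefore what the SQL has to reproduce, here is a minimal pure-Python sketch. The helper name first_diff is hypothetical and not part of pandas. It mirrors diff() with its default periods=1, using None where pandas uses NaN, and runs on columns A and B from the sample output above.

```python
from typing import List, Optional


def first_diff(values: List[int]) -> List[Optional[int]]:
    """First discrete difference: out[i] = values[i] - values[i - 1].

    The first element has no predecessor, so it is None (pandas uses NaN).
    Hypothetical helper, for illustration only.
    """
    return [None] + [cur - prev for prev, cur in zip(values, values[1:])]


# Columns A and B from the sample output above
a = [36, 32, 31, 58, 44, 2, 15, 46, 60, 43]
b = [14, 13, 87, 88, 34, 43, 94, 74, 9, 57]
print(first_diff(a))  # [None, -4, -1, 27, -14, -42, 13, 31, 14, -17]
print(first_diff(b))
```

Each output value depends only on the current row and the row before it, which is exactly the "previous row" lookup that the SQL LAG function provides.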

I'm using Oracle, but I would definitely prefer a clean ANSI solution.

If I understand correctly, you can use the analytic LAG function:

with v as (
  select rowid as rn, a, b from tab
)
select
  a, b,
  a - lag(a, 1) over(order by rn) as diff_a,
  b - lag(b, 1) over(order by rn) as diff_b
from v
order by rn;
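The query above can be tried without an Oracle instance. The sketch below assumes SQLite (through Python's built-in sqlite3 module, which needs SQLite 3.25 or later for window functions) as a stand-in, and loads the sample rows into a table named tab. Note that SQLite's rowid is an integer key assigned on insert, not Oracle's physical row address, so this only demonstrates the LAG mechanics, not Oracle's rowid behaviour.

```python
import sqlite3

# In-memory SQLite database standing in for Oracle (window functions need SQLite >= 3.25)
conn = sqlite3.connect(":memory:")
conn.execute("create table tab (a integer, b integer)")

# Sample rows taken from the question's output table, inserted in display order
rows = [(36, 14), (32, 13), (31, 87), (58, 88), (44, 34),
        (2, 43), (15, 94), (46, 74), (60, 9), (43, 57)]
conn.executemany("insert into tab (a, b) values (?, ?)", rows)

# Same query as the answer: LAG fetches the previous row's value in rn order;
# subtracting from NULL on the first row yields NULL (None in Python)
query = """
with v as (
  select rowid as rn, a, b from tab
)
select
  a, b,
  a - lag(a, 1) over(order by rn) as diff_a,
  b - lag(b, 1) over(order by rn) as diff_b
from v
order by rn
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

The NULL in the first row plays the same role as NaN in the pandas output.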

PS: It would be much better to order by a real column (such as a date), because rowid can change.

For example:

select
  a, b,
  a - lag(a, 1) over(order by inserted) as diff_a,
  b - lag(b, 1) over(order by inserted) as diff_b
from tab;
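A minimal sketch of the `inserted` variant, again assuming SQLite via sqlite3 as a stand-in for Oracle. The `inserted` column is hypothetical: it is stored as ISO-8601 text (which sorts correctly as a string), and the sample dates are invented for illustration. Rows are deliberately stored out of chronological order to show that the window's ORDER BY, not the storage order, determines the "previous" row. An outer `order by inserted` is added only so the printed output order is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# `inserted` is a hypothetical timestamp column stored as ISO-8601 text
conn.execute("create table tab (a integer, b integer, inserted text)")

# Rows are deliberately stored out of chronological order (illustrative dates)
conn.executemany(
    "insert into tab (a, b, inserted) values (?, ?, ?)",
    [
        (31, 87, "2024-01-03 09:00:00"),
        (36, 14, "2024-01-01 09:00:00"),
        (58, 88, "2024-01-04 09:00:00"),
        (32, 13, "2024-01-02 09:00:00"),
    ],
)

query = """
select
  a, b,
  a - lag(a, 1) over(order by inserted) as diff_a,
  b - lag(b, 1) over(order by inserted) as diff_b
from tab
order by inserted
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

The differences come out in chronological order even though the rows were inserted in a different order.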


Data sets in SQL are unordered. For deterministic results from LAG(), always use a sufficient ORDER BY clause, i.e. one that gives every row a unique position. (If no such field exists, one should be created when, or before, the data is inserted into the SQL data set. The unordered nature of SQL data sets gives the database a great deal of freedom for scalability and optimisation.)
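One way to follow that advice, sketched with SQLite through sqlite3 as an assumed stand-in: an `id` column is filled automatically on insert, and it is used as a tie-breaker after a business date column that can contain duplicates. The table `readings` and column `reading_date` are hypothetical names chosen for illustration. In Oracle the equivalent would typically be a sequence or identity column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# `id` is assigned automatically on insert, giving each row a unique,
# increasing ordering key; `reading_date` is a hypothetical business column
conn.execute(
    "create table readings ("
    " id integer primary key autoincrement,"
    " reading_date text not null,"
    " a integer not null)"
)
conn.executemany(
    "insert into readings (reading_date, a) values (?, ?)",
    [("2024-01-01", 36), ("2024-01-01", 32), ("2024-01-02", 31)],
)

# reading_date alone has a tie on 2024-01-01, so `id` is added as a
# tie-breaker; the ORDER BY is then sufficient and LAG is deterministic
query = """
select
  id, a,
  a - lag(a, 1) over(order by reading_date, id) as diff_a
from readings
order by reading_date, id
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Without `id` in the window's ORDER BY, the engine would be free to put the two 2024-01-01 rows in either order, and diff_a for the second of them could be either -4 or 4.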

SQL Fiddle test

PS: Windowing functions were added in the ANSI/ISO SQL:2003 standard and then extended in SQL:2008. Microsoft was late to this game: DB2, Oracle, Sybase, PostgreSQL and other products have had full implementations for years, while SQL Server did not catch up until SQL Server 2012.

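I'm posting this answer only because I was able to reproduce the results in SQLFiddle by following the comments on the accepted answer. Apart from rowid changing after the fact, is there a valid argument for why this simpler answer would not work?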

select
  a, b,
  a - lag(a, 1) over(order by rowid) as diff_a,
  b - lag(b, 1) over(order by rowid) as diff_b
from tab;
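One concrete way ordering by rowid can go wrong: rowid reflects how rows were stored, not their logical order. The sketch below assumes SQLite via sqlite3, where rowid typically follows insertion order; Oracle's ROWID is a physical row address that can also change (for example after a table is moved or reorganized). Either way, if rows are loaded in a different order from the business order, `order by rowid` computes differences between the wrong neighbours. The `inserted` column and its dates are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table tab (a integer, b integer, inserted text)")

# Simulates a batch load whose storage order differs from the logical order
conn.executemany(
    "insert into tab (a, b, inserted) values (?, ?, ?)",
    [
        (32, 13, "2024-01-02"),
        (36, 14, "2024-01-01"),
        (31, 87, "2024-01-03"),
    ],
)


def diffs(order_col: str) -> list:
    # order_col is interpolated only from the fixed strings used below
    sql = f"""
    select a - lag(a, 1) over(order by {order_col}) as diff_a
    from tab
    order by {order_col}
    """
    return [row[0] for row in conn.execute(sql)]


by_rowid = diffs("rowid")        # follows insertion order: 32, 36, 31
by_inserted = diffs("inserted")  # follows logical order: 36, 32, 31
print(by_rowid)     # [None, 4, -5]
print(by_inserted)  # [None, -4, -1]
```

Both queries run without error; only the one ordered by a real column produces the differences you actually want.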