SQL analogue of Pandas diff() function (1st discrete difference) [LAG function]
I am looking for a way to write a SQL query that applies the first discrete difference to a raw series. In Python this is very simple with the Pandas .diff() method:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(0,100,size=(10, 2)), columns=list('AB'))
df["diff_A"]=df["A"].diff()
df["diff_B"]=df["B"].diff()
print(df)
The output I want is shown in the "diff_A" and "diff_B" columns:
    A   B  diff_A  diff_B
0  36  14     NaN     NaN
1  32  13    -4.0    -1.0
2  31  87    -1.0    74.0
3  58  88    27.0     1.0
4  44  34   -14.0   -54.0
5   2  43   -42.0     9.0
6  15  94    13.0    51.0
7  46  74    31.0   -20.0
8  60   9    14.0   -65.0
9  43  57   -17.0    48.0
I use Oracle, but I would definitely prefer a clean ANSI solution.
IIUC, you can use the analytic LAG function:
with v as (
select rowid as rn, a, b from tab
)
select
a, b,
a - lag(a, 1) over(order by rn) as diff_a,
b - lag(b, 1) over(order by rn) as diff_b
from v
order by rn;
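PS: it would be much better to order by a real column (such as a date), because a rowid can change and is not guaranteed to reflect insertion order.
For example: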
select
a, b,
a - lag(a, 1) over(order by inserted) as diff_a,
b - lag(b, 1) over(order by inserted) as diff_b
from tab;
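To check that the LAG query reproduces the Pandas output, here is a minimal sketch that runs it against the sample values from the question. It uses SQLite through Python's built-in sqlite3 module rather than Oracle, purely so it can be run anywhere (window functions need SQLite 3.25 or newer). The `inserted` column is an assumption: here it is just an integer counter standing in for whatever real ordering column the table has.

```python
import sqlite3

# Sample values copied from the question's expected output (columns A and B).
rows = [(36, 14), (32, 13), (31, 87), (58, 88), (44, 34),
        (2, 43), (15, 94), (46, 74), (60, 9), (43, 57)]

conn = sqlite3.connect(":memory:")
# "inserted" is a hypothetical ordering column; here it is a simple counter.
conn.execute(
    "create table tab (inserted integer primary key, a integer, b integer)"
)
conn.executemany(
    "insert into tab (inserted, a, b) values (?, ?, ?)",
    [(i, a, b) for i, (a, b) in enumerate(rows)],
)

query = """
select
    a, b,
    a - lag(a, 1) over (order by inserted) as diff_a,
    b - lag(b, 1) over (order by inserted) as diff_b
from tab
order by inserted
"""
result = conn.execute(query).fetchall()

# The first row has no predecessor, so both differences come back as NULL
# (None in Python), which plays the role of NaN in the Pandas output.
for row in result:
    print(row)
```

The first row comes back with NULL differences, matching the NaN row that .diff() produces.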
Comment:
Data sets in SQL are unordered. For deterministic results from LAG(),
always use a sufficient ORDER BY clause. (If no such field exists, one
should be created when or before the data is inserted into a SQL data
set. The unordered nature of a SQL data set is what makes a large number
of scalability and optimisation options available.)
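As a rough illustration of what a "sufficient" ORDER BY can look like: if the ordering column can contain ties, add a unique tie-breaker so the window order is fully determined. The sketch below again uses SQLite via Python's sqlite3 module (3.25 or newer), and the schema is hypothetical: `seq` is a sequence number assigned at insert time and `inserted` is a coarse date that repeats.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: "seq" is assigned automatically at insert time,
# "inserted" is a coarse date that is not unique on its own.
conn.execute(
    "create table tab ("
    "seq integer primary key autoincrement, inserted text, a integer)"
)
conn.executemany(
    "insert into tab (inserted, a) values (?, ?)",
    [("2020-01-01", 36), ("2020-01-01", 32),
     ("2020-01-02", 31), ("2020-01-02", 58)],
)

# Ordering by (inserted, seq) is unique per row, so LAG always sees the
# same predecessor no matter how the engine scans the table.
query = """
select
    a,
    a - lag(a, 1) over (order by inserted, seq) as diff_a
from tab
order by inserted, seq
"""
diffs = conn.execute(query).fetchall()
print(diffs)
```

Ordering by `inserted` alone would leave the relative order of the tied rows up to the engine, so the differences could vary between runs.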
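I am posting this answer only because I was able to reproduce the results in SQLFiddle by following the comments on the accepted answer. Apart from the rowid changing after the fact, is there a valid argument for why this simpler answer would not work?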
select
a, b,
a - lag(a, 1) over(order by rowid) as diff_a,
b - lag(b, 1) over(order by rowid) as diff_b
from tab;