Since nested views are seen as taboo - how else should I go about constructing an extremely verbose query?
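Background: I'm a web developer who didn't take SQL seriously in college and now regrets it, working for a finance company that uses Snowflake as a data warehouse to calculate statistics.
We have 3 source tables that all calculations use:
- Positions: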
create or replace TABLE POS (
ACCOUNT_NUMBER VARCHAR(15) NOT NULL,
ACCOUNT_TYPE VARCHAR(30),
SECURITY_TYPE VARCHAR(30) NOT NULL,
SYMBOL VARCHAR(30) NOT NULL,
QUANTITY NUMBER(15,4),
AMOUNT NUMBER(15,4),
FILE_DATE DATE NOT NULL,
primary key (ACCOUNT_NUMBER, SYMBOL, FILE_DATE)
);
- Transactions:
create or replace TABLE TRN (
REP_CODE VARCHAR(10),
FILE_DATE DATE NOT NULL,
ACCOUNT_NUMBER VARCHAR(15) NOT NULL,
CODE VARCHAR(10),
CANCEL_STATUS_FLAG VARCHAR(1),
SYMBOL VARCHAR(100),
SECURITY_CODE VARCHAR(2),
TRADE_DATE DATE,
QUANTITY NUMBER(15,4),
NET_AMOUNT NUMBER(15,4),
PRINCIPAL NUMBER(15,4),
BROKER_FEES NUMBER(15,4),
OTHER_FEES NUMBER(15,4),
SETTLE_DATE DATE,
FROM_TO_ACCOUNT VARCHAR(30),
ACCOUNT_TYPE VARCHAR(30),
ACCRUED_INTEREST NUMBER(15,4),
CLOSING_ACCOUNT_METHOD VARCHAR(30),
DESCRIPTION VARCHAR(500)
);
- Prices:
create or replace TABLE PRI (
SYMBOL VARCHAR(100) NOT NULL,
SECURITY_TYPE VARCHAR(2) NOT NULL,
FILE_DATE DATE NOT NULL,
PRICE NUMBER(15,4) NOT NULL,
FACTOR NUMBER(15,10),
primary key (SYMBOL, FILE_DATE)
);
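On their own, these tables are effectively useless and messy: they almost always need to be joined to each other (or to themselves), and they need many additional calculations applied before they can be interpreted in any meaningful way. Views have helped me encapsulate this problem.
I use two core views downstream of these tables:
- Holdings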
SELECT
POS.FILE_DATE,
POS.ACCOUNT_NUMBER,
POS.SYMBOL,
CASE WHEN POS.QUANTITY > 0 THEN POS.QUANTITY ELSE POS.AMOUNT END AS QUANTITY,
CASE WHEN POS.SECURITY_TYPE IN ('FI', 'MB', 'UI') THEN
COALESCE(
PRI.FACTOR * PRI.PRICE * .01,
PRI.PRICE * .01
)
ELSE PRI.PRICE END AS PPU,
COALESCE(
POS.AMOUNT,
QUANTITY * PPU
) AS MARKET_VALUE
FROM POS AS POS
LEFT JOIN PRI AS PRI
ON POS.FILE_DATE = PRI.FILE_DATE AND POS.SYMBOL = PRI.SYMBOL;
- Cashflows (this one is a doozy... our data provider really doesn't help much here)
select t.file_date, T.ACCOUNT_NUMBER,
COALESCE (
CASE WHEN T.SECURITY_CODE = 'MB' THEN INIT * p.factor * .01 ELSE NULL END, -- IF Factor and Par needed
CASE WHEN T.SECURITY_CODE IN ('FI', 'UI') THEN INIT * .01 ELSE NULL END, -- if par val needed
CASE WHEN T.QUANTITY > 0 AND P.PRICE > 0 THEN t.quantity * p.price ELSE NULL END,
CASE WHEN T.NET_AMOUNT > 0 and p.price is not null THEN T.NET_AMOUNT * p.price ELSE NULL END,
T.NET_AMOUNT, -- if the transaction has a net value
BUYS.NET_AMOUNT, -- if there is a buy aggregate match for the day
SELLS.NET_AMOUNT -- if there is a sell aggregate match for the day
) AS DERIVED, -- this records the initial cash flow value
COALESCE(
CASE WHEN t.code IN ('DEP', 'REC') THEN DERIVED ELSE NULL END,
CASE WHEN t.code IN ('WITH', 'DEL', 'FRTAX', 'EXABP') THEN -1 * DERIVED ELSE NULL END
) as DIRECTION, -- this determines if it was an inflow or outflow
CASE
WHEN T.CANCEL_STATUS_FLAG = 'Y' THEN -1*DIRECTION
ELSE DIRECTION
END AS FLOW, -- this cancels out an existing transaction
CASE WHEN T.CODE = 'MFEE' THEN INIT ELSE NULL END AS FEES,
t.code,
t.symbol,
t.net_amount,
t.quantity,
p.price,
p.factor
from trn t
LEFT JOIN PRI p
ON t.symbol = p.symbol
AND t.file_date = p.file_date
-- in the rare case that we dont have a securities price, it means that a buy/sell
-- transaction occurred to remove the position from our
-- data feed. This must mean that the transaction value
-- is equivalent to the total internal operation that occurred to a particular security in
-- this account on this day.
LEFT JOIN (
select file_date,
account_number,
symbol,
SUM(net_amount) as net_amount
from TRN
where code = 'BUY'
group by file_date, account_number, symbol
) AS buys
ON t.code = 'DEL'
AND buys.file_date = t.file_date
AND buys.symbol = t.symbol
AND buys.account_number = t.account_number
AND p.price IS NULL
AND t.net_amount = 0
AND buys.net_amount != 0
LEFT JOIN (
select file_date,
account_number,
symbol,
SUM(net_amount) as net_amount
from TRN
where code = 'SELL'
group by file_date, account_number, symbol
) AS sells
ON t.code = 'REC'
AND t.file_date = sells.file_date
AND sells.symbol = t.symbol
AND sells.account_number = t.account_number
AND p.price IS NULL
AND t.net_amount = 0
AND sells.net_amount != 0
WHERE
t.code in ('DEP', 'WITH', 'DEL', 'REC', 'FRTAX', 'MFEE', 'EXABP')
ORDER BY t.file_date;
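I have also written views that group the two views above by account number, named account_balances and grouped_cashflows respectively. I call these two views often from my application layer, and so far I have been happy with their execution speed.
With all of that out of the way...
I am now trying to calculate time-weighted performance for each investment account. I would prefer to do this in SQL rather than in the application layer, so that I can view the output in those sweet, sweet Snowflake dashboards.
The formula I am using is known as TWRR.
In summary, it requires me to collect all historical balances plus all cash flows, calculate the net difference between each pair of consecutive closing balances, and record it as a percentage. If we then express this percentage + 1 as a "factor" and take the product of all these factors over a given time range, minus 1, we get the performance for that time range.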
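To make the chaining step concrete, here is a minimal Python sketch of it. The function names and the sample factors are made up for illustration and are not part of my actual views. The second function computes the same product as exp(sum(ln(factor))), which is the shape that carries over to an environment where SUM is the only aggregate available; it assumes every factor is strictly positive.

```python
import math


def twrr(factors):
    """Chain per-period growth factors into a time-weighted return."""
    return math.prod(factors) - 1


def twrr_via_logs(factors):
    """Same product, computed as exp(sum(ln(f))). Requires every factor > 0."""
    return math.exp(sum(math.log(f) for f in factors)) - 1


# Hypothetical daily returns of +2%, -1% and +3%, expressed as factors.
example_factors = [1.02, 0.99, 1.03]

performance = twrr(example_factors)
print(f"{performance:.6f}")  # prints 0.040094
```

Note that a day with no measurable change contributes a factor of 1, which leaves the product untouched.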
So... for my first attempt, I did exactly what you would expect: I created another view, named factors, that references my other views:
SELECT
B.FILE_DATE,
B.ACCOUNT_NUMBER,
B.MARKET_VALUE AS EMV,
COALESCE(CF.FLOW, 0) AS NET,
COALESCE(CF.FEES, 0) AS FEES,
COALESCE(NET + FEES, NET, 0) AS GRS,
LAG(B.MARKET_VALUE, 1, NULL) OVER (PARTITION BY B.ACCOUNT_NUMBER ORDER BY B.FILE_DATE) AS LAST_BAL,
COALESCE(
LAST_BAL,
B.MARKET_VALUE - NET,
B.MARKET_VALUE
) AS BMV,
EMV - BMV AS DIFF,
DIFF - NET AS NET_DIFF,
DIFF - GRS AS GRS_DIFF,
CASE WHEN BMV > 10 AND EMV > 10 AND NET_DIFF / BMV < 1 AND GRS != 0 THEN 1 + (NET_DIFF / BMV) ELSE 1 END AS NET_FACTOR,
CASE WHEN BMV > 10 AND EMV > 10 AND GRS_DIFF / BMV < 1 AND GRS != 0 THEN 1 + (GRS_DIFF / BMV) ELSE 1 END AS GRS_FACTOR
FROM ACCOUNT_BALANCES B
LEFT JOIN GROUPED_CASHFLOWS CF
ON B.FILE_DATE = CF.FILE_DATE
AND B.ACCOUNT_NUMBER = CF.ACCOUNT_NUMBER
order by ACCOUNT_NUMBER, FILE_DATE;
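This query works but, as you might expect, it is really... really... slow. Like, 10 seconds for some accounts (admittedly on an XS Snowflake instance, but still).
At this point it was clear I was doing something wrong, and sure enough, a quick Google search made it very clear that nesting views like this is a huge no-no.
The thing is, though... writing all of this out as a single query without using my views seems... awful.
So, to all you SQL/Snowflake gurus out there... is there a better way to do this?
Any advice would be greatly appreciated.
Edit: Including the Snowflake query profile for the factors view:
Thanks!
So far I am only seeing small things, which I don't think will add up to anything large.
From holdings: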
CASE WHEN POS.SECURITY_TYPE IN ('FI', 'MB', 'UI') THEN
COALESCE(
PRI.FACTOR * PRI.PRICE * .01,
PRI.PRICE * .01
)
ELSE PRI.PRICE END AS PPU,
In Snowflake, a two-branch CASE is the same as using IFF, and IFF is easier to read, IMHO. The math can also be rearranged:
IFF(POS.SECURITY_TYPE IN ('FI', 'MB', 'UI'),
PRI.PRICE * .01 * COALESCE(PRI.FACTOR, 1),
PRI.PRICE) AS PPU,
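One way to convince yourself that the rearranged math matches the original is to model SQL NULL propagation in Python. This is only a sketch: `coalesce` and `sql_mul` are hypothetical helpers standing in for SQL's COALESCE and NULL-propagating multiplication, and the sample price and factor are made up. Decimal is used so the comparison is not blurred by floating-point rounding.

```python
from decimal import Decimal


def coalesce(*values):
    """Return the first argument that is not None, like SQL COALESCE."""
    for value in values:
        if value is not None:
            return value
    return None


def sql_mul(*values):
    """Multiply with SQL NULL semantics: any None operand yields None."""
    result = Decimal(1)
    for value in values:
        if value is None:
            return None
        result *= value
    return result


PCT = Decimal("0.01")


def ppu_original(price, factor):
    # COALESCE(PRI.FACTOR * PRI.PRICE * .01, PRI.PRICE * .01)
    return coalesce(sql_mul(factor, price, PCT), sql_mul(price, PCT))


def ppu_rewritten(price, factor):
    # PRI.PRICE * .01 * COALESCE(PRI.FACTOR, 1)
    return sql_mul(price, PCT, coalesce(factor, Decimal(1)))


# Hypothetical inputs: factor present, factor NULL, and price NULL
# (price can be NULL after the LEFT JOIN even though PRI.PRICE is NOT NULL).
cases = [
    (Decimal("101.25"), Decimal("0.8731")),
    (Decimal("101.25"), None),
    (None, Decimal("0.8731")),
]
for price, factor in cases:
    print(ppu_original(price, factor), ppu_rewritten(price, factor))
```

In all three cases the two forms agree, including the one where the price is missing and both come out NULL.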
In cashflows, the big COALESCE for DERIVED can become a CASE statement, though perhaps it won't be any faster.
Thus:
COALESCE (
IFF( T.SECURITY_CODE = 'MB', INIT * p.factor * .01, NULL), -- IF Factor and Par needed
IFF( T.SECURITY_CODE IN ('FI', 'UI'), INIT * .01, NULL), -- if par val needed
IFF( T.QUANTITY > 0 AND P.PRICE > 0, t.quantity * p.price, NULL),
IFF( T.NET_AMOUNT > 0 and p.price is not null, T.NET_AMOUNT * p.price, NULL),
T.NET_AMOUNT, -- if the transaction has a net value
BUYS.NET_AMOUNT, -- if there is a buy aggregate match for the day
SELLS.NET_AMOUNT -- if there is a sell aggregate match for the day
) AS DERIVED, -- this records the initial cash flow value
could be
CASE
WHEN T.SECURITY_CODE = 'MB' THEN INIT * p.factor * .01
WHEN T.SECURITY_CODE IN ('FI', 'UI') THEN INIT * .01
WHEN T.QUANTITY > 0 AND P.PRICE > 0 THEN t.quantity * p.price
WHEN T.NET_AMOUNT > 0 and p.price is not null THEN T.NET_AMOUNT * p.price
ELSE COALESCE(
T.NET_AMOUNT, -- if the transaction has a net value
BUYS.NET_AMOUNT, -- if there is a buy aggregate match for the day
SELLS.NET_AMOUNT -- if there is a sell aggregate match for the day
)
END AS DERIVED, -- this records the initial cash flow value
Hmm, there might be a bit of a problem here.
In cashflows, you create buys and sells, and you only left join those aggregates if t.net_amount = 0, but the values are only used in:
ELSE COALESCE(
T.NET_AMOUNT, -- if the transaction has a net value
BUYS.NET_AMOUNT, -- if there is a buy aggregate match for the day
SELLS.NET_AMOUNT -- if there is a sell aggregate match for the day
)
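The COALESCE will only use those values if t.net_amount is null. But the values are only present when t.net_amount is zero, so buys and sells are a 100% waste of computation. So either the joins should use t.net_amount is null, or they can be removed.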
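The zero-versus-NULL distinction is easy to check against a real SQL engine. The sketch below uses Python's built-in sqlite3 as a stand-in for Snowflake, on the assumption that COALESCE treats 0 as a non-NULL value in both, which is standard SQL behavior. The helper name and the 1500 aggregate are made up for illustration.

```python
import sqlite3


def derived_fallback(t_net_amount, buys_net_amount, sells_net_amount):
    """Evaluate COALESCE(t, buys, sells) in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    try:
        row = conn.execute(
            "SELECT COALESCE(?, ?, ?)",
            (t_net_amount, buys_net_amount, sells_net_amount),
        ).fetchone()
    finally:
        conn.close()
    return row[0]


# The join only matches when t.net_amount = 0, and 0 is not NULL,
# so COALESCE stops at t.net_amount and never reaches the aggregate.
print(derived_fallback(0, 1500, None))     # prints 0

# Only a NULL t.net_amount would let the buy aggregate through.
print(derived_fallback(None, 1500, None))  # prints 1500
```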
Then there is
CASE WHEN T.CODE = 'MFEE' THEN INIT ELSE NULL END AS FEES
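which is later coalesced to zero if null (which may also be handling the left join). But it could just be zero here. It also points out that T.CODE can equal 'MFEE' and DIRECTION does not handle this, so DIRECTION can be null, and therefore FLOW can be null.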
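To illustrate how the NULL travels from DIRECTION into FLOW, here is a small Python model of those two expressions, with None standing in for NULL. It is a sketch of the logic as posted, not a replacement for it; the function names and the amounts are hypothetical.

```python
INFLOW_CODES = ("DEP", "REC")
OUTFLOW_CODES = ("WITH", "DEL", "FRTAX", "EXABP")


def direction(code, derived):
    """Model of the DIRECTION column; None plays the role of SQL NULL."""
    if derived is None:
        return None  # NULL propagates through the multiplication
    if code in INFLOW_CODES:
        return derived
    if code in OUTFLOW_CODES:
        return -1 * derived
    return None  # no branch matches 'MFEE', so the COALESCE yields NULL


def flow(code, derived, cancel_status_flag):
    """Model of the FLOW column built on top of DIRECTION."""
    d = direction(code, derived)
    if d is None:
        return None  # both CASE branches return NULL when DIRECTION is NULL
    return -1 * d if cancel_status_flag == "Y" else d


print(flow("DEP", 100.0, None))   # prints 100.0
print(flow("WITH", 100.0, "Y"))   # prints 100.0 (cancelled outflow)
print(flow("MFEE", 25.0, None))   # prints None
```

The factors view wraps CF.FLOW in COALESCE(..., 0), so the NULL is absorbed there, but any other consumer of the cashflows view would see it.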