Return only last statement from Snowflake SQL query to R
I have a Snowflake SQL query that I am trying to execute from R over an ODBC connection, as shown below:
SET quiet=TRUE;
USE SOMEDATABASE.SOMESCHEMA;
--Select timestamp of last sale per customer
DROP TABLE IF EXISTS sales;
CREATE TEMPORARY TABLE sales(CustomerId VARCHAR(16777216), SaleTS TIMESTAMP_NTZ(9));
INSERT INTO sales
SELECT CustomerId,
SaleTS
FROM SALES
WHERE SaleTS>= '2020-11-19 00:00:00'
AND SaleTS <= '2020-11-19 23:59:59.999'
GROUP BY CustomerId;
--Use temp table to get correct row from sales table
SELECT SUM(SalesDetail.price) as SumPrice,
COUNT(*) as SoldVolume
FROM sales
LEFT JOIN SALES as SalesDetail
ON Sales.CustomerId = SalesDetail.CustomerId
AND sales.SaleTS = SalesDetail.SaleTS
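When querying Microsoft SQL Server from R, I would normally put SET NOCOUNT ON; at the top of the query so that only the last step is returned to R. With Snowflake I instead get the error Actual statement count 6 did not match the desired statement count 1.
The error makes sense: SQL returns 6 components (one for each of the 6 steps in my query) while R expects 1. Snowflake does not seem to have an option that works the way SET NOCOUNT ON does. My question is how to avoid this error. Does anyone have experience running multi-step Snowflake SQL queries from R? How can I make R receive only the last statement over the ODBC connection? So far I have tried set nocount=TRUE;, set echo=FALSE;, set message=FALSE;, SET quiet=TRUE and so on.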
Snowflake SQL is expressive enough that the proposed code can be written as a single query:
WITH cte AS (
SELECT CustomerId, MAX(SaleTS) AS SaleTS -- here agg function is required
FROM SALES
WHERE SaleTS>= '2020-11-19 00:00:00'
AND SaleTS <= '2020-11-19 23:59:59.999'
GROUP BY CustomerId
)
SELECT SUM(SalesDetail.price) as SumPrice,
COUNT(*) as SoldVolume
FROM cte
LEFT JOIN SALES as SalesDetail
ON cte.CustomerId = SalesDetail.CustomerId
AND cte.SaleTS = SalesDetail.SaleTS;
The original query uses the same name for the permanent table and the temporary table, differing only in case (sales vs. SALES), which is error-prone.
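Second, the database and schema can be set when the connection is established, so USE is not needed inside the script. Alternatively, fully qualified names can be used in the script.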
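As a rough illustration of setting the database and schema at connection time, here is a minimal sketch using the DBI and odbc packages. The driver name, server placeholder, warehouse name, and environment variable names are assumptions for illustration only and need to be replaced with the values of your own setup; Database and Schema are passed through as ODBC connection-string keywords.

```r
library(DBI)

# Hypothetical connection details: adjust Driver, Server and Warehouse
# to match your own Snowflake ODBC installation and account.
con <- dbConnect(
  odbc::odbc(),
  Driver    = "SnowflakeDSIIDriver",                # assumed driver name
  Server    = "<account>.snowflakecomputing.com",   # placeholder, not a real host
  Database  = "SOMEDATABASE",
  Schema    = "SOMESCHEMA",
  Warehouse = "SOMEWAREHOUSE",                      # hypothetical warehouse
  UID       = Sys.getenv("SNOWFLAKE_USER"),         # hypothetical env variables
  PWD       = Sys.getenv("SNOWFLAKE_PASSWORD")
)

# A single statement, so the statement count seen by the driver is 1.
single_query <- "
WITH cte AS (
  SELECT CustomerId, MAX(SaleTS) AS SaleTS
  FROM SALES
  WHERE SaleTS >= '2020-11-19 00:00:00'
    AND SaleTS <= '2020-11-19 23:59:59.999'
  GROUP BY CustomerId
)
SELECT SUM(SalesDetail.price) AS SumPrice,
       COUNT(*) AS SoldVolume
FROM cte
LEFT JOIN SALES AS SalesDetail
  ON cte.CustomerId = SalesDetail.CustomerId
 AND cte.SaleTS = SalesDetail.SaleTS
"

result <- dbGetQuery(con, single_query)
dbDisconnect(con)
```

Because the script no longer contains USE or any other leading statement, R sends exactly one statement and gets exactly one result set back.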
My guess is that the intent of the query is the following:
WITH cte AS (
SELECT *
FROM SOMEDATABASE.SOMESCHEMA.SALES
WHERE SaleTS BETWEEN '2020-11-19 00:00:00' AND '2020-11-19 23:59:59.999'
QUALIFY ROW_NUMBER() OVER(PARTITION BY CustomerId ORDER BY SaleTS DESC) = 1
)
SELECT COUNT(*) AS SoldVolume, SUM(price) as SumPrice
FROM cte;
If a customer can have two entries with exactly the same SaleTS, RANK() OVER(...) should be used instead, so that both tied rows are kept.
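If the multi-step script has to stay as it is, one possible workaround (not part of the answer above, just a sketch) is to send the statements from R one at a time and fetch rows only for the last one. The helpers split_sql and run_script below are hypothetical names, and the split on ";" is deliberately naive: it assumes no semicolons appear inside string literals or comments.

```r
# Split a SQL script into individual statements on ';'.
# Naive by design: assumes ';' never occurs inside literals or comments.
split_sql <- function(script) {
  parts <- strsplit(script, ";", fixed = TRUE)[[1]]
  parts <- trimws(parts)
  parts[nzchar(parts)]  # drop empty pieces such as trailing whitespace
}

# Run every statement but the last with dbExecute (no rows fetched),
# then return the rows of the final statement via dbGetQuery.
run_script <- function(con, script) {
  statements <- split_sql(script)
  n <- length(statements)
  stopifnot(n >= 1)
  for (stmt in head(statements, -1)) {
    DBI::dbExecute(con, stmt)
  }
  DBI::dbGetQuery(con, statements[n])
}
```

With an open DBI connection con, run_script(con, sql_text) would then return only the result of the final SELECT. All statements run on the same connection, so a temporary table created in an earlier step is still visible to the later ones.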