Weighted sum of a column vector and a derived bit vector - Version 2
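We have a table of bid prices and sizes from two buyers. A bid with price p and size s means that the buyer is willing to purchase s units of the product at price p. The table contains several columns (such as timestamp and validity flag) plus the following four:
- the bid prices offered by the two buyers, pA and pB;
- the bid sizes, sA and sB.
Our job is to add a new best-size column (bS) to the table that returns the size at the best price. If the two buyers offer the same price, bS equals sA + sB; otherwise, we take the bid size of the buyer who offers the higher price.
Below is an example table with the desired output (ignoring columns that are neither prices nor sizes).
A simple solution to the problem: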
SELECT *,
  CASE
    WHEN pA = pB THEN sA + sB
    WHEN pA > pB THEN sA
    ELSE sB
  END AS bS
FROM t
Now let us generalize the problem to four buyers. The standard SQL solution is
WITH t_ext AS (
  SELECT *, GREATEST(pA, pB, pC, pD) AS bP
  FROM `t`
)
SELECT *,
  (sA * CAST(pA = bP AS INT64) +
   sB * CAST(pB = bP AS INT64) +
   sC * CAST(pC = bP AS INT64) +
   sD * CAST(pD = bP AS INT64)) AS bS
FROM t_ext
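Question:
Is there a simplified query that
- uses the function SUM instead of manually adding four terms, and
- avoids the repeated casts?
Note that we cannot identify the price and size columns by index, only by name. Otherwise, we could use the solution proposed in [...].
By the way, I wrote a blog post about this problem that focuses on solutions in Python and Q, and I am wondering what the best solution in standard SQL looks like.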
Below is for BigQuery Standard SQL
Note that we cannot identify the price and size columns by indices but only by name
#standardSQL
WITH t_ext AS (
  SELECT * EXCEPT(arr),
    -- first half of the extracted values: prices
    ARRAY(SELECT CAST(val AS INT64) FROM UNNEST(arr) val WITH OFFSET WHERE OFFSET < ARRAY_LENGTH(arr) / 2) AS prices,
    -- second half of the extracted values: sizes
    ARRAY(SELECT CAST(val AS INT64) FROM UNNEST(arr) val WITH OFFSET WHERE OFFSET >= ARRAY_LENGTH(arr) / 2) AS sizes,
    -- best price = maximum over the price half
    (SELECT MAX(CAST(val AS INT64)) FROM UNNEST(arr) val WITH OFFSET WHERE OFFSET < ARRAY_LENGTH(arr) / 2) AS bestPrice
  FROM (
    -- serialize each row to JSON and pull out the values of the named columns
    SELECT *, REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r'(?:"(?:pA|pB|pC|pD|sA|sB|sC|sD)"):(\d+)') AS arr
    FROM `project.dataset.table` t
  )
)
SELECT * EXCEPT(prices, sizes),
  -- pair prices and sizes by position and sum the sizes at the best price
  (SELECT SUM(size)
   FROM UNNEST(prices) price WITH OFFSET
   JOIN UNNEST(sizes) size WITH OFFSET
   USING(OFFSET)
   WHERE price = bestPrice
  ) AS bS
FROM t_ext
As you can see, the only thing you need to provide is the list of price and size column names, as in the example below
pA|pB|pC|pD|sA|sB|sC|sD
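To make the extraction step more concrete, here is a minimal sketch of what TO_JSON_STRING and REGEXP_EXTRACT_ALL produce for a single row. The two-buyer table `demo` and its values are made up for illustration and are not part of the query above.

```sql
#standardSQL
-- Hypothetical one-row table, for illustration only.
WITH demo AS (
  SELECT 'a' AS id, 1 AS pA, 2 AS pB, 'x' AS extra_col1, 7 AS sA, 9 AS sB
)
SELECT
  -- The whole row serialized as one JSON object, keys in column order.
  TO_JSON_STRING(t) AS row_json,
  -- One captured string per listed column whose value is a run of digits.
  REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r'"(?:pA|pB|sA|sB)":(\d+)') AS arr
FROM demo t
```

For this row, row_json should look like {"id":"a","pA":1,"pB":2,"extra_col1":"x","sA":7,"sB":9} and arr should hold the strings 1, 2, 7, 9. Because the main query splits arr in half, it relies on all price columns appearing before the size columns, in the same buyer order, and the \d+ pattern only picks up non-negative integer values.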
If applied to dummy data as below
#standardSQL
WITH `project.dataset.table` AS (
  SELECT 'a' id, 1 pA, 2 pB, 3 pC, 4 pD, 'x' extra_col1, 1 sA, 1 sB, 1 sC, 5 sD UNION ALL
  SELECT 'b', 1, 4, 2, 4, 'y', 1, 6, 1, 5 UNION ALL
  SELECT 'c', 5, 4, 2, 1, 'z', 7, 1, 1, 1
), t_ext AS (
  SELECT * EXCEPT(arr),
    ARRAY(SELECT CAST(val AS INT64) FROM UNNEST(arr) val WITH OFFSET WHERE OFFSET < ARRAY_LENGTH(arr) / 2) AS prices,
    ARRAY(SELECT CAST(val AS INT64) FROM UNNEST(arr) val WITH OFFSET WHERE OFFSET >= ARRAY_LENGTH(arr) / 2) AS sizes,
    (SELECT MAX(CAST(val AS INT64)) FROM UNNEST(arr) val WITH OFFSET WHERE OFFSET < ARRAY_LENGTH(arr) / 2) AS bestPrice
  FROM (
    SELECT *, REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r'(?:"(?:pA|pB|pC|pD|sA|sB|sC|sD)"):(\d+)') AS arr
    FROM `project.dataset.table` t
  )
)
SELECT * EXCEPT(prices, sizes),
  (SELECT SUM(size)
   FROM UNNEST(prices) price WITH OFFSET
   JOIN UNNEST(sizes) size WITH OFFSET
   USING(OFFSET)
   WHERE price = bestPrice
  ) AS bS
FROM t_ext
The result is
Row  id  pA  pB  pC  pD  extra_col1  sA  sB  sC  sD  bestPrice  bS
1    a   1   2   3   4   x           1   1   1   5   4          5
2    b   1   4   2   4   y           1   6   1   5   4          11
3    c   5   4   2   1   z           7   1   1   1   5          7
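If spelling out each price/size pair once is acceptable, a shorter variant might pair the columns in an array of structs and aggregate over it. This is only a sketch of a different technique from the JSON-based query above. The aliases p, s and x are names chosen here for illustration, and the dummy rows are the same as in the example.

```sql
#standardSQL
WITH `project.dataset.table` AS (
  SELECT 'a' id, 1 pA, 2 pB, 3 pC, 4 pD, 'x' extra_col1, 1 sA, 1 sB, 1 sC, 5 sD UNION ALL
  SELECT 'b', 1, 4, 2, 4, 'y', 1, 6, 1, 5 UNION ALL
  SELECT 'c', 5, 4, 2, 1, 'z', 7, 1, 1, 1
)
SELECT *,
  -- Build one (price, size) struct per buyer, then sum the sizes
  -- of the buyers whose price equals the row's best price.
  (SELECT SUM(x.s)
   FROM UNNEST([STRUCT(pA AS p, sA AS s), (pB, sB), (pC, sC), (pD, sD)]) AS x
   WHERE x.p = GREATEST(pA, pB, pC, pD)
  ) AS bS
FROM `project.dataset.table`
```

On the dummy rows this should give the same bS values as above (5, 11 and 7). It uses SUM and needs no casts, but each column name is written out by hand instead of being matched by a pattern.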
Hope this is what you are looking for.