BigQuery internal error on query with a join select
I'm currently doing something very similar to what is described here:
MySQL - Subtracting value from previous row, group by
My current query is:
SELECT a.x, a.y, a.z, COALESCE(a.z - b.z,0) AS diff
FROM [bla] AS a
LEFT JOIN EACH
[bla] AS b
ON b.x=a.x
AND b.y = (SELECT MAX(y) FROM [bla] WHERE x = a.x AND y < a.y)
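However, I get the following error:
Error: An internal error occurred and the request could not be completed.
The error isn't very helpful, and I can't tell what is going wrong. The problem seems to be the last ON condition, the one containing the SELECT subquery.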
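I don't know the specific cause of your internal error, but note that join conditions in BigQuery must be equality joins (for example, a.x = b.x AND a.y = b.y). You cannot put constants, inequalities, or subqueries in a join condition.
Also, I'd discourage self-joins in BigQuery, since they often cause performance problems. It looks like you are trying to find the largest y for any given x? If so, you may be able to use an analytic function instead (for example, MAX(y) OVER(PARTITION BY x)).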
Using the data from the link above (MySQL - Subtracting value from previous row, group by), the BigQuery solution is as simple as the statement below:
SELECT SN, Date, COALESCE(ROUND(Value - PrevValue, 2), 0) AS consumption
FROM (
  // LAG(Value, 1) returns the Value of the previous row within the same SN
  SELECT *, LAG(Value, 1) OVER (PARTITION BY SN ORDER BY Date) AS PrevValue
  FROM temp.EnergyLog)
ORDER BY SN, Date
Now, here is an attempt at writing it against your [bla] table:
SELECT x, y, z, COALESCE(ROUND(z - PrevZ, 2), 0) AS diff
FROM (
  // LAG(z, 1) returns z from the previous row (by y) within the same x
  SELECT *, LAG(z, 1) OVER (PARTITION BY x ORDER BY y) AS PrevZ
  FROM temp.bla)
ORDER BY x, y
I think there is a good chance the above will work, though you may need to make some extra adjustments.
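To make explicit what the LAG query computes, here is a minimal sketch in plain JavaScript (not BigQuery code). The function name diffFromPrevious and the sample rows are made up for illustration, and it assumes y is numeric; the ROUND step from the query is left out.

```javascript
// Hypothetical helper: emulates
//   z - LAG(z, 1) OVER (PARTITION BY x ORDER BY y), with 0 for the first row.
function diffFromPrevious(rows) {
  // Group rows by x (the PARTITION BY key).
  const partitions = new Map();
  for (const row of rows) {
    if (!partitions.has(row.x)) partitions.set(row.x, []);
    partitions.get(row.x).push(row);
  }
  const out = [];
  for (const group of partitions.values()) {
    // ORDER BY y within each partition (copy so the input is not mutated).
    const sorted = [...group].sort((a, b) => a.y - b.y);
    sorted.forEach((row, i) => {
      // LAG(z, 1): z of the previous row, or null for the first row.
      const prev = i > 0 ? sorted[i - 1].z : null;
      out.push({ x: row.x, y: row.y, z: row.z, diff: prev === null ? 0 : row.z - prev });
    });
  }
  return out;
}

// Made-up sample data, deliberately out of order.
const sampleRows = [
  { x: 1, y: 2, z: 15 },
  { x: 1, y: 1, z: 10 },
  { x: 2, y: 1, z: 7 },
];
console.log(diffFromPrevious(sampleRows));
```

Each row is compared only with the row just before it in its own x group, which is what the self-join with the MAX(y) subquery was trying to express.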
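Another solution is based on the recently introduced JS UDF.
It might look heavier than what I already proposed above, but I like it too, because it gives fine-grained control over the analytic logic.
I doubt this will be your practical choice, but conceptually it is useful.
So, for example, the solution for MySQL - Subtracting value from previous row, group by is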
SELECT SN, Date, ROUND(consumption,2) as consumption FROM
js( // input table
(SELECT SN, NEST(STRING(Date) + ',' + STRING(Value)) as Metric
FROM temp.EnergyLog GROUP BY SN) ,
// input columns
SN, Metric,
// output schema
"[{name: 'SN', type: 'integer'},
{name: 'Date', type: 'string'},
{name: 'consumption', type: 'float'}]",
// function: sort the 'Date,Value' strings, emit 0 for the first reading,
// then emit each later reading minus the one before it
"function(r, emit){
  var pair = r.Metric.sort(function (a,b) {return a < b ? -1 : (a > b ? 1 : 0);});
  emit({SN: r.SN, Date: pair[0].split(',')[0], consumption: 0});
  for (var i = 0; i < pair.length - 1; i += 1){
    var prev = pair[i].split(',');
    var curr = pair[i+1].split(',');
    emit({SN: r.SN, Date: curr[0], consumption: curr[1] - prev[1]});
  }
}"
) ORDER BY SN, Date
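You can find the UDF documentation here: https://cloud.google.com/bigquery/user-defined-functions
The output will be exactly the same as with the LAG solution suggested earlier.
Hopefully you will be able to translate the code above to your [bla] table.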
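If you want to try the function body outside BigQuery first, here is a minimal local sketch in plain JavaScript. The name consumptionUdf, the collecting emit callback, and the sample row are assumptions for illustration; the row only mimics what NEST would produce for one SN. This sketch reports each difference against the date of the later reading, the way the LAG query does.

```javascript
// Sketch of the UDF body as a standalone function (the name is hypothetical).
function consumptionUdf(r, emit) {
  // Sort the 'Date,Value' strings; ISO-style dates sort correctly as text.
  const readings = r.Metric.slice().sort((a, b) => (a < b ? -1 : a > b ? 1 : 0));
  // The first reading has no previous row, so its consumption is 0.
  emit({ SN: r.SN, Date: readings[0].split(',')[0], consumption: 0 });
  for (let i = 0; i < readings.length - 1; i += 1) {
    const prev = readings[i].split(',');
    const curr = readings[i + 1].split(',');
    // Later reading minus the one before it, dated by the later reading.
    emit({ SN: r.SN, Date: curr[0], consumption: Number(curr[1]) - Number(prev[1]) });
  }
}

// Made-up input row, deliberately out of order.
const inputRow = {
  SN: 1,
  Metric: ['2012-07-03,105.5', '2012-07-01,100', '2012-07-02,102.25'],
};
const emitted = [];
consumptionUdf(inputRow, (rec) => emitted.push(rec));
console.log(emitted);
```

Collecting the emitted records in an array makes it easy to compare them with what the LAG query returns for the same SN.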