How to get the last non-null value in a U-SQL windowing expression?
I'm trying to get the last non-null value in a windowing expression:
LAST_VALUE([b]) OVER (ORDER BY Timestamp ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS bf
Unfortunately, it doesn't work.
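Under standard SQL window semantics, a frame of ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW always ends at the current row, so LAST_VALUE simply returns the current row's own value, nulls included. Assuming U-SQL follows those semantics, the minimal Python model below (an illustration only; the function name is hypothetical) shows why the expression cannot fill the gaps:

```python
from typing import List, Optional

# Column a of the sample rows from the update, ordered by Timestamp.
values: List[Optional[str]] = ["Val1", None, "Val3", None, None, "Val7", "Val8"]


def last_value_running(vals: List[Optional[str]]) -> List[Optional[str]]:
    """Model of LAST_VALUE(a) OVER (ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW).

    The frame for row i is vals[0..i]; its last element is row i itself,
    so the result is just the input, nulls included.
    """
    return [vals[: i + 1][-1] for i in range(len(vals))]


print(last_value_running(values))
# ['Val1', None, 'Val3', None, None, 'Val7', 'Val8']
```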
I wrote my own custom aggregate function, but it doesn't work either.
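using Microsoft.Analytics.Interfaces;

namespace DataLakeTest
{
    // Returns the last non-null value passed to Accumulate (null if there was none).
    public class LastNonNull<T> : IAggregate<T, T>
        where T : class
    {
        T last;

        public override void Init()
        {
            last = null;
        }

        public override void Accumulate(T val)
        {
            // Skip nulls so they never overwrite an earlier value.
            if (val != null)
            {
                last = val;
            }
        }

        public override T Terminate()
        {
            return last;
        }
    }
}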
I tried to use it like this:
AGG<DataLakeTest.LastNonNull<string>>([b]) OVER (ORDER BY Timestamp ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS bf
Error E_CSC_USER_UNEXPECTEDOVERCLAUSE: Unexpected OVER clause.
Description: An OVER clause must follow a ranking function call (such as RANK() or ROW_NUMBER()) or a WITHIN GROUP clause.
Resolution: Make sure the OVER clause immediately follows a ranking function call or a WITHIN GROUP clause.
What kind of user-defined object can I use instead?
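To see what the rejected window would have computed, the Python sketch below models LastNonNull with the same Init/Accumulate/Terminate contract (lower-cased method names; it is an illustration, not the U-SQL runtime) and evaluates it naively over each growing frame. The helper running_aggregate is hypothetical. Re-running the aggregate per frame touches O(n²) rows; a single ordered pass over presorted rows avoids that cost.

```python
from typing import Generic, List, Optional, TypeVar

T = TypeVar("T")


class LastNonNull(Generic[T]):
    """Python model of the C# LastNonNull<T> aggregate (not U-SQL code)."""

    def init(self) -> None:
        self.last: Optional[T] = None

    def accumulate(self, val: Optional[T]) -> None:
        # Nulls are skipped, so they never overwrite an earlier value.
        if val is not None:
            self.last = val

    def terminate(self) -> Optional[T]:
        return self.last


def running_aggregate(values: List[Optional[T]]) -> List[Optional[T]]:
    """Naive windowed evaluation: re-run the aggregate over each frame values[0..i]."""
    results: List[Optional[T]] = []
    for i in range(len(values)):
        agg = LastNonNull()
        agg.init()
        for v in values[: i + 1]:
            agg.accumulate(v)
        results.append(agg.terminate())
    return results


print(running_aggregate(["Val1", None, "Val3", None, None, "Val7", "Val8"]))
# ['Val1', 'Val1', 'Val3', 'Val3', 'Val3', 'Val7', 'Val8']
```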
Update
Script:
@tb1 = SELECT * FROM
( VALUES
(1, "Val1"),
(2, (string)null),
(3, "Val3"),
(5, (string) null),
(6, (string)null),
(7, "Val7"),
(8, "Val8")
) AS T(Timestamp, a);
@tb1 =
SELECT Timestamp,
??? AS a
FROM @tb1;
OUTPUT @tb1 TO "/test.csv" USING Outputters.Csv(outputHeader: true);
Expected output:
"Timestamp","a"
1,"Val1"
2,"Val1"
3,"Val3"
5,"Val3"
6,"Val3"
7,"Val7"
8,"Val8"
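Update 2:
Unfortunately, I can't use the LAG function, because the number of nulls between non-null values is unknown. I also can't use a CROSS JOIN, because the processing step freezes on very large tables. My current solution (which I'm not happy with):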
@tb1 =
SELECT Timestamp,
[a],
[a] != null && [a] != LEAD([a], 1) OVER(ORDER BY Timestamp ASC) AS aSwitch
FROM @tb1;
@tb1 =
SELECT Timestamp,
[a],
SUM(aSwitch ? 1 : 0) OVER(ORDER BY Timestamp ASC ROWS UNBOUNDED PRECEDING) AS aGrp
FROM @tb1;
@tb1 =
SELECT Timestamp,
FIRST_VALUE([a]) OVER(PARTITION BY aGrp ORDER BY Timestamp ASC) AS a
FROM @tb1;
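This workaround is a gaps-and-islands technique: a running sum of a 0/1 flag gives each row a group number, and FIRST_VALUE picks the value that opens each group. The Python sketch below replays the three steps (helper names and the edge-case data are hypothetical). It also illustrates one edge case: because aSwitch additionally requires [a] != LEAD([a], 1), a non-null row whose value equals the next row's value does not start a new group and inherits the previous group's value; flagging every non-null row avoids that.

```python
from typing import List, Optional

Value = Optional[str]


def switch_with_lead(a: List[Value]) -> List[bool]:
    """Step 1 as written in the question: a != null && a != LEAD(a, 1)."""
    out = []
    for i, v in enumerate(a):
        lead = a[i + 1] if i + 1 < len(a) else None  # LEAD is null past the last row
        out.append(v is not None and v != lead)
    return out


def switch_non_null(a: List[Value]) -> List[bool]:
    """Alternative step 1: every non-null row starts a new group."""
    return [v is not None for v in a]


def fill_by_groups(a: List[Value], switch: List[bool]) -> List[Value]:
    # Step 2: SUM(aSwitch ? 1 : 0) OVER (ORDER BY Timestamp ROWS UNBOUNDED PRECEDING)
    groups, running = [], 0
    for s in switch:
        running += 1 if s else 0
        groups.append(running)
    # Step 3: FIRST_VALUE(a) OVER (PARTITION BY aGrp ORDER BY Timestamp)
    first_in_group = {}
    for g, v in zip(groups, a):
        first_in_group.setdefault(g, v)
    return [first_in_group[g] for g in groups]


sample = ["Val1", None, "Val3", None, None, "Val7", "Val8"]
print(fill_by_groups(sample, switch_with_lead(sample)))
# ['Val1', 'Val1', 'Val3', 'Val3', 'Val3', 'Val7', 'Val8']

edge = ["B", "A", "A"]  # hypothetical: equal consecutive non-null values
print(fill_by_groups(edge, switch_with_lead(edge)))  # ['B', 'B', 'A'] (row 2 inherits "B")
print(fill_by_groups(edge, switch_non_null(edge)))   # ['B', 'A', 'A']
```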
Final solution:
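using System.Collections.Generic;
using Microsoft.Analytics.Interfaces;

namespace DataLakeTest
{
    // Forward-fills column "a": each null is replaced by the most recent non-null
    // value seen earlier in the same reduce group (rows arrive in PRESORT order).
    public class ReplaceNullReducer : IReducer
    {
        public override IEnumerable<IRow> Reduce(IRowset input, IUpdatableRow output)
        {
            // Local rather than a field, so a value cannot leak from one group into
            // the next if the runtime reuses the reducer instance.
            string lastValue = null;
            foreach (var row in input.Rows)
            {
                var val = row.Get<string>("a");
                if (val != null) lastValue = val;
                output.Set<string>("a", lastValue);
                output.Set<int>("Timestamp", row.Get<int>("Timestamp"));
                yield return output.AsReadOnly();
            }
        }
    }
}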
U-SQL (for some reason the "ALL" option triggers an E_CSC_USER_SYNTAXERROR error, so I introduced a dummy device column):
@tb1 = SELECT * FROM
( VALUES
(1, "Val1", 1),
(2, (string)null, 1),
(3, "Val3", 1),
(5, (string) null, 1),
(6, (string)null, 1),
(7, "Val7", 1),
(8, "Val8", 1)
) AS T(Timestamp, a, device);
@tb1 = REDUCE @tb1 PRESORT [Timestamp] ON device
PRODUCE [Timestamp] int, [a] string
USING new DataLakeTest.ReplaceNullReducer();
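REDUCE ... PRESORT [Timestamp] ON device hands the reducer one group per distinct device value, with the rows ordered by Timestamp inside each group. The Python sketch below models that partition-then-sort step followed by one forward-fill pass per group (function names and the second device's rows are hypothetical). With the constant dummy column every row lands in one group; with real device IDs this model fills each group independently, so a device whose first value is null keeps that null instead of borrowing a value from another device.

```python
from itertools import groupby
from operator import itemgetter
from typing import Dict, List, Optional, Tuple

Row = Tuple[int, Optional[str], int]  # (Timestamp, a, device)
OutRow = Tuple[int, Optional[str]]    # (Timestamp, a)


def fill_forward(group: List[Row]) -> List[OutRow]:
    """One reducer call: a single ordered pass over a presorted group."""
    last: Optional[str] = None  # reset for every group
    out: List[OutRow] = []
    for ts, a, _device in group:
        if a is not None:
            last = a
        out.append((ts, last))
    return out


def reduce_on_device(rows: List[Row]) -> Dict[int, List[OutRow]]:
    """Model of REDUCE @tb1 PRESORT [Timestamp] ON device."""
    result: Dict[int, List[OutRow]] = {}
    ordered = sorted(rows, key=itemgetter(2, 0))  # group key first, then PRESORT column
    for device, group in groupby(ordered, key=itemgetter(2)):
        result[device] = fill_forward(list(group))
    return result


rows: List[Row] = [
    (2, None, 1), (1, "Val1", 1), (3, "Val3", 1), (5, None, 1),
    (6, None, 1), (7, "Val7", 1), (8, "Val8", 1),
    (10, None, 2), (4, None, 2), (9, "X", 2),  # hypothetical second device
]
print(reduce_on_device(rows))
```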
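Given that more than one consecutive row can be NULL, your solution seems to work. Alternatively, you could write a custom reducer that operates on the presorted rows and returns the filled-in values.
For example: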
@raw = SELECT * FROM
( VALUES
(1, "Val1"),
(2, (string) null),
(3, "Val3"),
(5, (string) null),
(6, (string) null),
(7, "Val7"),
(8, "Val8")
) AS T(Timestamp, a);
@res = REDUCE @raw PRESORT Timestamp ALL
PRODUCE Timestamp int, a string
USING new ReduceSample.ReplaceNullReducer();
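Then implement ReplaceNullReducer as a recursive reducer that steps through the rows: whenever a is not null, it takes that value (and sets it as the null-replacement value), and whenever it finds a null, it replaces the null with that replacement value. Make sure you cover edge cases, such as the first value being null.
The following blog post has more details on reducers: https://blogs.msdn.microsoft.com/azuredatalake/2016/06/27/how-do-i-combine-overlapping-ranges-using-u-sql-introducing-u-sql-reducer-udos/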
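One way to handle the edge case of a leading null is to seed the replacement value with an explicit default. The Python sketch below does that with a hypothetical default parameter; passing None keeps leading nulls as null, which matches the reducer in the question. It models the stepping logic only, not the U-SQL reducer API.

```python
from typing import Iterable, Iterator, Optional


def fill_nulls(values: Iterable[Optional[str]],
               default: Optional[str] = None) -> Iterator[Optional[str]]:
    """Replace each null with the most recent non-null value.

    `default` (hypothetical parameter) is used for nulls that occur before
    the first non-null value; None leaves those leading nulls as null.
    """
    replacement = default
    for v in values:
        if v is not None:
            replacement = v  # remember it for the nulls that follow
            yield v
        else:
            yield replacement


print(list(fill_nulls([None, None, "Val3", None])))             # [None, None, 'Val3', 'Val3']
print(list(fill_nulls([None, None, "Val3", None], "unknown")))  # ['unknown', 'unknown', 'Val3', 'Val3']
```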