Calculating change in a column over groups and extracting rows based on criteria
I am a beginner at U-SQL/C# coding, and I am stuck on a windowing/aggregation problem.
My data looks like this:
Name Date OrderNo Type Balance
one 2018-06-25T04:55:44.0020987Z 1 Drink 15
one 2018-06-25T04:57:44.0020987Z 1 Drink 70
one 2018-06-25T04:59:44.0020987Z 1 Drink 33
one 2018-06-25T04:59:49.0020987Z 1 Drink 25
two 2018-06-25T04:55:44.0020987Z 2 Drink 22
two 2018-06-25T04:57:44.0020987Z 2 Drink 81
two 2018-06-25T04:58:44.0020987Z 2 Drink 33
two 2018-06-25T04:59:44.0020987Z 2 Drink 45
In U-SQL, I added a unique ID based on the combination of name, order number, and type; for ordering, I added another ID that also includes the date.
@files =
EXTRACT
name string,
date DateTime,
type string,
orderno int,
balance int
FROM
@InputFile
USING new JsonExtractor();
@files2 =
SELECT *,
DENSE_RANK() OVER(ORDER BY name,type,orderno,date) AS group_id,
DENSE_RANK() OVER(ORDER BY name,type,orderno) AS id
FROM @files;
My data now looks like this:
Name Date OrderNo Type Balance group_id id
one 2018-06-25T04:55:44.0020987Z 1 Drink 15 1 1
one 2018-06-25T04:57:44.0020987Z 1 Drink 70 2 1
one 2018-06-25T04:59:44.0020987Z 1 Drink 33 3 1
one 2018-06-25T04:59:49.0020987Z 1 Drink 25 4 1
two 2018-06-25T04:55:44.0020987Z 2 Drink 22 5 2
two 2018-06-25T04:57:44.0020987Z 2 Drink 81 6 2
two 2018-06-25T04:58:44.0020987Z 2 Drink 33 7 2
two 2018-06-25T04:59:44.0020987Z 2 Drink 45 8 2
(I have only shown 4 records per group, but each group contains many more.)
I am unable to work out the difference between consecutive rows of the Balance column within each group.
Expected output for part 1:
Name Date OrderNo Type Balance group_id id increase
one 2018-06-25T04:55:44.0020987Z 1 Drink 15 1 1 0
one 2018-06-25T04:57:44.0020987Z 1 Drink 70 2 1 55
one 2018-06-25T04:59:44.0020987Z 1 Drink 33 3 1 -37
one 2018-06-25T04:59:49.0020987Z 1 Drink 25 4 1 -8
two 2018-06-25T04:55:44.0020987Z 2 Drink 22 5 2 0
two 2018-06-25T04:57:44.0020987Z 2 Drink 81 6 2 59
two 2018-06-25T04:58:44.0020987Z 2 Drink 33 7 2 -48
two 2018-06-25T04:59:44.0020987Z 2 Drink 45 8 2 12
For every new group (defined by id), the increase should start from zero.
I searched Stack Overflow and came across the LAG function from PostgreSQL, but I couldn't find a C# equivalent. Would that work here?
Any help is appreciated. I'll provide further clarification if needed.
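For reference, the lag-with-reset semantics the question asks for can be sketched in plain Python (hypothetical rows mirroring the sample above): within each (name, type, orderno) group, ordered by date, the first row gets 0 and every later row gets its balance minus the previous row's balance.

```python
from itertools import groupby

# Hypothetical rows mirroring the sample data: (name, type, orderno, date, balance)
rows = [
    ("one", "Drink", 1, "04:55:44", 15),
    ("one", "Drink", 1, "04:57:44", 70),
    ("one", "Drink", 1, "04:59:44", 33),
    ("one", "Drink", 1, "04:59:49", 25),
    ("two", "Drink", 2, "04:55:44", 22),
    ("two", "Drink", 2, "04:57:44", 81),
    ("two", "Drink", 2, "04:58:44", 33),
    ("two", "Drink", 2, "04:59:44", 45),
]

def increases(rows):
    """LAG-style difference that resets to 0 at each group boundary."""
    out = []
    # groupby requires the rows to be pre-sorted by the group key
    for _, grp in groupby(rows, key=lambda r: (r[0], r[1], r[2])):
        prev = None
        for r in sorted(grp, key=lambda r: r[3]):  # order by date within the group
            out.append(0 if prev is None else r[4] - prev)
            prev = r[4]
    return out

print(increases(rows))  # [0, 55, -37, -8, 0, 59, -48, 12]
```

This is only an illustration of the desired semantics, not U-SQL; in window-function terms it corresponds to `LAG(balance) OVER(PARTITION BY name, type, orderno ORDER BY date)` with a default of 0.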
Update: when I use CASE WHEN, my output looks like this:
CURRENT OUTPUT DESIRED OUTPUT
id Balance Increase id Balance Increase
1 15 0 1 15 0
1 70 55 1 70 55
1 33 -37 1 33 -37
1 25 -8 1 25 -8
2 22 "-3" 2 22 "0"
2 81 59 2 81 59
2 33 -48 2 33 -48
2 45 12 2 45 12
Note the highlighted row: the increase column must start at 0 for each id.
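The stray "-3" is exactly what a lag over one single global ordering produces at the group boundary: the first balance of id 2 (22) is diffed against the last balance of id 1 (25) instead of being reset. A minimal Python sketch reproducing the current (wrong) output:

```python
# Balances across both groups in one global order, with no per-group reset
balances = [15, 70, 33, 25, 22, 81, 33, 45]
increase = [0] + [cur - prev for prev, cur in zip(balances, balances[1:])]
print(increase)  # [0, 55, -37, -8, -3, 59, -48, 12]
# The -3 at index 4 (= 22 - 25) should be 0: that row starts a new group.
```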
Update: I was able to solve the first part of the problem; see my answer below.
The second part I had posted earlier was incorrect, so I have removed it.
You can try using the LAG window function in a subquery to get the previous Balance, and then write your condition in a WHERE clause.
SELECT * FROM (
SELECT *,
DENSE_RANK() OVER(ORDER BY name,type,orderno,date) AS group_id,
DENSE_RANK() OVER(ORDER BY name,type,orderno) AS id,
(CASE WHEN LAG(Balance) OVER(PARTITION BY name,type,orderno ORDER BY date) IS NULL THEN 0
ELSE Balance - LAG(Balance) OVER(PARTITION BY name,type,orderno ORDER BY date)
END) AS increase
FROM @files
) t1
WHERE increase > 50
The query that finally worked for me is this:
@files =
EXTRACT
name string,
date DateTime,
type string,
orderno int,
balance int
FROM
@InputFile
USING new JsonExtractor();
@files2 =
SELECT *,
DENSE_RANK() OVER(ORDER BY name,type,orderno) AS group_id
FROM @files;
@files3 =
SELECT *,
DENSE_RANK() OVER(PARTITION BY group_id ORDER BY date) AS group_order
FROM @files2;
@files4 =
SELECT *,
(CASE WHEN group_order == 1 THEN 0
ELSE balance - LAG(balance) OVER(ORDER BY name,type,orderno,date)
END) AS increase
FROM @files3;
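For comparison, the same three steps (dense-rank the groups, number the rows within each group by date, then diff with a zero for each group's first row) can be sketched in Python on hypothetical data; the dates here are distinct, so ranking rows by position matches DENSE_RANK:

```python
# Hypothetical rows mirroring the script's columns: (name, type, orderno, date, balance)
rows = [
    ("two", "Drink", 2, "04:57:44", 81),
    ("one", "Drink", 1, "04:55:44", 15),
    ("one", "Drink", 1, "04:57:44", 70),
    ("two", "Drink", 2, "04:55:44", 22),
]

# @files2: group_id = DENSE_RANK() OVER(ORDER BY name, type, orderno)
keys = sorted({(r[0], r[1], r[2]) for r in rows})
group_id = {k: i + 1 for i, k in enumerate(keys)}

# @files3: group_order = rank by date within each group
# @files4: increase = 0 when group_order == 1, else balance minus previous balance
result = []
for k in keys:
    grp = sorted((r for r in rows if (r[0], r[1], r[2]) == k), key=lambda r: r[3])
    for order, r in enumerate(grp, start=1):
        inc = 0 if order == 1 else r[4] - grp[order - 2][4]
        result.append((group_id[k], order, r[4], inc))

print(result)
# [(1, 1, 15, 0), (1, 2, 70, 55), (2, 1, 22, 0), (2, 2, 81, 59)]
```

The key point the final U-SQL query relies on is step 3: testing `group_order == 1` replaces the NULL check on LAG, which is what guarantees the increase resets to 0 at every group boundary.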