Calculating change in column over groups and extracting based on criteria

I am a beginner at U-SQL/C# coding. I am stuck at one point with windowing/aggregation.

My data looks like this:

Name          Date                  OrderNo    Type   Balance
one 2018-06-25T04:55:44.0020987Z      1        Drink     15       
one 2018-06-25T04:57:44.0020987Z      1        Drink     70      
one 2018-06-25T04:59:44.0020987Z      1        Drink     33       
one 2018-06-25T04:59:49.0020987Z      1        Drink     25       
two 2018-06-25T04:55:44.0020987Z      2        Drink     22       
two 2018-06-25T04:57:44.0020987Z      2        Drink     81       
two 2018-06-25T04:58:44.0020987Z      2        Drink     33       
two 2018-06-25T04:59:44.0020987Z      2        Drink     45       

In U-SQL, I added a unique ID based on the combination of name, order number and type. For ordering, I added a second ID that also includes the date.

@files = 
EXTRACT
        name string,
        date DateTime,
        type string,
        orderno int,
        balance int           
FROM
@InputFile
USING new JsonExtractor();

@files2 =
    SELECT *,
       DENSE_RANK() OVER(ORDER BY name,type,orderno,date) AS group_id,
       DENSE_RANK() OVER(ORDER BY name,type,orderno) AS id
     FROM @files;
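
To make the two DENSE_RANK() columns concrete, here is a minimal plain-Python sketch (not U-SQL) of what they compute on the sample rows. The helper name dense_rank, the tuple layout and the shortened timestamps are assumptions made for illustration only.

```python
# Sample rows as (name, date, orderno, type, balance); dates shortened to the time part.
rows = [
    ("one", "04:55:44", 1, "Drink", 15),
    ("one", "04:57:44", 1, "Drink", 70),
    ("one", "04:59:44", 1, "Drink", 33),
    ("one", "04:59:49", 1, "Drink", 25),
    ("two", "04:55:44", 2, "Drink", 22),
    ("two", "04:57:44", 2, "Drink", 81),
    ("two", "04:58:44", 2, "Drink", 33),
    ("two", "04:59:44", 2, "Drink", 45),
]


def dense_rank(rows, key):
    """Hypothetical helper: 1-based dense rank of each row's key among all distinct keys."""
    rank_of = {k: i + 1 for i, k in enumerate(sorted({key(r) for r in rows}))}
    return [rank_of[key(r)] for r in rows]


# ORDER BY name, type, orderno, date: every row has a distinct key here.
group_id = dense_rank(rows, lambda r: (r[0], r[3], r[2], r[1]))
# ORDER BY name, type, orderno: one value per (name, type, orderno) group.
ids = dense_rank(rows, lambda r: (r[0], r[3], r[2]))

print(group_id)  # [1, 2, 3, 4, 5, 6, 7, 8]
print(ids)       # [1, 1, 1, 1, 2, 2, 2, 2]
```

Rows that share the same ordering key get the same rank, which is why the key without the date works as a group identifier.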

My data now looks like this:

Name          Date                  OrderNo    Type   Balance group_id id
one 2018-06-25T04:55:44.0020987Z      1        Drink     15       1     1
one 2018-06-25T04:57:44.0020987Z      1        Drink     70       2     1
one 2018-06-25T04:59:44.0020987Z      1        Drink     33       3     1
one 2018-06-25T04:59:49.0020987Z      1        Drink     25       4     1
two 2018-06-25T04:55:44.0020987Z      2        Drink     22       5     2
two 2018-06-25T04:57:44.0020987Z      2        Drink     81       6     2
two 2018-06-25T04:58:44.0020987Z      2        Drink     33       7     2
two 2018-06-25T04:59:44.0020987Z      2        Drink     45       8     2

(I have only shown 4 records per group here, but each group has many more records.)

I am unable to work out the difference between consecutive rows of the Balance column within each group.

Expected output for part 1:

Name          Date                  OrderNo    Type   Balance group_id id  increase
one 2018-06-25T04:55:44.0020987Z      1        Drink     15       1     1    0
one 2018-06-25T04:57:44.0020987Z      1        Drink     70       2     1    55
one 2018-06-25T04:59:44.0020987Z      1        Drink     33       3     1   -37
one 2018-06-25T04:59:49.0020987Z      1        Drink     25       4     1   -8
two 2018-06-25T04:55:44.0020987Z      2        Drink     22       5     2    0
two 2018-06-25T04:57:44.0020987Z      2        Drink     81       6     2    59
two 2018-06-25T04:58:44.0020987Z      2        Drink     33       7     2   -48
two 2018-06-25T04:59:44.0020987Z      2        Drink     45       8     2    12

For each new group (defined by id), increase should start from zero.
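
As a cross-check of the expected numbers, here is a minimal plain-Python sketch (not U-SQL) of the rule: within each id, ordered by date, increase is the balance minus the previous balance, and the first row of each id gets 0. The function name increases and the (id, date, balance) tuple layout are assumptions for illustration.

```python
from itertools import groupby

# Sample rows as (id, date, balance); dates shortened to the time part.
rows = [
    (1, "04:55:44", 15), (1, "04:57:44", 70), (1, "04:59:44", 33), (1, "04:59:49", 25),
    (2, "04:55:44", 22), (2, "04:57:44", 81), (2, "04:58:44", 33), (2, "04:59:44", 45),
]


def increases(rows):
    """Balance minus the previous balance within the same id; 0 for the first row of each id."""
    out = []
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))  # order by id, then date
    for _, group in groupby(ordered, key=lambda r: r[0]):
        prev = None  # reset at the start of every id
        for _id, _date, balance in group:
            out.append(0 if prev is None else balance - prev)
            prev = balance
    return out


print(increases(rows))  # [0, 55, -37, -8, 0, 59, -48, 12]
```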

I searched Stack Overflow and came across the LAG function from Transact-SQL. I couldn't find a C# equivalent. Is it applicable in this case?

Any help is appreciated. I will provide further clarification if needed.

Update: when I use CASE WHEN, my result looks like this:

CURRENT OUTPUT                            DESIRED OUTPUT
id Balance Increase                     id  Balance Increase
 1  15      0                            1  15      0
 1  70     55                            1  70     55
 1  33    -37                            1  33    -37
 1  25     -8                            1  25     -8
 2  22    "-3"                           2  22     "0"
 2  81     59                            2  81     59
 2  33    -48                            2  33    -48
 2  45     12                            2  45     12

See the highlighted rows (the quoted values). The increase column must start at 0 for each id.
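
The -3 appears because the previous row is taken across the whole result, so the first row of id 2 is compared with the last row of id 1 (22 - 25). Here is a minimal plain-Python sketch (not U-SQL) contrasting the two behaviours. The function names are assumptions, and the rows are assumed to be already sorted by id and then date.

```python
ids = [1, 1, 1, 1, 2, 2, 2, 2]
balances = [15, 70, 33, 25, 22, 81, 33, 45]


def lag_global(values):
    """Previous value over the whole list (no partitioning); None for the very first row."""
    return [None] + values[:-1]


def lag_partitioned(ids, values):
    """Previous value only when the previous row has the same id; None otherwise."""
    return [
        values[i - 1] if i > 0 and ids[i - 1] == ids[i] else None
        for i in range(len(values))
    ]


def diff(values, lagged):
    """Current minus lagged value, treating a missing lagged value as 'no change'."""
    return [0 if p is None else v - p for v, p in zip(values, lagged)]


print(diff(balances, lag_global(balances)))            # [0, 55, -37, -8, -3, 59, -48, 12]
print(diff(balances, lag_partitioned(ids, balances)))  # [0, 55, -37, -8, 0, 59, -48, 12]
```

The first line matches the current output and the second matches the desired output.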

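Update: I was able to solve the first part of the problem. See my answer below. The second part that I posted earlier was stated incorrectly, so I have removed it.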

You can try using the LAG window function in a subquery to get the previous Balance, and then write the condition in a WHERE clause.

SELECT * FROM (
    SELECT *,
       DENSE_RANK() OVER(ORDER BY name,type,orderno,date) AS group_id,
       DENSE_RANK() OVER(ORDER BY name,type,orderno) AS id,
       (CASE WHEN LAG(balance) OVER(ORDER BY name,type,orderno,date) IS NULL THEN 0
             ELSE  balance  - LAG(balance) OVER(ORDER BY name,type,orderno,date)
        END) as increase 
    FROM @files
) t1
WHERE increase > 50
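
The reason for the subquery is ordering of steps: the increase has to be computed over all rows first, and only then can rows be filtered on it. Here is a minimal plain-Python sketch (not U-SQL) of that two-step shape. The variable names are assumptions, the rows are assumed to be already sorted by group and date, and unlike the query above it resets the increase per name to match the desired output in the question.

```python
# Sample rows as (name, balance), already ordered by group and then by date.
rows = [("one", 15), ("one", 70), ("one", 33), ("one", 25),
        ("two", 22), ("two", 81), ("two", 33), ("two", 45)]

# Inner step (the subquery): compute increase for every row.
with_increase = []
prev_name, prev_balance = None, None
for name, balance in rows:
    # 0 for the first row of each name, otherwise the change from the previous row.
    increase = balance - prev_balance if name == prev_name else 0
    with_increase.append((name, balance, increase))
    prev_name, prev_balance = name, balance

# Outer step (the WHERE clause): keep only the large increases.
large = [row for row in with_increase if row[2] > 50]

print(large)  # [('one', 70, 55), ('two', 81, 59)]
```

Filtering before the inner step would drop the rows that the previous-row lookup depends on, so the differences would be computed against the wrong neighbours.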

The query that finally worked for me is this:

@files = 
EXTRACT
        name string,
        date DateTime,
        type string,
        orderno int,
        balance int           
FROM
@InputFile
USING new JsonExtractor();

@files2 =
SELECT *,
       DENSE_RANK() OVER(ORDER BY name,type,orderno) AS group_id
FROM @files;

@files3 =
SELECT *,
       DENSE_RANK() OVER(PARTITION BY group_id ORDER BY date) AS group_order
FROM @files2;


@files4 =
SELECT *,
     (CASE WHEN group_order == 1 THEN 0 
         ELSE  balance  - LAG(balance) OVER(ORDER BY name,type,orderno,date)
    END) AS increase 
FROM @files3;