Need to fetch n percent of rows in a U-SQL query

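I need help writing a U-SQL query that fetches the top n percent of rows. I have a dataset from which I need to get the total row count and then fetch the top 3% of rows based on col1. The code I wrote is: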

 @count = SELECT Convert.ToInt32(COUNT(*)) AS cnt FROM @telData;
@count1=SELECT cnt/100 AS cnt1 FROM @count;
DECLARE @cnt int=SELECT Convert.ToInt32(cnt1*3)  FROM @count1;


        @EngineFailureData=
            SELECT vin,accelerator_pedal_position,enginefailure=1
            FROM @telData
            ORDER BY accelerator_pedal_position DESC
            FETCH @cnt ROWS;

@telData is my base dataset. Thanks for any help.

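First, some comments:

  1. FETCH currently only accepts literals as its argument (https://msdn.microsoft.com/en-us/library/azure/mt621321.aspx)
  2. @var = SELECT ... assigns the name @var to the rowset expression that starts with the SELECT. U-SQL does not (currently) provide stateful scalar variable assignment from query results. Instead, you use a CROSS JOIN or another JOIN to join in the scalar value.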
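As a rough illustration of the second point, here is a minimal sketch of how the row count from the question could be carried along as a column instead of a scalar variable. The rowset name @withCount and the aliases t and c are made up for this example; @telData and its columns are taken from the question.

```usql
// @count is a one-row rowset, not a scalar variable.
@count =
    SELECT COUNT(*) AS cnt
    FROM @telData;

// CROSS JOIN attaches that single row to every row of @telData,
// so cnt is available as an ordinary column in later expressions.
// @withCount, t and c are hypothetical names used only here.
@withCount =
    SELECT t.vin,
           t.accelerator_pedal_position,
           c.cnt
    FROM @telData AS t
         CROSS JOIN @count AS c;
```

Any predicate that needs the total can then refer to cnt like any other column.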

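Now the solution:

To get percentages, look at the ROW_NUMBER() and PERCENT_RANK() functions. The example below shows how to use either of them to answer your question. Since the PERCENT_RANK() version is simpler (it needs neither the MAX() nor the CROSS JOIN), I would suggest that solution.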

DECLARE @percentage double = 0.25; // 25%

@data = SELECT * 
        FROM (VALUES(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),(13),(14),(15),(16),(17),(18),(19),(20) 
             ) AS T(pos);

@data =
SELECT PERCENT_RANK() OVER(ORDER BY pos) AS p_rank,
       ROW_NUMBER() OVER(ORDER BY pos) AS r_no,
       pos
FROM @data;

@cut_off =
SELECT ((double) MAX(r_no)) * (1.0 - @percentage) AS max_r
FROM @data;

@r1 =
SELECT *
FROM @data CROSS JOIN @cut_off
WHERE ((double) r_no) > max_r;

@r2 =
SELECT *
FROM @data
WHERE p_rank >= 1.0 - @percentage;

OUTPUT @r1
TO "/output/top_perc1.csv"
ORDER BY p_rank DESC
USING Outputters.Csv();

OUTPUT @r2
TO "/output/top_perc2.csv"
ORDER BY p_rank DESC
USING Outputters.Csv();
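With the 20 sample rows and 25%, both @r1 and @r2 keep pos 16 through 20: the ROW_NUMBER() cut-off is 20 * 0.75 = 15, and PERCENT_RANK() for pos 16 is 15/19, which is about 0.79.

Applied to the question, the PERCENT_RANK() variant might look like the sketch below. It assumes @telData has the vin and accelerator_pedal_position columns shown in the question; the intermediate name @ranked is made up for this example. Rows with tied values get the same rank, so the result can contain slightly more than 3% of the rows.

```usql
DECLARE @percentage double = 0.03; // top 3%

// Rank every row by pedal position; the highest values get ranks near 1.0.
// @ranked is a hypothetical intermediate rowset name.
@ranked =
    SELECT vin,
           accelerator_pedal_position,
           PERCENT_RANK() OVER(ORDER BY accelerator_pedal_position) AS p_rank
    FROM @telData;

// Keep only the rows whose rank falls in the top 3% and flag them.
@EngineFailureData =
    SELECT vin,
           accelerator_pedal_position,
           1 AS enginefailure
    FROM @ranked
    WHERE p_rank >= 1.0 - @percentage;
```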