U-SQL: how do I join without getting a Cartesian product?
I have a large file with rows for every ID for every day.
Each ID can have more than one record per day, but only the latest value is valid.
DailyValues:
ID int,
date datetime,
version datetime,
value1 float,
value2 float,
value3 float,
value4 float,
In T-SQL I would select MAX(version), group by ID and date, and then join back to the values with a CROSS APPLY.
select
    B.*
from
(
    select
        ID,
        Date,
        MAX(Version) as Version
    from DailyValues D1
    group by
        ID,
        Date
) as A
CROSS APPLY (
    select top 1 *
    from DailyValues D2
    where A.ID = D2.ID
    and A.Date = D2.Date
    and A.Version = D2.Version
    order by D2.Version desc
) as B
The file is too large for me to do this in T-SQL.
How can I do this in U-SQL?
You can first extract the CSV file into a rowset. Once it is extracted, you can select the latest version like this:
@DailyValues =
    EXTRACT ID int,
            date DateTime,
            version DateTime,
            value1 float,
            value2 float,
            value3 float,
            value4 float
    FROM "/Samples/Data/DailyValues.csv"
    USING Extractors.Csv(encoding: Encoding.[ASCII]);

// Number the rows of each ID/date group by version, newest first, and keep row 1.
// No self-join is needed, so there is no Cartesian product.
@Latest =
    SELECT ID, date, version
    FROM
    (
        SELECT ID, date, version,
               ROW_NUMBER() OVER(PARTITION BY ID, date ORDER BY version DESC) AS rn
        FROM @DailyValues
    ) AS t
    WHERE t.rn == 1;
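The query above returns only ID, date and version, while the question asks for the values of the latest record. A minimal sketch of one way to carry the value columns through the same ROW_NUMBER() filter and write the result out is shown below. The rowset names @Ranked and @LatestValues and the output path "/output/LatestDailyValues.csv" are assumptions made for illustration. The column names and types follow the schema in the question.

```usql
// Read the raw file; schema as described in the question.
@DailyValues =
    EXTRACT ID int,
            date DateTime,
            version DateTime,
            value1 float,
            value2 float,
            value3 float,
            value4 float
    FROM "/Samples/Data/DailyValues.csv"
    USING Extractors.Csv(encoding: Encoding.[ASCII]);

// Number the rows within each ID/date group, newest version first.
// The value columns travel along with each row, so no join back is required.
@Ranked =
    SELECT ID, date, version, value1, value2, value3, value4,
           ROW_NUMBER() OVER(PARTITION BY ID, date ORDER BY version DESC) AS rn
    FROM @DailyValues;

// Keep only the newest row per ID and date.
@LatestValues =
    SELECT ID, date, version, value1, value2, value3, value4
    FROM @Ranked
    WHERE rn == 1;

// Hypothetical output location; adjust to your own store.
OUTPUT @LatestValues
TO "/output/LatestDailyValues.csv"
USING Outputters.Csv(outputHeader: true);
```

If two rows share the same ID, date and version, ROW_NUMBER() still keeps exactly one of them, but which one is not defined unless you add a tie-breaking column to the ORDER BY.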