How do I set up a local environment in Visual Studio Code to run U-SQL without being connected to Azure Data Lake?
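Here is my code to eliminate empty cells and duplicate Function rows while keeping the Product column correctly aligned with the Function column. I only want to keep the first occurrence of each Function and remove any duplicates. It compiles fine, but I can't find my output. Someone suggested that I simply click on the jobURL in the output, but that did not work for me. Here is a sample file. It is a small portion of the full spreadsheet and contains data in only the 2 relevant columns; the full spreadsheet has data in all columns. https://www.dropbox.com/s/auu2aco4b037xn7/Function.csv?dl=0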
@input =
    EXTRACT CompanyID string,
            division string,
            store_location string,
            International_Id string,
            Function string,
            office_location string,
            address string,
            Product string,
            Revenue string,
            sales_goal string,
            Manager string,
            Country string
    FROM "/input/input142.csv"
    USING Extractors.Csv(skipFirstNRows : 1);

// Remove rows whose Function value is missing.
// String.IsNullOrEmpty also covers null values, where Function.Length would throw.
@working =
    SELECT *
    FROM @input
    WHERE !String.IsNullOrEmpty(Function);

// Number the rows within each Function group (ordered by Product)
// and keep only the first row of each group.
@working =
    SELECT CompanyID,
           division,
           store_location,
           International_Id,
           Function,
           office_location,
           address,
           Product,
           Revenue,
           sales_goal,
           Manager,
           Country
    FROM
    (
        SELECT *,
               ROW_NUMBER() OVER(PARTITION BY Function ORDER BY Product) AS rn
        FROM @working
    ) AS x
    WHERE rn == 1;

@output = SELECT * FROM @working;

OUTPUT @output TO "/output/output.csv"
USING Outputters.Csv(quoting:false);
Here is my desired result:
https://www.dropbox.com/s/o82eskycbq1i1ss/Function_desired_result.xlsx?dl=0
If you want to run/debug your script locally, take a look at this document.
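While the local run is being set up, the expected result can be sanity-checked outside U-SQL. The following is only a minimal Python sketch of the same filter-and-deduplicate logic, not a replacement for the script. The helper name `keep_first_per_function` and the inline sample data are hypothetical, made up for illustration. Note that `ROW_NUMBER() OVER(PARTITION BY Function ORDER BY Product)` keeps the row with the smallest Product in each group, which is not necessarily the first row in the file; the sketch mirrors that behaviour. String ordering in Python may also differ from U-SQL's in edge cases.

```python
import csv
import io


def keep_first_per_function(rows):
    """Drop rows with an empty Function, then keep one row per Function.

    Mirrors ROW_NUMBER() OVER (PARTITION BY Function ORDER BY Product) == 1:
    within each Function group, the row with the smallest Product wins.
    (Hypothetical helper, for illustration only.)
    """
    best = {}
    for row in rows:
        func = row.get("Function") or ""
        if not func:
            continue  # same effect as the WHERE filter on Function
        current = best.get(func)
        if current is None or (row.get("Product") or "") < (current.get("Product") or ""):
            best[func] = row
    return list(best.values())


# Made-up sample data with only the two relevant columns.
SAMPLE = """Function,Product
Sales,Widget
,Orphan
Sales,Gadget
Support,Helpdesk
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
result = keep_first_per_function(rows)
for row in result:
    print(row["Function"], row["Product"])
```

If the goal really is the first occurrence in file order, the input needs an explicit column that records that order, because a U-SQL rowset carries no inherent row order to sort by.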