Error while running U-SQL Activity in Pipeline in Azure Data Factory
I am getting the following error while running a U-SQL Activity in an ADF pipeline:
Error in Activity:
{"errorId":"E_CSC_USER_SYNTAXERROR","severity":"Error","component":"CSC",
"source":"USER","message":"syntax error.
Final statement did not end with a semicolon","details":"at token 'txt', line 3\r\nnear the ###:\r\n**************\r\nDECLARE @in string = \"/demo/SearchLog.txt\";\nDECLARE @out string = \"/scripts/Result.txt\";\nSearchLogProcessing.txt ### \n",
"description":"Invalid syntax found in the script.",
"resolution":"Correct the script syntax, using expected token(s) as a guide.","helpLink":"","filePath":"","lineNumber":3,
"startOffset":109,"endOffset":112}].
Here is the code for the output dataset, the pipeline, and the U-SQL script that I am trying to execute in the pipeline.
Output dataset:
{
"name": "OutputDataLakeTable",
"properties": {
"published": false,
"type": "AzureDataLakeStore",
"linkedServiceName": "LinkedServiceDestination",
"typeProperties": {
"folderPath": "scripts/"
},
"availability": {
"frequency": "Hour",
"interval": 1
}
}
}
Pipeline:
{
"name": "ComputeEventsByRegionPipeline",
"properties": {
"description": "This is a pipeline to compute events for en-gb locale and date less than 2012/02/19.",
"activities": [
{
"type": "DataLakeAnalyticsU-SQL",
"typeProperties": {
"script": "SearchLogProcessing.txt",
"scriptPath": "scripts\",
"degreeOfParallelism": 3,
"priority": 100,
"parameters": {
"in": "/demo/SearchLog.txt",
"out": "/scripts/Result.txt"
}
},
"inputs": [
{
"name": "InputDataLakeTable"
}
],
"outputs": [
{
"name": "OutputDataLakeTable"
}
],
"policy": {
"timeout": "06:00:00",
"concurrency": 1,
"executionPriorityOrder": "NewestFirst",
"retry": 1
},
"scheduler": {
"frequency": "Minute",
"interval": 15
},
"name": "CopybyU-SQL",
"linkedServiceName": "AzureDataLakeAnalyticsLinkedService"
}
],
"start": "2017-01-03T12:01:05.53Z",
"end": "2017-01-03T13:01:05.53Z",
"isPaused": false,
"hubName": "denojaidbfactory_hub",
"pipelineMode": "Scheduled"
}
}
Here is the U-SQL script that I am trying to execute using the "DataLakeAnalyticsU-SQL" activity type.
@searchlog =
EXTRACT UserId int,
Start DateTime,
Region string,
Query string,
Duration int?,
Urls string,
ClickedUrls string
FROM @in
USING Extractors.Text(delimiter:'|');
@rs1 =
SELECT Start, Region, Duration
FROM @searchlog
WHERE Region == "kota";
OUTPUT @rs1
TO @out
USING Outputters.Text(delimiter:'|');
Please suggest how I can resolve this issue.
Remove the script property from your U-SQL activity definition and provide the full path to your script (including the file name) in the scriptPath property. As the error output shows, ADF prepended the DECLARE statements for your parameters and then treated the literal string SearchLogProcessing.txt as the script body, which is why the compiler failed at the token 'txt'.
参考:https://docs.microsoft.com/en-us/azure/data-factory/data-factory-usql-activity
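With that change, the typeProperties section of the activity might look like the following. This is only a sketch: the path scripts/SearchLogProcessing.txt is illustrative and must match wherever your script actually lives.

```json
"typeProperties": {
    "scriptPath": "scripts/SearchLogProcessing.txt",
    "degreeOfParallelism": 3,
    "priority": 100,
    "parameters": {
        "in": "/demo/SearchLog.txt",
        "out": "/scripts/Result.txt"
    }
}
```

Note that the script property is gone entirely; scriptPath now carries both the folder and the file name.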
Your activity is missing the scriptLinkedService property. You also (currently) need to place the U-SQL script in Azure Blob Storage for it to run successfully, so you additionally need an AzureStorage linked service, for example:
{
"name": "StorageLinkedService",
"properties": {
"description": "",
"type": "AzureStorage",
"typeProperties": {
"connectionString": "DefaultEndpointsProtocol=https;AccountName=myAzureBlobStorageAccount;AccountKey=**********"
}
}
}
Create this linked service, replacing the Blob storage account name myAzureBlobStorageAccount with your own, then place the U-SQL script (SearchLogProcessing.txt) in a container there and try again. In my example pipeline below, I have a container named adlascripts in my Blob storage and the script is in there:
Make sure the scriptPath is complete, as Alexandre mentioned. The start of the pipeline:
{
"name": "ComputeEventsByRegionPipeline",
"properties": {
"description": "This is a pipeline to compute events for en-gb locale and date less than 2012/02/19.",
"activities": [
{
"type": "DataLakeAnalyticsU-SQL",
"typeProperties": {
"scriptPath": "adlascripts\SearchLogProcessing.txt",
"scriptLinkedService": "StorageLinkedService",
"degreeOfParallelism": 3,
"priority": 100,
"parameters": {
"in": "/input/SearchLog.tsv",
"out": "/output/Result.tsv"
}
},
...
The input and output .tsv files can live in the Data Lake and use the AzureDataLakeStoreLinkedService linked service.
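For example, the input dataset could be defined along these lines. This is only a sketch: the linked service name AzureDataLakeStoreLinkedService and the folder input/ are assumptions that must match your own setup.

```json
{
    "name": "InputDataLakeTable",
    "properties": {
        "type": "AzureDataLakeStore",
        "linkedServiceName": "AzureDataLakeStoreLinkedService",
        "typeProperties": {
            "folderPath": "input/"
        },
        "availability": {
            "frequency": "Hour",
            "interval": 1
        }
    }
}
```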
I can see you are trying to follow this demo: https://docs.microsoft.com/en-us/azure/data-factory/data-factory-usql-activity#script-definition. It is not the most intuitive demo and seems to have some issues: for instance, where is the definition of StorageLinkedService? Where is SearchLogProcessing.txt? OK, I found it by googling, but there should be a link in the page. I got it to work, but it felt a bit like being Harry Potter in the Half-Blood Prince.
I had a similar problem where Azure Data Factory did not recognize my script files. One way to avoid the whole issue, without having to paste lots of code, is to register a stored procedure. You can do it like this:
DROP PROCEDURE IF EXISTS master.dbo.sp_test;
CREATE PROCEDURE master.dbo.sp_test()
AS
BEGIN
@searchlog =
EXTRACT UserId int,
Start DateTime,
Region string,
Query string,
Duration int?,
Urls string,
ClickedUrls string
FROM @in
USING Extractors.Text(delimiter:'|');
@rs1 =
SELECT Start, Region, Duration
FROM @searchlog
WHERE Region == "kota";
OUTPUT @rs1
TO @out
USING Outputters.Text(delimiter:'|');
END;
After running this, you can use
"script": "master.dbo.sp_test()"
in your JSON pipeline definition. Whenever you update the U-SQL script, simply re-run the procedure definition. Then there is no need to copy script files to Blob storage.
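Put together, the activity's typeProperties might then look like this. This is only a sketch: the parameter values are illustrative and should match your own paths.

```json
"typeProperties": {
    "script": "master.dbo.sp_test()",
    "degreeOfParallelism": 3,
    "priority": 100,
    "parameters": {
        "in": "/demo/SearchLog.txt",
        "out": "/scripts/Result.txt"
    }
}
```

With the script inline like this, neither scriptPath nor scriptLinkedService is needed.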