Input Dataset not working
I have created an Azure Data Factory to schedule a U-SQL script using the "DataLakeAnalyticsU-SQL" activity. Please see my code below:
InputDataset:
{
    "name": "InputDataLakeTable",
    "properties": {
        "published": false,
        "type": "AzureDataLakeStore",
        "linkedServiceName": "LinkedServiceSource",
        "typeProperties": {
            "fileName": "SearchLog.txt",
            "folderPath": "demo/",
            "format": {
                "type": "TextFormat",
                "rowDelimiter": "\n",
                "columnDelimiter": "|",
                "quoteChar": "\""
            }
        },
        "availability": {
            "frequency": "Hour",
            "interval": 1
        }
    }
}
OutputDataset:
{
    "name": "OutputDataLakeTable",
    "properties": {
        "published": false,
        "type": "AzureDataLakeStore",
        "linkedServiceName": "LinkedServiceDestination",
        "typeProperties": {
            "folderPath": "scripts/"
        },
        "availability": {
            "frequency": "Hour",
            "interval": 1
        }
    }
}
Pipeline:
{
    "name": "ComputeEventsByRegionPipeline",
    "properties": {
        "description": "This is a pipeline to compute events for en-gb locale and date less than 2012/02/19.",
        "activities": [
            {
                "type": "DataLakeAnalyticsU-SQL",
                "typeProperties": {
                    "scriptPath": "scripts\\SearchLogProcessing.txt",
                    "degreeOfParallelism": 3,
                    "priority": 100,
                    "parameters": {
                        "in": "/demo/SearchLog.txt",
                        "out": "/scripts/Result.txt"
                    }
                },
                "inputs": [
                    {
                        "name": "InputDataLakeTable"
                    }
                ],
                "outputs": [
                    {
                        "name": "OutputDataLakeTable"
                    }
                ],
                "policy": {
                    "timeout": "06:00:00",
                    "concurrency": 1,
                    "executionPriorityOrder": "NewestFirst",
                    "retry": 1
                },
                "scheduler": {
                    "frequency": "Hour",
                    "interval": 1
                },
                "name": "CopybyU-SQL",
                "linkedServiceName": "AzureDataLakeAnalyticsLinkedService"
            }
        ],
        "start": "2016-12-21T17:44:13.557Z",
        "end": "2016-12-22T17:44:13.557Z",
        "isPaused": false,
        "hubName": "denojaidbfactory_hub",
        "pipelineMode": "Scheduled"
    }
}
I have successfully created all the required linked services.
However, after deploying the pipeline, no time slices are created for the input dataset. See the image below:
The output dataset in turn depends on the upstream input dataset's time slices, so the output dataset's slices remain stuck waiting to execute and my Azure Data Factory pipeline does not run.
See the image below:
Any suggestions to resolve this issue?
If you don't have another activity creating your InputDataLakeTable, you need to add the property
"external": true
so that Data Factory knows the data is produced outside the factory rather than by an upstream activity. See:
https://docs.microsoft.com/en-us/azure/data-factory/data-factory-faq
https://docs.microsoft.com/en-us/azure/data-factory/data-factory-create-datasets
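As a sketch, the input dataset from the question with the flag applied would look like this; note that "external" goes directly under "properties", as a sibling of "typeProperties" and "availability" (the "format" section is unchanged from the original and elided here for brevity):

```json
{
    "name": "InputDataLakeTable",
    "properties": {
        "published": false,
        "type": "AzureDataLakeStore",
        "linkedServiceName": "LinkedServiceSource",
        "typeProperties": {
            "fileName": "SearchLog.txt",
            "folderPath": "demo/"
        },
        "external": true,
        "availability": {
            "frequency": "Hour",
            "interval": 1
        }
    }
}
```

With this in place, Data Factory marks the dataset's slices as externally provided instead of waiting for an upstream activity to produce them, so the slices become Ready and the U-SQL activity can start.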