Schedule U-SQL jobs in Azure Data Factory
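I have the following problem. I want to schedule three U-SQL jobs to run daily at 02:00 UTC, 03:00 UTC, and 04:00 UTC. I know that by default the jobs in a pipeline execute at 12:00 AM UTC, so all of my jobs would run at the same time, which is not what I want.
I read the documentation, and it says I should consider the offset parameter in the dataset template. However, when I try to set it, I get an error.
I don't know how to set a run time other than 12:00 AM for a U-SQL job. Could you provide some information on how to do this correctly? I have also attached my dataset template and pipeline:
Dataset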
{
    "name": "TransformedData2",
    "properties": {
        "published": false,
        "type": "AzureDataLakeStore",
        "linkedServiceName": "ADLstore_linkedService_scrapper",
        "typeProperties": {
            "fileName": "TestOutput2.csv",
            "folderPath": "transformedData/",
            "format": {
                "type": "TextFormat",
                "rowDelimiter": "\n",
                "columnDelimiter": ","
            }
        },
        "availability": {
            "frequency": "Day",
            "interval": 1,
            "style": "StartOfInterval"
        }
    }
}
Pipeline
{
    "name": "filtering",
    "properties": {
        "activities": [
            {
                "type": "DataLakeAnalyticsU-SQL",
                "typeProperties": {
                    "scriptPath": "usqljobs\\cleanStatements.txt",
                    "scriptLinkedService": "AzureStorageLinkedService",
                    "degreeOfParallelism": 5,
                    "priority": 100,
                    "parameters": {}
                },
                "outputs": [
                    {
                        "name": "TransformedData2"
                    }
                ],
                "scheduler": {
                    "frequency": "Day",
                    "interval": 1,
                    "style": "StartOfInterval"
                },
                "name": "Brajan filtering",
                "linkedServiceName": "AzureDataLakeAnalyticsLinkedService"
            }
        ],
        "start": "2017-07-02T09:50:00Z",
        "end": "2018-06-30T03:00:00Z",
        "isPaused": false,
        "hubName": "datafactoryfin_hub",
        "pipelineMode": "Scheduled"
    }
}
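Thanks
Using the Offset attribute can be a little confusing, because you need to reconfigure your time slices at the dataset level.
As an alternative, I would suggest using the Delay attribute on the activity. This gives you more control and does not require reconfiguring the time slices.
So in your JSON...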
{
    "name": "filtering",
    "properties": {
        "activities": [
            {
                "type": "DataLakeAnalyticsU-SQL",
                "typeProperties": {
                    "scriptPath": "usqljobs\\cleanStatements.txt",
                    "scriptLinkedService": "AzureStorageLinkedService",
                    "degreeOfParallelism": 5,
                    "priority": 100,
                    "parameters": {}
                },
                "outputs": [
                    {
                        "name": "TransformedData2"
                    }
                ],
                "policy": {
                    "delay": "02:00:00" // <<<<< 2:00am start
                },
                "scheduler": {
                    "frequency": "Day",
                    "interval": 1,
                    "style": "StartOfInterval"
                },
                "name": "Brajan filtering",
                "linkedServiceName": "AzureDataLakeAnalyticsLinkedService"
            }
        ],
        "start": "2017-07-02T09:50:00Z",
        "end": "2018-06-30T03:00:00Z",
        "isPaused": false,
        "hubName": "datafactoryfin_hub",
        "pipelineMode": "Scheduled"
    }
}
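Then of course you would need additional activities for the 3:00am and 4:00am versions.
Check out this link for more info:
https://docs.microsoft.com/en-us/azure/data-factory/data-factory-scheduling-and-execution
Delay is mentioned about a quarter of the way down the page.
Hope this helps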
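As a rough sketch of how the three activities might sit side by side, each with its own delay, something like the following could work. Only the first activity comes from your pipeline. The names "Second job" and "Third job", the script paths secondJob.txt and thirdJob.txt, and the output datasets TransformedData3 and TransformedData4 are hypothetical placeholders, and I am assuming each activity writes to its own output dataset defined with the same daily availability. The // comments are annotations only and would need to be removed before deploying.

```json
{
    "name": "filtering",
    "properties": {
        "activities": [
            {
                "name": "Brajan filtering",
                "type": "DataLakeAnalyticsU-SQL",
                "linkedServiceName": "AzureDataLakeAnalyticsLinkedService",
                "typeProperties": {
                    "scriptPath": "usqljobs\\cleanStatements.txt",
                    "scriptLinkedService": "AzureStorageLinkedService",
                    "degreeOfParallelism": 5,
                    "priority": 100,
                    "parameters": {}
                },
                "outputs": [ { "name": "TransformedData2" } ],
                "policy": { "delay": "02:00:00" }, // 2:00am start
                "scheduler": {
                    "frequency": "Day",
                    "interval": 1,
                    "style": "StartOfInterval"
                }
            },
            {
                "name": "Second job", // hypothetical name
                "type": "DataLakeAnalyticsU-SQL",
                "linkedServiceName": "AzureDataLakeAnalyticsLinkedService",
                "typeProperties": {
                    "scriptPath": "usqljobs\\secondJob.txt", // hypothetical script
                    "scriptLinkedService": "AzureStorageLinkedService",
                    "degreeOfParallelism": 5,
                    "priority": 100,
                    "parameters": {}
                },
                "outputs": [ { "name": "TransformedData3" } ], // hypothetical dataset
                "policy": { "delay": "03:00:00" }, // 3:00am start
                "scheduler": {
                    "frequency": "Day",
                    "interval": 1,
                    "style": "StartOfInterval"
                }
            },
            {
                "name": "Third job", // hypothetical name
                "type": "DataLakeAnalyticsU-SQL",
                "linkedServiceName": "AzureDataLakeAnalyticsLinkedService",
                "typeProperties": {
                    "scriptPath": "usqljobs\\thirdJob.txt", // hypothetical script
                    "scriptLinkedService": "AzureStorageLinkedService",
                    "degreeOfParallelism": 5,
                    "priority": 100,
                    "parameters": {}
                },
                "outputs": [ { "name": "TransformedData4" } ], // hypothetical dataset
                "policy": { "delay": "04:00:00" }, // 4:00am start
                "scheduler": {
                    "frequency": "Day",
                    "interval": 1,
                    "style": "StartOfInterval"
                }
            }
        ],
        "start": "2017-07-02T09:50:00Z",
        "end": "2018-06-30T03:00:00Z",
        "isPaused": false,
        "hubName": "datafactoryfin_hub",
        "pipelineMode": "Scheduled"
    }
}
```

The scheduler block stays identical across all three activities, so the daily time slices themselves are untouched and only the delay differs per job.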