Schedule U-SQL jobs in Azure Data Factory

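I have the following problem. I want to schedule three U-SQL jobs to run daily at 02:00 UTC, 03:00 UTC, and 04:00 UTC. I know that by default the jobs in a pipeline are executed at 12:00 AM UTC, so all of my jobs run at the same time, which is not what I want.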

I read the documentation, which says I should consider the offset parameter in the dataset template. However, when I try to set it, the following error occurs:

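I don't know how to make the U-SQL jobs run at a time other than 12:00 AM. Could you provide some information on how to do this correctly? I have also attached my dataset template and pipeline:
Dataset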

{
"name": "TransformedData2",
"properties": {
    "published": false,
    "type": "AzureDataLakeStore",
    "linkedServiceName": "ADLstore_linkedService_scrapper",
    "typeProperties": {
        "fileName": "TestOutput2.csv",
        "folderPath": "transformedData/",
        "format": {
            "type": "TextFormat",
            "rowDelimiter": "\n",
            "columnDelimiter": ","
        }
    },
    "availability": {
        "frequency": "Day",
        "interval": 1,
        "style": "StartOfInterval"
    }
}

}

Pipeline

{
"name": "filtering",
"properties": {
    "activities": [
        {
            "type": "DataLakeAnalyticsU-SQL",
            "typeProperties": {
                "scriptPath": "usqljobs\\cleanStatements.txt",
                "scriptLinkedService": "AzureStorageLinkedService",
                "degreeOfParallelism": 5,
                "priority": 100,
                "parameters": {}
            },
            "outputs": [
                {
                    "name": "TransformedData2"
                }
            ],
            "scheduler": {
                "frequency": "Day",
                "interval": 1,
                "style": "StartOfInterval"
            },
            "name": "Brajan filtering",
            "linkedServiceName": "AzureDataLakeAnalyticsLinkedService"
        }
    ],
    "start": "2017-07-02T09:50:00Z",
    "end": "2018-06-30T03:00:00Z",
    "isPaused": false,
    "hubName": "datafactoryfin_hub",
    "pipelineMode": "Scheduled"
}

}

Thanks

Using the offset property can be a bit confusing, because you need to reconfigure the time slices at the dataset level.
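
For context, here is a minimal sketch of what the offset approach involves, assuming the dataset's `availability` section accepts an `offset` timespan as the scheduling documentation describes. The `02:00:00` value is only illustrative. The activity's `scheduler` section would most likely need the same shift so that it keeps matching the output dataset, which is the reconfiguration referred to above. The `//` note is an annotation only and is not valid JSON, so remove it before deploying.

```json
{
    "availability": {
        "frequency": "Day",
        "interval": 1,
        "style": "StartOfInterval",
        "offset": "02:00:00" // illustrative: shifts each daily slice to start at 02:00 UTC
    }
}
```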

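As an alternative, I would suggest using the delay property on the activity. It gives you more control and does not require reconfiguring the time slices.

So in your JSON...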

{
"name": "filtering",
"properties": {
    "activities": [
        {
            "type": "DataLakeAnalyticsU-SQL",
            "typeProperties": {
                "scriptPath": "usqljobs\\cleanStatements.txt",
                "scriptLinkedService": "AzureStorageLinkedService",
                "degreeOfParallelism": 5,
                "priority": 100,
                "parameters": {}
            },
            "outputs": [
                {
                    "name": "TransformedData2"
                }
            ],
            "policy": {
              "delay": "02:00:00" // <<<<< 2:00am start
            }, 
            "scheduler": {
                "frequency": "Day",
                "interval": 1,
                "style": "StartOfInterval"
            },
            "name": "Brajan filtering",
            "linkedServiceName": "AzureDataLakeAnalyticsLinkedService"
        }
    ],
    "start": "2017-07-02T09:50:00Z",
    "end": "2018-06-30T03:00:00Z",
    "isPaused": false,
    "hubName": "datafactoryfin_hub",
    "pipelineMode": "Scheduled"
}
}

Then of course you will need additional activities for the 3:00 AM and 4:00 AM versions.
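
As a rough sketch, the two extra activities could look like the fragment below, placed in the same `activities` array. The activity names, script paths, and output dataset names (`TransformedData3`, `TransformedData4`) are hypothetical placeholders, not taken from the question. Each output dataset is assumed to be defined separately with the same daily availability as `TransformedData2`.

```json
{
    "activities": [
        {
            "name": "Filtering0300",
            "description": "Hypothetical 03:00 UTC job; name, script and output dataset are placeholders",
            "type": "DataLakeAnalyticsU-SQL",
            "linkedServiceName": "AzureDataLakeAnalyticsLinkedService",
            "typeProperties": {
                "scriptPath": "usqljobs\\secondJob.txt",
                "scriptLinkedService": "AzureStorageLinkedService"
            },
            "outputs": [ { "name": "TransformedData3" } ],
            "policy": { "delay": "03:00:00" },
            "scheduler": { "frequency": "Day", "interval": 1, "style": "StartOfInterval" }
        },
        {
            "name": "Filtering0400",
            "description": "Hypothetical 04:00 UTC job; name, script and output dataset are placeholders",
            "type": "DataLakeAnalyticsU-SQL",
            "linkedServiceName": "AzureDataLakeAnalyticsLinkedService",
            "typeProperties": {
                "scriptPath": "usqljobs\\thirdJob.txt",
                "scriptLinkedService": "AzureStorageLinkedService"
            },
            "outputs": [ { "name": "TransformedData4" } ],
            "policy": { "delay": "04:00:00" },
            "scheduler": { "frequency": "Day", "interval": 1, "style": "StartOfInterval" }
        }
    ]
}
```

Because the daily slice still starts at midnight UTC, each `delay` value is measured from that point, so the three jobs should start at roughly 02:00, 03:00, and 04:00 UTC.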

Check this link for more info:

https://docs.microsoft.com/en-us/azure/data-factory/data-factory-scheduling-and-execution

Delay is mentioned about a quarter of the way down the page.

Hope this helps.