Building this pipeline on Azure Data Factory V2
I am currently trying to set up this pipeline on Azure Data Factory V2 (as you can see in the attached picture). In summary, the ERP system will export this report (a CSV file containing actuals and forecast data) on a monthly basis and save it in a blob container. As soon as the CSV file is saved, an event trigger should invoke a stored procedure that will in turn delete all actuals data from my fact table in Azure SQL, since it gets replaced every month.
Once the actuals have been deleted, the pipeline then has a copy activity that copies the CSV report (actuals + forecast) into that same fact table in Azure SQL. Once the copy activity is finished, an HTTP Logic App deletes the new CSV file from the blob container. This workflow would be a recurring event taking place once a month.
So far, I have been able to run each of these three activities independently. However, when I join them together in the same pipeline, I get some parameter errors when I try to "publish all". So I am not sure whether I need to set the same parameters for each activity in the pipeline?
The JSON definition of my pipeline is below:
{
    "name": "TM1_pipeline",
    "properties": {
        "activities": [
            {
                "name": "Copy Data1",
                "type": "Copy",
                "dependsOn": [
                    {
                        "activity": "Stored Procedure1",
                        "dependencyConditions": [
                            "Succeeded"
                        ]
                    }
                ],
                "policy": {
                    "timeout": "7.00:00:00",
                    "retry": 0,
                    "retryIntervalInSeconds": 30,
                    "secureOutput": false
                },
                "typeProperties": {
                    "source": {
                        "type": "BlobSource",
                        "recursive": false
                    },
                    "sink": {
                        "type": "SqlSink",
                        "writeBatchSize": 10000
                    },
                    "enableStaging": false,
                    "dataIntegrationUnits": 0
                },
                "inputs": [
                    {
                        "referenceName": "SourceDataset_e7y",
                        "type": "DatasetReference",
                        "parameters": {
                            "copyFolder": {
                                "value": "@pipeline().parameters.sourceFolder",
                                "type": "Expression"
                            },
                            "copyFile": {
                                "value": "@pipeline().parameters.sourceFile",
                                "type": "Expression"
                            }
                        }
                    }
                ],
                "outputs": [
                    {
                        "referenceName": "DestinationDataset_e7y",
                        "type": "DatasetReference"
                    }
                ]
            },
            {
                "name": "Stored Procedure1",
                "type": "SqlServerStoredProcedure",
                "policy": {
                    "timeout": "7.00:00:00",
                    "retry": 0,
                    "retryIntervalInSeconds": 30,
                    "secureOutput": false,
                    "secureInput": false
                },
                "typeProperties": {
                    "storedProcedureName": "[dbo].[test_sp]"
                },
                "linkedServiceName": {
                    "referenceName": "AzureSqlDatabase",
                    "type": "LinkedServiceReference"
                }
            },
            {
                "name": "Web1",
                "type": "WebActivity",
                "dependsOn": [
                    {
                        "activity": "Copy Data1",
                        "dependencyConditions": [
                            "Succeeded"
                        ]
                    }
                ],
                "policy": {
                    "timeout": "7.00:00:00",
                    "retry": 0,
                    "retryIntervalInSeconds": 30,
                    "secureOutput": false,
                    "secureInput": false
                },
                "typeProperties": {
                    "url": "...",
                    "method": "POST",
                    "body": {
                        "value": "@pipeline().parameters.BlobName",
                        "type": "Expression"
                    }
                }
            }
        ],
        "parameters": {
            "sourceFolder": {
                "type": "String",
                "defaultValue": "@pipeline().parameters.sourceFolder"
            },
            "sourceFile": {
                "type": "String",
                "defaultValue": "@pipeline().parameters.sourceFile"
            },
            "BlobName": {
                "type": "String",
                "defaultValue": {
                    "blobname": "source-csv/test.csv"
                }
            }
        }
    },
    "type": "Microsoft.DataFactory/factories/pipelines"
}
Please follow this doc to configure your blob event trigger and pass the right values to your parameters.
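One likely source of the publish errors is the pipeline's "parameters" block: a parameter's defaultValue cannot itself be a @pipeline().parameters expression (that would be a self-reference), and BlobName is declared as a String but its default is a JSON object. A minimal sketch of what that block could look like instead (parameter names taken from your pipeline; treat the exact defaults as placeholders):

```json
{
    "parameters": {
        "sourceFolder": {
            "type": "String"
        },
        "sourceFile": {
            "type": "String"
        },
        "BlobName": {
            "type": "String",
            "defaultValue": "source-csv/test.csv"
        }
    }
}
```

With the defaults removed, the values come from the trigger instead: in the blob event trigger's parameter settings, the folder and file of the blob that fired the trigger can be mapped to sourceFolder and sourceFile using the expressions @triggerBody().folderPath and @triggerBody().fileName.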