Azure 数据工厂副本 activity 性能调整

Azure data factory copy activity performance tuning

https://docs.microsoft.com/en-us/azure/data-factory/data-factory-load-sql-data-warehouse。根据此 link 和 1000 DWU 和 polybase,我应该获得 200MBps 的吞吐量。但我得到 4.66 MBps。我在 xlargerc 资源 class 中添加了用户,以从 azure sql 数据仓库获得最佳吞吐量。

下面是管道 JSON。

                         {
              "name": "UCBPipeline-Copy",
                 "properties": {
                   "description": "pipeline with copy activity",
                 "activities": [
            {
                "type": "Copy",
                "typeProperties": {
                    "source": {
                        "type": "BlobSource"
                    },
                    "sink": {
                        "type": "SqlDWSink",
                        "allowPolyBase": true,
                        "writeBatchSize": 0,
                        "writeBatchTimeout": "00:00:00"
                    },
                    "cloudDataMovementUnits": 4
                },
                "inputs": [
                    {
                        "name": "USBBlob_Concept
                    }
                ],
                "outputs": [
                    {
                        "name": "AzureDW_Concept"
                    }
                ],
                "policy": {
                    "timeout": "01:00:00",
                    "concurrency": 1
                },
                "scheduler": {
                    "frequency": "Day",
                    "interval": 1
                },
                "name": "AzureBlobtoSQLDW_Concept",
                "description": "Copy Activity"
            }
        ],
        "start": "2017-02-28T18:00:00Z",
        "end": "2017-03-01T19:00:00Z",
        "isPaused": false,
        "hubName": "sampledf1_hub",
        "pipelineMode": "Scheduled"
    }
}

输入数据集:

{
    "name": "AzureBlob_Concept",
    "properties": {
        "published": false,
        "type": "AzureBlob",
        "linkedServiceName": "AzureZRSStorageLinkedService",
        "typeProperties": {
            "fileName": "conceptTab.txt",
            "folderPath": "source/",
            "format": {
                "type": "TextFormat",
                "columnDelimiter": "\t"
            }
        },
        "availability": {
            "frequency": "Day",
            "interval": 1
        },
        "external": true,
        "policy": {}
    }
}

输出数据集:

{
    "name": "AzureDW_Concept",
    "properties": {
        "published": false,
        "type": "AzureSqlDWTable",
        "linkedServiceName": "AzureSqlDWLinkedService",
        "typeProperties": {
            "tableName": "concept"
        },
        "availability": {
            "frequency": "Day",
            "interval": 1
        }
    }
}

配置中是否缺少任何内容?

我看了一下runId"e98ac557-a507-4a6e-8833-978eff1723c3",应该属于你的CopyActivity。从我们的服务日志来看,源文件不够大(在您的情况下为 270 MB),因此服务调用延迟会使吞吐量不够好。您可以尝试加载更大的文件以获得更好的吞吐量。