Copy activity (from Cosmos SQL API to ADLS Gen2) failing in Synapse

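I am trying to run a pipeline that copies data from Cosmos (SQL API) to ADLS Gen2 for multiple tables. A Lookup activity passes a list of queries into a ForEach, and the Copy activity runs inside it, using a self-hosted IR. However, it keeps failing after the first iteration with the following error: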

Operation on target Copy data1_copy1 failed: Failure happened on 'Sink' side. ErrorCode=UserErrorFailedFileOperation,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Upload file failed at path tfs/OU Cosmos Data/LATAM/fact\dl-br-prod.,Source=Microsoft.DataTransfer.Common,''Type=Microsoft.Azure.Documents.RequestTimeoutException,Message=Request timed out.

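Also, I am sure this is not a problem with any particular table: I tried passing the queries in different orders, and in every attempt the first query completes successfully, while in the remaining iterations the Copy activity runs for a while and eventually fails.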

So far I have tried the following:

  1. Running the ForEach in sequential mode.
  2. Changing the block size (MB) on the Sink side to 20 MB. The default is 100 MB.
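
For readers following along, the two settings above map roughly to the following parts of the pipeline definition. This is only a sketch, written as a Python dict that mirrors the pipeline JSON. The activity names, the Lookup name `LookupQueries`, the `query` column on each item, and the `JsonSink` output format are assumptions for illustration and are not taken from the actual pipeline; the two settings that were changed are `isSequential` and `blockSizeInMB`.

```python
import json

# Hypothetical fragment of the pipeline definition. All names are placeholders.
foreach_activity = {
    "name": "ForEachQuery",  # assumed name
    "type": "ForEach",
    "typeProperties": {
        # Attempt 1: run the iterations one at a time instead of in parallel.
        "isSequential": True,
        "items": {
            # Assumes the Lookup activity is called LookupQueries.
            "value": "@activity('LookupQueries').output.value",
            "type": "Expression",
        },
        "activities": [
            {
                "name": "CopyCosmosToAdls",  # assumed name
                "type": "Copy",
                "typeProperties": {
                    "source": {
                        "type": "CosmosDbSqlApiSource",
                        # Assumes each Lookup row has a column named "query".
                        "query": {"value": "@item().query", "type": "Expression"},
                    },
                    "sink": {
                        "type": "JsonSink",  # assumed output format
                        "storeSettings": {
                            "type": "AzureBlobFSWriteSettings",
                            # Attempt 2: lower the block size from the 100 MB default.
                            "blockSizeInMB": 20,
                        },
                    },
                },
            }
        ],
    },
}


def sink_block_size_mb(foreach: dict) -> int:
    """Return the sink block size of the first inner Copy activity."""
    copy = foreach["typeProperties"]["activities"][0]
    return copy["typeProperties"]["sink"]["storeSettings"]["blockSizeInMB"]


if __name__ == "__main__":
    print(json.dumps(foreach_activity, indent=2))
```

Neither change helped here, which is consistent with the root cause reported further down being on the Cosmos read side rather than in the sink settings.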

Could you please check the workarounds suggested in the official MS documentation, since this involves a self-hosted IR?

Request to Azure Data Lake Storage Gen2 account caused a timeout error

Cause: The issue is caused by an Azure Data Lake Storage Gen2 sink timeout error, which usually occurs on the self-hosted integration runtime (IR) machine.

Recommendation:

  1. Place your Self-hosted IR machine and target Azure Data Lake Storage Gen2 account in the same region, if possible. This can help avoid a random timeout error and produce better performance.

  2. Check whether there's a special network setting, such as ExpressRoute, and ensure that the network has enough bandwidth. We suggest that you lower the Self-hosted IR concurrent jobs setting when the overall bandwidth is low. Doing so can help avoid network resource competition across multiple concurrent jobs.

  3. If the file size is moderate or small, use a smaller block size for nonbinary copy to mitigate such a timeout error. For more information, see Blob Storage Put Block.
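
To see why recommendations 2 and 3 can help, here is a rough back-of-envelope sketch. It assumes the available bandwidth is shared evenly across concurrent jobs and ignores latency, retries, and protocol overhead, so it is only an illustration. The bandwidth, job counts, and file size are made-up numbers, not measurements from this environment.

```python
import math


def block_upload_seconds(block_size_mb: float, bandwidth_mbps: float,
                         concurrent_jobs: int) -> float:
    """Rough time to upload one block when bandwidth is split evenly.

    block_size_mb is in megabytes, bandwidth_mbps in megabits per second.
    """
    per_job_mbps = bandwidth_mbps / concurrent_jobs
    return block_size_mb * 8 / per_job_mbps


def block_count(file_size_mb: float, block_size_mb: float) -> int:
    """Number of blocks needed to upload a file of the given size."""
    return math.ceil(file_size_mb / block_size_mb)


if __name__ == "__main__":
    # Hypothetical link: 50 Mbit/s shared by the self-hosted IR jobs.
    for block_size, jobs in [(100, 8), (20, 8), (20, 2)]:
        secs = block_upload_seconds(block_size, bandwidth_mbps=50,
                                    concurrent_jobs=jobs)
        print(f"block={block_size} MB, jobs={jobs}: ~{secs:.1f} s per block, "
              f"{block_count(250, block_size)} blocks for a 250 MB file")
```

Under these assumptions, a smaller block size or fewer concurrent jobs shortens each individual upload request, which makes it less likely that any single request runs into a timeout, at the cost of more requests per file.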

I got a response from the Microsoft Cosmos product team:

Root cause:

The SDK client is configured with some Timeout value and the request is taking longer time.

The reason for the timeouts is an increase in Gateway latency (the Gateway has no latency SLA) due to the large result size. This is probably expected (more data tends to take longer to be read, sent, and received).

Resolution:

Increase the RequestTimeout used in the client.

The team owning the Synapse Data Transfer (which uses the .NET 2.5.1 SDK and owns the Microsoft.DataTransfer application) can increase the RequestTimeout used on the .NET SDK to a higher value. In newer SDK versions, this value is 65 seconds by default.

In the end, though, we chose to bypass this route altogether and go with Synapse Link or private endpoints instead.