ADF using sinked cache in Source?

Question

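I have sinked data into a cache in an Azure Data Factory mapping data flow, and I want to use the data from that sink in the container and wildcard path of a new source. Is this possible?

In the source, I tried adding SourceExpressionCache#outputs()[1].SourceDirectory to the container:

Error: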

Spark job failed: {
"text/plain": "{\"runId\":\"74358980-da2a-490e-b193-352c20fb48e3\",\"sessionId\":\"920b092c-3def-4850-b4d4-5e87e64c4619\",\"status\":\"Failed\",\"payload\":{\"statusCode\":400,\"shortMessage\":\"DF-DRAFT_001 at Source 'GenericDocument'(Line 21/Col 0): Unresolved specification\",\"detailedMessage\":\"Failure 2022-03-17 06:52:57.757 failed DebugManager.processJob, run=74358980-da2a-490e-b193-352c20fb48e3, errorMessage=DF-DRAFT_001 at Source 'GenericDocument'(Line 21/Col 0): Unresolved specification\"}}\n"
} - RunId: 74358980-da2a-490e-b193-352c20fb48e3

Answer 1

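I reproduced this using a cache sink with sample data. Please see the steps below.

In the cache sink settings, do not provide key columns, since you are using the outputs() function of the cached lookup.

Mapping:

In the new source, I pass the cache output as the folder/directory from which to read the new files.
In the expression builder, a cached lookup allows two functions: lookup() and outputs().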

outputs() takes no parameters and returns the entire cache sink as an array of complex columns. This can't be called if key columns are specified in the sink and should only be used if there is a small number of rows in the cache sink.
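The two call shapes can be sketched side by side in the data flow expression language. This is only an illustration: keyedSink and FolderId are hypothetical names that do not appear in the question, and the two lines are separate expressions, not one script. Array indexes in data flow expressions start at 1.

```
/* Cache sink with NO key columns (sink1, as used in this answer).
   outputs() returns every cached row as an array; [1] is the first row. */
sink1#outputs()[1].Value

/* Cache sink WITH a key column (hypothetical sink name: keyedSink).
   lookup() takes the key value (hypothetical column: FolderId)
   and returns the matching cached row. */
keyedSink#lookup(FolderId).Value
```

If key columns are set on the sink, only the lookup() form applies, which is why the key columns are left empty here.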

Expression: sink1#outputs()[1].Value

Output: The new source reads data from the folder named by the cache sink value.

azure-data-factory