Why does ingesting through LightIngest.exe add a 5-minute delay in the queue?

I'm ingesting a local parquet file that is only 183 KB.

PS C:\Users\user\Downloads\microsoft.azure.kusto.tools.5.4.2\tools> .\LightIngest.exe `
  "https://ingest-adx.centralus.kusto.windows.net/;Fed=True" `
  -database:"wd" -table:"test" `
  -source:"C:\Users\user\Downloads\exp\test" -pattern:"*.parquet" -format:"parquet"
LightIngest invoked with the following arguments: https://ingest-adx.centralus.kusto.windows.net/;Fed=True -database:wd -table:test -source:C:\Users\user\Downloads\exp\test -pattern:*.parquet -format:parquet

Please review the run parameters:

    Connection string      : https://ingest-adx.centralus.kusto.windows.net/;Fed=True
    -database              : wd
    -table                 : test

    -sourcePath            : C:\Users\user\Downloads\exp\test
    -pattern               : *.parquet
    -creationTimePattern   :
    -format                : parquet
    -ignoreFirstRow        : False

    -compression           : 10
    -ingestTimeout (min)   : 60
    -dontWait              : False


Press [Ctrl+Q] to abort, press any other key or wait for 10 seconds to proceed
==> Starting...
ListAndFilterFiles: enumerating files under 'C:\Users\user\Downloads\exp\test'
==> Items discovered: [      1], filtered: [      1], posted for ingestion: [      1]
    Done. Time elapsed: 00:00:02.1317334
    Items discovered: [      1], filtered: [      1], posted for ingestion: [      1]
==> Waiting for ingestion completion...
==> Waiting for ingest operation(s) completion (will timeout after 60 minutes)...
==> [      0] out of [      1] ingest operations completed. Time elapsed: 00:00:00.2823859
==> [      0] out of [      1] ingest operations completed. Time elapsed: 00:00:30.4223658
==> [      0] out of [      1] ingest operations completed. Time elapsed: 00:01:00.5649914
==> [      0] out of [      1] ingest operations completed. Time elapsed: 00:01:30.7049284
==> [      0] out of [      1] ingest operations completed. Time elapsed: 00:02:00.8459706
==> [      0] out of [      1] ingest operations completed. Time elapsed: 00:02:30.9859844
==> [      0] out of [      1] ingest operations completed. Time elapsed: 00:03:01.1265852
==> [      0] out of [      1] ingest operations completed. Time elapsed: 00:03:31.2669361
==> [      0] out of [      1] ingest operations completed. Time elapsed: 00:04:01.4074579
==> [      0] out of [      1] ingest operations completed. Time elapsed: 00:04:31.5469210
==> [      0] out of [      1] ingest operations completed. Time elapsed: 00:05:01.6859146
==> [      1] out of [      1] ingest operations completed. Time elapsed: 00:05:31.8246008
    Successfully completed [      1] out of [      1] ingest operations.
==> Done.

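Note the ingestion time. Querying the destination table in ADX shows that the ingestion really was delayed: it started shortly after Time elapsed: 00:05:01... and had therefore finished by the next check.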

Exactly the same thing happens if I put the parquet file in a storage account and use a syntax like

`-source:https://{storage_account}.blob.core.windows.net/container?{SAS} -prefix:exp/test`

Again, a 5-minute delay.

If I trigger the ingestion from the storage account with

.ingest into table test
(h'https://{storage_account}.blob.core.windows.net/container/exp/test/my_file.parquet?{SAS}')
with (format='parquet')

it starts immediately and takes only milliseconds to complete.

Please help: why does this delay happen, and how can I fix it? A delay this long makes LightIngest.exe unusable.

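The ingestion service has a built-in batching stage that reduces the resource consumption of the ingestion process. By default a batch is sealed after 5 minutes or 1 GB of data, whichever comes first; a single 183 KB file never reaches the size limit, so it waits out the full 5 minutes. These limits can be controlled and overridden at the database or table level. For details, see the IngestionBatchingPolicy article. Keep in mind that after you change this policy, the ingestion service needs a few minutes to pick up the change. The `.ingest into` command in your last example is a direct ingestion command that does not go through the ingestion service, and therefore skips this batching stage, which is why it completes immediately.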
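To make the batching behaviour concrete, here is a minimal Python sketch, not an official tool. It does two things. First, it models which limit seals a batch. This is a simplified model that assumes every item arrives at once. Second, it builds the `.alter table ... policy ingestionbatching` command text that overrides the limits for one table. The 5-minute and 1 GB defaults come from the answer above. The item-count default of 500 is an assumption, so check it against the IngestionBatchingPolicy article. `BatchingPolicy`, `alter_table_command` and `seal_reason` are hypothetical helper names used only for illustration.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class BatchingPolicy:
    """Mirror of the three IngestionBatchingPolicy limits (keys follow the policy JSON)."""
    max_timespan_s: int = 300     # 5 minutes, per the answer above
    max_items: int = 500          # ASSUMPTION: verify the current default in the docs
    max_raw_size_mb: int = 1024   # 1 GB, per the answer above

    def to_json(self) -> str:
        # MaximumBatchingTimeSpan is expressed as an hh:mm:ss timespan string.
        h, rem = divmod(self.max_timespan_s, 3600)
        m, s = divmod(rem, 60)
        return json.dumps({
            "MaximumBatchingTimeSpan": f"{h:02d}:{m:02d}:{s:02d}",
            "MaximumNumberOfItems": self.max_items,
            "MaximumRawDataSizeMB": self.max_raw_size_mb,
        })


def alter_table_command(table: str, policy: BatchingPolicy) -> str:
    """Build the management command text that overrides the batching policy on one table."""
    return f".alter table {table} policy ingestionbatching @'{policy.to_json()}'"


def seal_reason(policy: BatchingPolicy, num_items: int, raw_size_mb: float) -> str:
    """Simplified model: which limit seals a batch whose items all arrive at once.

    If neither the item-count nor the size limit is reached, the batch waits
    until the time limit expires.
    """
    if num_items >= policy.max_items:
        return "item count"
    if raw_size_mb >= policy.max_raw_size_mb:
        return "raw data size"
    return f"time ({policy.max_timespan_s} s)"


if __name__ == "__main__":
    file_mb = 183 / 1024  # the 183 KB parquet file from the question
    print(seal_reason(BatchingPolicy(), 1, file_mb))  # time (300 s)
    print(alter_table_command("test", BatchingPolicy(max_timespan_s=30)))
```

For the single small file in the question, only the time limit can seal the batch, which matches the roughly 5-minute wait in the log. Lowering `MaximumBatchingTimeSpan` shortens that wait. The trade-off is more, smaller batches, and that is the resource cost batching is meant to avoid.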