Why does ingesting through LightIngest.exe add a 5-minute delay in the queue?
I am ingesting a local Parquet file that is only 183 KB.
PS C:\Users\user\Downloads\microsoft.azure.kusto.tools.5.4.2\tools> .\LightIngest.exe `
"https://ingest-adx.centralus.kusto.windows.net/;Fed=True" `
-database:"wd" -table:"test" `
-source:"C:\Users\user\Downloads\exp\test" -pattern:"*.parquet" -format:"parquet"
LightIngest invoked with the following arguments: https://ingest-adx.centralus.kusto.windows.net/;Fed=True -database:wd -table:test -source:C:\Users\user\Downloads\exp\test -pattern:*.parquet -format:parquet
Please review the run parameters:
Connection string : https://ingest-adx.centralus.kusto.windows.net/;Fed=True
-database : wd
-table : test
-sourcePath : C:\Users\user\Downloads\exp\test
-pattern : *.parquet
-creationTimePattern :
-format : parquet
-ignoreFirstRow : False
-compression : 10
-ingestTimeout (min) : 60
-dontWait : False
Press [Ctrl+Q] to abort, press any other key or wait for 10 seconds to proceed
==> Starting...
ListAndFilterFiles: enumerating files under 'C:\Users\user\Downloads\exp\test'
==> Items discovered: [ 1], filtered: [ 1], posted for ingestion: [ 1]
Done. Time elapsed: 00:00:02.1317334
Items discovered: [ 1], filtered: [ 1], posted for ingestion: [ 1]
==> Waiting for ingestion completion...
==> Waiting for ingest operation(s) completion (will timeout after 60 minutes)...
==> [ 0] out of [ 1] ingest operations completed. Time elapsed: 00:00:00.2823859
==> [ 0] out of [ 1] ingest operations completed. Time elapsed: 00:00:30.4223658
==> [ 0] out of [ 1] ingest operations completed. Time elapsed: 00:01:00.5649914
==> [ 0] out of [ 1] ingest operations completed. Time elapsed: 00:01:30.7049284
==> [ 0] out of [ 1] ingest operations completed. Time elapsed: 00:02:00.8459706
==> [ 0] out of [ 1] ingest operations completed. Time elapsed: 00:02:30.9859844
==> [ 0] out of [ 1] ingest operations completed. Time elapsed: 00:03:01.1265852
==> [ 0] out of [ 1] ingest operations completed. Time elapsed: 00:03:31.2669361
==> [ 0] out of [ 1] ingest operations completed. Time elapsed: 00:04:01.4074579
==> [ 0] out of [ 1] ingest operations completed. Time elapsed: 00:04:31.5469210
==> [ 0] out of [ 1] ingest operations completed. Time elapsed: 00:05:01.6859146
==> [ 1] out of [ 1] ingest operations completed. Time elapsed: 00:05:31.8246008
Successfully completed [ 1] out of [ 1] ingest operations.
==> Done.
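Note the ingestion time. Querying the destination table in ADX shows that the ingestion really was delayed: it started shortly after Time elapsed: 00:05:01... and was therefore reported as finished at the next check.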
Exactly the same thing happens if I put the Parquet file in a storage account and use syntax like
`-source:https://{storage_account}.blob.core.windows.net/container?{SAS} -prefix:exp/test`
Again, a 5-minute delay.
If I start the ingestion from the storage account with
.ingest into table test
(h'https://{storage_account}.blob.core.windows.net/container/exp/test/my_file.parquet?{SAS}')
with (format='parquet')
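it starts immediately and takes a few milliseconds to complete.
Please help: why does this delay occur, and how can it be fixed? A delay this large makes LightIngest.exe unusable.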
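The ingestion service has a built-in batching stage that reduces the resource consumption of the ingestion process. By default, a batch is sealed after 5 minutes or 1 GB of data, whichever comes first. A single 183 KB file never reaches the size limit, so it waits out the full 5 minutes. The `.ingest into` command is sent directly to the engine and bypasses this queue, which is why it starts immediately.
The batching limits can be controlled and overridden at the database or table level.
See the IngestionBatchingPolicy article for details.
Keep in mind that after you change this policy, the ingestion service needs a few minutes to pick up the change.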
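As a rough sketch of what such an override could look like for the `test` table from the question: the commands below first show the current policy and then shorten the batching time window. The 30-second value is only an illustrative assumption, not a recommendation, and the other two limits are placeholder values that you should adjust to your workload. Run each command separately.

```kusto
// Inspect the batching policy currently in effect on the table.
.show table test policy ingestionbatching

// Override the policy at table level. The values below are illustrative
// assumptions: seal a batch after 30 seconds, 500 items, or 1024 MB of
// raw data, whichever limit is reached first.
.alter table test policy ingestionbatching
```
{
    "MaximumBatchingTimeSpan": "00:00:30",
    "MaximumNumberOfItems": 500,
    "MaximumRawDataSizeMB": 1024
}
```
```

A shorter time span reduces the queueing delay for small, infrequent files, but it also produces more and smaller batches, which tends to cost more cluster resources. The same policy can be set for a whole database with `.alter database wd policy ingestionbatching` if every table should share the setting.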