How to retrieve data using DynamoDB parallel scans in Node.js
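I am using Node.js to import a DynamoDB table into S3. Everything works, but the whole copy is very slow: I have millions of records and a sequential scan is limited to 1 MB per request, so I am looking at running parallel scans against DynamoDB from Node.
But to do that I need to create multiple threads and distribute the parallel work across those threads in Node. Is there a recommended way to do this, or would you suggest using Data Pipeline to import the data instead? What do you think I should do?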
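You can use the built-in parallel scan feature provided by the Scan API. The whole table is divided into multiple segments, and each segment is scanned separately.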
Segment: 0,
TotalSegments: 5
Segment: For a parallel Scan request, Segment identifies an individual
segment to be scanned by an application worker.
Segment IDs are zero-based, so the first segment is always 0. For
example, if you want to use four application threads to scan a table
or an index, then the first thread specifies a Segment value of 0, the
second thread specifies 1, and so on.
The value for Segment must be greater than or equal to 0, and less
than the value provided for TotalSegments.
If you provide Segment, you must also provide TotalSegments.
TotalSegments: For a parallel Scan request, TotalSegments represents
the total number of segments into which the Scan operation will be
divided. The value of TotalSegments corresponds to the number of
application workers that will perform the parallel scan. For example,
if you want to use four application threads to scan a table or an
index, specify a TotalSegments value of 4.
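
To make the Segment / TotalSegments description concrete, here is a minimal sketch of one worker per segment in Node.js. Node does not need real threads for this, because each Scan request is asynchronous I/O, so the workers can simply be concurrent promises. The names `scanSegment`, `parallelScan`, `scanPage` and `onItems` are hypothetical helpers introduced for illustration. `scanPage` is assumed to be any function that takes Scan parameters and resolves to a response with `Items` and `LastEvaluatedKey`, for example `(p) => docClient.scan(p).promise()` with the AWS SDK v2 `DocumentClient`.

```javascript
// Sketch only: scans one segment to completion, page by page.
// `scanPage` is an assumed, injected function:
//   (params) => Promise<{ Items, LastEvaluatedKey }>
async function scanSegment(scanPage, baseParams, segment, totalSegments, onItems) {
  let exclusiveStartKey;
  do {
    const params = { ...baseParams, Segment: segment, TotalSegments: totalSegments };
    if (exclusiveStartKey) {
      // Continue this segment from where the previous page stopped.
      params.ExclusiveStartKey = exclusiveStartKey;
    }
    const page = await scanPage(params);
    // Hand the page to the caller (e.g. to buffer it or write it to S3).
    await onItems(page.Items || [], segment);
    exclusiveStartKey = page.LastEvaluatedKey;
  } while (exclusiveStartKey);
}

// Starts one worker per segment (0 .. totalSegments - 1) and waits for all of them.
async function parallelScan(scanPage, baseParams, totalSegments, onItems) {
  const workers = [];
  for (let segment = 0; segment < totalSegments; segment++) {
    workers.push(scanSegment(scanPage, baseParams, segment, totalSegments, onItems));
  }
  await Promise.all(workers);
}
```

A call might look like `parallelScan(scanPage, { TableName: 'MyTable' }, 5, handleItems)`, where `MyTable` and `handleItems` are placeholders. Each segment still returns at most 1 MB per request, so every worker has to follow `LastEvaluatedKey` until it comes back empty. More segments also consume more read capacity at once, so the right value of TotalSegments depends on the table's throughput.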