dsbulk unload is failing on large table

I am trying to unload data from a huge table. Below are the command I used and its output.

$ /home/cassandra/dsbulk-1.8.0/bin/dsbulk unload --driver.auth.provider PlainTextAuthProvider --driver.auth.username xxxx --driver.auth.password xxxx --datastax-java-driver.basic.contact-points 123.123.123.123 -query "select count(*) from sometable with where on clustering column and partial pk -- allow filtering" --connector.name json --driver.protocol.compression LZ4 --connector.json.mode MULTI_DOCUMENT -maxConcurrentFiles 1 -maxRecords -1 -url dsbulk --executor.continuousPaging.enabled false --executor.maxpersecond 2500 --driver.socket.timeout 240000

Setting dsbulk.driver.protocol.compression is deprecated and will be removed in a future release; please configure the driver directly using --datastax-java-driver.advanced.protocol.compression instead.
Setting dsbulk.driver.auth.* is deprecated and will be removed in a future release; please configure the driver directly using --datastax-java-driver.advanced.auth-provider.* instead.
Operation directory: /home/cassandra/logs/COUNT_20210423-070104-108326
total | failed | rows/s |      p50ms |      p99ms |     p999ms
    1 |      1 |      0 | 109,790.10 | 110,058.54 | 110,058.54
Operation COUNT_20210423-070104-108326 completed with 1 errors in 1 minute and 50 seconds.

Here are the dsbulk logs:

cassandra@somehost> cd logs
cassandra@somehost> cd COUNT_20210423-070104-108326/
cassandra@somehost> ls
operation.log  unload-errors.log
cassandra@somehost> cat operation.log
2021-04-23 07:01:04 WARN  Setting dsbulk.driver.protocol.compression is deprecated and will be removed in a future release; please configure the driver directly using --datastax-java-driver.advanced.protocol.compression instead.
2021-04-23 07:01:04 WARN  Setting dsbulk.driver.auth.* is deprecated and will be removed in a future release; please configure the driver directly using --datastax-java-driver.advanced.auth-provider.* instead.
2021-04-23 07:01:04 INFO  Operation directory: /home/cassandra/logs/COUNT_20210423-070104-108326
2021-04-23 07:02:55 WARN  Operation COUNT_20210423-070104-108326 completed with 1 errors in 1 minute and 50 seconds.
2021-04-23 07:02:55 INFO  Records: total: 1, successful: 0, failed: 1
2021-04-23 07:02:55 INFO  Memory usage: used: 212 MB, free: 1,922 MB, allocated: 2,135 MB, available: 27,305 MB, total gc count: 4, total gc time: 98 ms
2021-04-23 07:02:55 INFO  Reads: total: 1, successful: 0, failed: 1, in-flight: 0
2021-04-23 07:02:55 INFO  Throughput: 0 reads/second
2021-04-23 07:02:55 INFO  Latencies: mean 109,790.10, 75p 110,058.54, 99p 110,058.54, 999p 110,058.54 milliseconds
2021-04-23 07:02:58 INFO  Final stats:
2021-04-23 07:02:58 INFO  Records: total: 1, successful: 0, failed: 1
2021-04-23 07:02:58 INFO  Memory usage: used: 251 MB, free: 1,883 MB, allocated: 2,135 MB, available: 27,305 MB, total gc count: 4, total gc time: 98 ms
2021-04-23 07:02:58 INFO  Reads: total: 1, successful: 0, failed: 1, in-flight: 0
2021-04-23 07:02:58 INFO  Throughput: 0 reads/second
2021-04-23 07:02:58 INFO  Latencies: mean 109,790.10, 75p 110,058.54, 99p 110,058.54, 999p 110,058.54 milliseconds

cassandra@somehost> cat unload-errors.log
Statement: com.datastax.oss.driver.internal.core.cql.DefaultBoundStatement@1083fef9 [0 values, idempotence: <UNSET>, CL: <UNSET>, serial CL: <UNSET>, timestamp: <UNSET>, timeout: <UNSET>]
SELECT batch_id from .... allow filtering (Cassandra timeout during read query at consistency LOCAL_ONE (1 responses were required but only 0 replica responded))
        at com.datastax.oss.dsbulk.executor.api.subscription.ResultSubscription.toErrorPage(ResultSubscription.java:534)
        at com.datastax.oss.dsbulk.executor.api.subscription.ResultSubscription.lambda$fetchNextPage(ResultSubscription.java:372)
        at com.datastax.oss.driver.internal.core.cql.CqlRequestHandler.setFinalError(CqlRequestHandler.java:447) [4 skipped]
        at com.datastax.oss.driver.internal.core.cql.CqlRequestHandler.access0(CqlRequestHandler.java:94)
        at com.datastax.oss.driver.internal.core.cql.CqlRequestHandler$NodeResponseCallback.processRetryVerdict(CqlRequestHandler.java:859)
        at com.datastax.oss.driver.internal.core.cql.CqlRequestHandler$NodeResponseCallback.processErrorResponse(CqlRequestHandler.java:828)
        at com.datastax.oss.driver.internal.core.cql.CqlRequestHandler$NodeResponseCallback.onResponse(CqlRequestHandler.java:655)
        at com.datastax.oss.driver.internal.core.channel.InFlightHandler.channelRead(InFlightHandler.java:257)
        at java.lang.Thread.run(Thread.java:748) [24 skipped]
Caused by: com.datastax.oss.driver.api.core.servererrors.ReadTimeoutException: Cassandra timeout during read query at consistency LOCAL_ONE (1 responses were required but only 0 replica responded)

Snippet from Cassandra's system.log:

DEBUG [ScheduledTasks:1] 2021-04-23 00:01:48,539  MonitoringTask.java:152 - 1 operations timed out in the last 5015 msecs:
<SELECT * FROM my query being run with limit - LIMIT 5000>, total time 10004 msec, timeout 10000 msec/cross-node
INFO  [ScheduledTasks:1] 2021-04-23 00:02:38,540  MessagingService.java:1302 - RANGE_SLICE messages were dropped in last 5000 ms: 0 internal and 1 cross node
. Mean internal dropped latency: 0 ms and Mean cross-node dropped latency: 10299 ms
INFO  [ScheduledTasks:1] 2021-04-23 00:02:38,551  StatusLogger.java:114 -
Pool Name                    Active   Pending      Completed   Blocked  All Time Blocked
ReadStage                         1         0     1736872997         0                 0
ContinuousPagingStage             0         0            586         0                 0
RequestResponseStage              0         0     1483193130         0                 0
ReadRepairStage                   0         0        9079516         0                 0
CounterMutationStage              0         0              0         0                 0
MutationStage                     0         0      351841038         0                 0
ViewMutationStage                 0         0              0         0                 0
CommitLogArchiver                 0         0          32961         0                 0
MiscStage                         0         0              0         0                 0
CompactionExecutor                0         0       12034828         0                 0
MemtableReclaimMemory             0         0          68612         0                 0
PendingRangeCalculator            0         0              9         0                 0
AntiCompactionExecutor            0         0              0         0                 0
GossipStage                       0         0       20137208         0                 0
SecondaryIndexManagement          0         0              0         0                 0
HintsDispatcher                   0         0           3798         0                 0
MigrationStage                    0         0              8         0                 0
MemtablePostFlush                 0         0         338955         0                 0
PerDiskMemtableFlushWriter_0         0         0          66297         0                 0
ValidationExecutor                0         0         247600         0                 0
Sampler                           0         0              0         0                 0
MemtableFlushWriter               0         0          41757         0                 0
InternalResponseStage             0         0         525242         0                 0
AntiEntropyStage                  0         0         767527         0                 0
CacheCleanupExecutor              0         0              0         0                 0
Native-Transport-Requests         0         0      958717934         0                65
CompactionManager                 0         0
MessagingService                n/a       0/0
Cache Type                     Size                 Capacity               KeysToSave
KeyCache                  104857216                104857600                      all
RowCache                          0                        0                      all

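Extend select count(*) from sometable with where on clustering column and partial pk -- allow filtering with an additional condition on the token ranges, like this: and token(full_pk) > :start and token(full_pk) <= :end. In that case DSBulk will run many queries, each against a specific token range and each sent to the nodes that own that range, instead of putting the whole load on a single node as happens in your case.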
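As a rough sketch of that suggestion, the script below adds the token-range restriction to a custom query. The keyspace name my_keyspace is a placeholder, the connection and connector options are copied from the question, and the question's own filter conditions are elided there, so they are left out here rather than guessed. The script prints the command by default; set DRY_RUN=0 to actually run it.

```shell
#!/usr/bin/env bash
# Sketch only: my_keyspace is a hypothetical placeholder; sometable, batch_id
# and full_pk are the names used in the question and the answer.
KEYSPACE="my_keyspace"
TABLE="sometable"

# :start and :end are filled in by DSBulk once per token range, so each
# sub-query covers only one slice of the ring.
# The question's clustering-column / partial-pk conditions are elided in the
# original post, so they are not reproduced here.
QUERY="SELECT batch_id FROM ${KEYSPACE}.${TABLE} WHERE token(full_pk) > :start AND token(full_pk) <= :end"

CMD=(dsbulk unload
  --datastax-java-driver.basic.contact-points 123.123.123.123
  -query "$QUERY"
  --connector.name json
  -url dsbulk)

# Print the command by default; set DRY_RUN=0 to execute it.
if [ "${DRY_RUN:-1}" = "1" ]; then
  printf '%q ' "${CMD[@]}"
  printf '\n'
else
  "${CMD[@]}"
fi
```

Note that this unloads the matching rows rather than counting them, because the token-range placeholders split the work into per-range queries.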

Have a look at the documentation for the -query option, and at the 4th post in this series of blog posts about DSBulk, which provide more information and examples: 1, 2, 3, 4, 5, 6

The problem is that you are running the DSBulk unload command to execute a SELECT COUNT(), which means it has to perform a full table scan just to return a single row.

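In addition, ALLOW FILTERING is not recommended unless you restrict the query to a single partition. In any case, the performance of ALLOW FILTERING is very unpredictable even in the best of cases.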

I suggest you use the DSBulk count command instead, which is optimized for counting rows or partitions in Cassandra. For details, see the Counting data with DSBulk example.
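A minimal sketch of what that could look like, assuming a placeholder keyspace name my_keyspace and reusing the contact point and masked credentials from the question. It counts the whole table and does not apply the question's WHERE conditions. The script prints the command by default; set DRY_RUN=0 to actually run it.

```shell
#!/usr/bin/env bash
# Sketch only: my_keyspace is a hypothetical placeholder; sometable, the
# contact point and the xxxx credentials are taken from the question.
KEYSPACE="my_keyspace"
TABLE="sometable"

# -k / -t select the keyspace and table to count; -u / -p pass the credentials.
COUNT_CMD=(dsbulk count
  --datastax-java-driver.basic.contact-points 123.123.123.123
  -u xxxx -p xxxx
  -k "$KEYSPACE"
  -t "$TABLE")

# Print the command by default; set DRY_RUN=0 to execute it.
if [ "${DRY_RUN:-1}" = "1" ]; then
  printf '%q ' "${COUNT_CMD[@]}"
  printf '\n'
else
  "${COUNT_CMD[@]}"
fi
```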

There are more examples in the DSBulk Counting blog post, which Alex Ott has already linked in his answer. Cheers!