Tombstone vs nodetool and repair

Question

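I inserted 10K entries into a Cassandra table, all under a single partition, with a TTL of 1 minute.

After the inserts succeeded, I tried to read all the data from that single partition, but the read failed and the log shows the following errors: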

WARN  [ReadStage-2] 2018-04-04 11:39:44,833 ReadCommand.java:533 - Read 0 live rows and 100001 tombstone cells for query SELECT * FROM qcs.job LIMIT 100 (see tombstone_warn_threshold)
DEBUG [Native-Transport-Requests-1] 2018-04-04 11:39:44,834 ReadCallback.java:132 - Failed; received 0 of 1 responses
ERROR [ReadStage-2] 2018-04-04 11:39:44,836 StorageProxy.java:1906 - Scanned over 100001 tombstones during query 'SELECT * FROM qcs.job LIMIT 100' (last scanned row partion key was ((job), 2018-04-04 11:19+0530, 1, jobType1522820944168, jobId1522820944168)); query aborted

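I understand that a tombstone is a marker in the SSTable rather than an actual delete.

So I ran compaction and repair using nodetool.

Even after that, reading from the table still produces the same errors in the log file.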

1) How should I handle this situation?

2) Can someone explain why this happens, and why compaction and repair did not solve the problem?

Answer 1

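Tombstones are actually removed only after the period specified by the table's gc_grace_seconds setting has passed (10 days by default). This ensures that any node that was down at the time of the deletion picks up the change after it recovers. The following blog posts discuss this in detail: from thelastpickle (recommended), 1, 2, and DSE documentation or Cassandra documentation.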

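You can set the gc_grace_seconds option on an individual table to a lower value so that deleted data is removed sooner, but this should be done only for tables that hold TTLed data. You may also need to tune the tombstone_threshold and tombstone_compaction_interval table options so that compaction runs sooner. For descriptions of these options, see this document or this document.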
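As a rough sketch of the settings mentioned above, the script below only prints the ALTER TABLE statements for review; it does not contact the cluster. The table name qcs.job is taken from the log in the question. The numeric values and the SizeTieredCompactionStrategy class are illustrative assumptions, not recommendations; the compaction class is spelled out because the compaction option is set as a whole map.

```shell
#!/bin/sh
# Table name taken from the query in the question's log.
TABLE="qcs.job"
# Illustrative values only (assumptions); tune them for your own cluster.
GC_GRACE_SECONDS=3600
TOMBSTONE_THRESHOLD=0.1
TOMBSTONE_COMPACTION_INTERVAL=3600

# Print the CQL statements; nothing is sent to Cassandra here.
tombstone_cql() {
  printf "ALTER TABLE %s WITH gc_grace_seconds = %s;\n" \
    "$TABLE" "$GC_GRACE_SECONDS"
  # Assumption: the table uses SizeTieredCompactionStrategy.
  printf "ALTER TABLE %s WITH compaction = {'class': 'SizeTieredCompactionStrategy', 'tombstone_threshold': '%s', 'tombstone_compaction_interval': '%s'};\n" \
    "$TABLE" "$TOMBSTONE_THRESHOLD" "$TOMBSTONE_COMPACTION_INTERVAL"
}

tombstone_cql
```

If the statements look right, the output can be saved to a file and applied with cqlsh -f.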

Answer 2

Newer Cassandra versions support this command:

$ ./nodetool garbagecollect

After running this command, flush memory to disk before restarting:

$ ./nodetool drain    # closes connections; after this, clients cannot access the node

Then shut down Cassandra and restart it. You should restart after a drain.

** The drain is not strictly required; it depends on the situation. This is just additional information.
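The steps in this answer could be scoped to the table from the question roughly as follows. This is a sketch under stated assumptions: the keyspace qcs and table job come from the log in the question, nodetool is assumed to be on PATH (override with the NODETOOL variable), and the DRY_RUN switch and the run helper are hypothetical conveniences of this script. By default it only prints the commands.

```shell
#!/bin/sh
# Keyspace and table taken from the query in the question's log.
KEYSPACE="qcs"
TABLE="job"
# Assumption: nodetool is on PATH; the answer invokes it as ./nodetool.
NODETOOL="${NODETOOL:-nodetool}"
# Hypothetical switch: 1 prints the commands, anything else runs them.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

cleanup_steps() {
  # Ask Cassandra to remove deleted data from this one table
  # (needs a version that ships the garbagecollect subcommand).
  run "$NODETOOL" garbagecollect "$KEYSPACE" "$TABLE"
  # Flush memtables and stop accepting connections; restart the node afterwards.
  run "$NODETOOL" drain
}

cleanup_steps
```

Running it with DRY_RUN=0 executes the commands against the local node; the restart itself is left as a manual step.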


cassandra

cassandra-3.0