Cassandra tombstones can't be cleared

Question

We have a Cassandra cluster in production that won't stop logging WARN and ERROR messages like these:

WARN [ReadStage:290753] 2016-04-22 17:00:06,461 SliceQueryFilter.java (line 231) Read 101 live and 33528 tombstone cells in keyspace.tablespace.Events_event_type_idx (see tombstone_warn_threshold). 100 columns was requested, slices=[5347432d45504a2d3535373639333936:2016/04/22 16\:46\:24.186-COMMANDE-ORDER-201655769396001-]

ERROR [ReadStage:290744] 2016-04-22 17:00:07,556 SliceQueryFilter.java (line 206) Scanned over 100000 tombstones in crm.Events.Events_event_type_idx; query aborted (see tombstone_failure_threshold)

ERROR [ReadStage:290729] 2016-04-22 17:00:18,708 SliceQueryFilter.java (line 206) Scanned over 100000 tombstones in crm.Events.Events_event_type_idx; query aborted (see tombstone_failure_threshold)

ERROR [ReadStage:290729] 2016-04-22 17:00:18,709 CassandraDaemon.java (line 258) Exception in thread Thread[ReadStage:290729,5,main]
java.lang.RuntimeException: org.apache.cassandra.db.filter.TombstoneOverwhelmingException
at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2016)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: org.apache.cassandra.db.filter.TombstoneOverwhelmingException
at org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:208)
at org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:122)
at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:80)
.
.
.

ERROR [ReadStage:290751] 2016-04-22 17:00:30,771 SliceQueryFilter.java (line 206) Scanned over 100000 tombstones in crm.Events.Events_event_type_idx; query aborted (see tombstone_failure_threshold)

The setup is: Cassandra 2.0.15, 4 nodes, replication factor 3. The data in this tablespace has no TTL, and gc_grace is set to 0.

We currently run a weekly 'maintenance' that consists of:

#!/bin/bash

logfile="/var/log/cassandra/maintenance.log"
echo "----------------------------------------" >> $logfile
echo "$(date) Cassandra cluster maintenance started." >> $logfile
echo "----------------------------------------" >> $logfile

nodetool -h localhost setcompactionthroughput 999
echo "$(date)  Cassandra scrub started." >> $logfile
nodetool -h localhost scrub
echo "$(date)  Cassandra scrub completed." >> $logfile
echo "$(date)  Cassandra repair started." >> $logfile
nodetool -h localhost repair --partitioner-range
echo "$(date)  Cassandra repair completed." >> $logfile
echo "$(date)  Cassandra compaction started." >> $logfile
nodetool -h localhost compact
echo "$(date)  Cassandra compaction completed." >> $logfile
echo "$(date)  Cassandra cleanup started." >> $logfile
nodetool -h localhost cleanup
echo "$(date)  Cassandra cleanup completed." >> $logfile

nodetool -h localhost setcompactionthroughput 16

dt=$SECONDS
ds=$((dt % 60))
dm=$(((dt / 60) % 60))
dh=$((dt / 3600))
printf 'Total Run Time : %d:%02d:%02d\n' "$dh" "$dm" "$ds" >> "$logfile"

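This 'maintenance' did not solve the problem. We also tried running specific operations against the affected tablespace, without much success.

We tried setting gc_grace to a higher value and then running the maintenance script, with the same result. I know this is not really an error but a safeguard that keeps Cassandra performing well, but we are unclear about what is going on.

Our next step would be to dump the entire tablespace, drop it and recreate it, but that seems a bit drastic for a production cluster.

Does anyone know what could be going wrong with the tombstone cleanup?

Thanks,

Best regards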

Answer 1

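First, your maintenance script is a bit odd. You normally don't want to run a full nodetool compact on a regular basis. Cassandra's compaction strategies are smart enough to do the right thing automatically.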

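That said, your tombstone exceptions are in crm.Events.Events_event_type_idx, which looks like a secondary index on Events(event_type). As you insert/change/delete event data, that index builds up a large number of tombstones. This is a less common (but not unexpected) edge case for secondary indexes, and it arises when the distribution of data in the index differs from the distribution of data in the table. Secondary indexes in Cassandra work well at medium cardinality, but you've got a lot of a specific event type.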
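
To confirm which table the aborted reads are hitting, the tombstone_failure_threshold lines can be tallied per table. The following is a minimal sketch that assumes the Cassandra 2.0 log format shown in the question. The function name tally_tombstone_aborts and the log path in the usage comment are illustrative assumptions.

```shell
#!/bin/bash
# Count "query aborted" tombstone errors per table, reading log lines on stdin.
# Assumes messages of the form shown in the question:
#   "Scanned over N tombstones in <table>; query aborted (...)"
# Output: one "<count><TAB><table>" line per table, highest count first.
tally_tombstone_aborts() {
    sed -nE 's/.*Scanned over [0-9]+ tombstones in ([^;]+); query aborted.*/\1/p' \
        | sort \
        | uniq -c \
        | sort -rn \
        | awk '{print $1 "\t" $2}'
}

# Usage (the log path is an assumption, adjust to your installation):
#   tally_tombstone_aborts < /var/log/cassandra/system.log
```

If a single index accounts for nearly all of the aborted queries, that supports treating the index, not the base table, as the source of the tombstones.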

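The first step toward fixing this is to run nodetool rebuild_index ks/table and hope it clears out some of the tombstones; I suspect it will. The next step is to remodel your data so that you don't run into this problem again.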
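
On the cluster from the question, the index rebuild might look like the sketch below. Several things here are assumptions: that nodetool in Cassandra 2.0 takes the keyspace, the table, and the index name in Table.index form; that the keyspace crm, table Events and index Events_event_type_idx can be read off the log lines in the question; and the wrapper name rebuild_index_cmd and the DRY_RUN variable, which are hypothetical. Check the nodetool help output on your version before running it for real.

```shell
#!/bin/bash
# Build (and optionally run) a nodetool rebuild_index call.
# By default it only prints the command, so nothing touches the cluster
# until DRY_RUN=0 is set explicitly.
# $1 = keyspace, $2 = table, $3 = index name (without the table prefix)
rebuild_index_cmd() {
    local keyspace="$1" table="$2" index="$3"
    # Assumed argument layout: <keyspace> <table> <table>.<index>
    local cmd=(nodetool -h localhost rebuild_index "$keyspace" "$table" "${table}.${index}")
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}

# Names taken from the log lines in the question; prints the command only.
rebuild_index_cmd crm Events Events_event_type_idx
```

Printing the command first is a deliberate choice for a production cluster: the argument layout can be compared against the installed nodetool before anything is executed.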

Answer 2

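It's been a long time, but having run into the same problem on different clusters, I can say that this kind of issue can have several causes.

It may be because:

- You are deleting/modifying rows at a higher rate than your actual combination of gc_grace and repairs can keep up with.
- The tombstones are never considered by compaction, so they are never purged.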

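If your data model is well structured you normally won't see this error. Most of the time it shows up because people use Cassandra the same way they would use a typical relational database.

As @JeffJirsa said earlier, manual compaction is not needed, because it stops the compactions that Cassandra performs automatically. To get them going again, you have to restart the node.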

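The solution here is to lower gc_grace and run repairs more often (repairs should happen more often than the gc_grace value, to avoid consistency problems such as deleted data reappearing). Unfortunately, this is not the best solution; completely remodeling the data is.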
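
The constraint that repairs must run more often than gc_grace can be checked with simple arithmetic. The following is a minimal sketch. The function name repair_fits_gc_grace and the example values (a weekly repair against Cassandra's default gc_grace_seconds of 864000, i.e. 10 days) are illustrative assumptions, not settings from the question's cluster.

```shell
#!/bin/bash
# Succeeds (exit status 0) when the repair interval is shorter than gc_grace.
# $1 = gc_grace_seconds of the table, $2 = days between repairs
repair_fits_gc_grace() {
    local gc_grace_seconds="$1" repair_interval_days="$2"
    local repair_interval_seconds=$((repair_interval_days * 86400))
    [ "$repair_interval_seconds" -lt "$gc_grace_seconds" ]
}

# Example: weekly repair against the default gc_grace_seconds (10 days).
if repair_fits_gc_grace 864000 7; then
    echo "ok: repairs run more often than gc_grace"
else
    echo "risk: repair more often or raise gc_grace"
fi
```

With gc_grace set to 0, as in the question, no repair schedule can pass this check.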

cassandra-2.0