NoSpamLogger.java: Maximum memory usage reached in Cassandra

Question

I have a 5-node Cassandra cluster with about 650 GB of data on each node and a replication factor of 3. I recently started seeing the following error in /var/log/cassandra/system.log:

    INFO [ReadStage-5] 2017-10-17 17:06:07,887 NoSpamLogger.java:91 - Maximum memory usage reached (1.000GiB), cannot allocate chunk of 1.000MiB

I tried increasing file_cache_size_in_mb, but sooner or later the same error comes back. I set this parameter to 2 GB, but it did not help.

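When the error occurs, CPU utilization spikes and read latency becomes very erratic. These spikes show up roughly every half hour; note the timestamps in the log entries below.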

    INFO [ReadStage-5] 2017-10-17 17:06:07,887 NoSpamLogger.java:91 - Maximum memory usage reached (1.000GiB), cannot allocate chunk of 1.000MiB
    INFO [ReadStage-36] 2017-10-17 17:36:09,807 NoSpamLogger.java:91 - Maximum memory usage reached (1.000GiB), cannot allocate chunk of 1.000MiB
    INFO [ReadStage-15] 2017-10-17 18:05:56,003 NoSpamLogger.java:91 - Maximum memory usage reached (2.000GiB), cannot allocate chunk of 1.000MiB
    INFO [ReadStage-28] 2017-10-17 18:36:01,177 NoSpamLogger.java:91 - Maximum memory usage reached (2.000GiB), cannot allocate chunk of 1.000MiB
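To check the half-hour periodicity more precisely, here is a small Python sketch that parses the four log entries quoted above and prints the gap between consecutive occurrences along with the memory limit reported in each. The regular expression assumes the default system.log line layout shown in the question; the helper names are illustrative only.

```python
import re
from datetime import datetime

# The four NoSpamLogger entries quoted in the question, copied into a string.
LOG = """\
INFO [ReadStage-5] 2017-10-17 17:06:07,887 NoSpamLogger.java:91 - Maximum memory usage reached (1.000GiB), cannot allocate chunk of 1.000MiB
INFO [ReadStage-36] 2017-10-17 17:36:09,807 NoSpamLogger.java:91 - Maximum memory usage reached (1.000GiB), cannot allocate chunk of 1.000MiB
INFO [ReadStage-15] 2017-10-17 18:05:56,003 NoSpamLogger.java:91 - Maximum memory usage reached (2.000GiB), cannot allocate chunk of 1.000MiB
INFO [ReadStage-28] 2017-10-17 18:36:01,177 NoSpamLogger.java:91 - Maximum memory usage reached (2.000GiB), cannot allocate chunk of 1.000MiB
"""

# Captures the "YYYY-MM-DD HH:MM:SS,mmm" timestamp and the limit in parentheses.
PATTERN = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) .*"
    r"Maximum memory usage reached \(([\d.]+GiB)\)"
)


def parse_events(text):
    """Return (timestamp, limit) pairs, one per matching log line."""
    events = []
    for line in text.splitlines():
        match = PATTERN.search(line)
        if match:
            # %f accepts the 3-digit millisecond field (887 -> 0.887 s).
            ts = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
            events.append((ts, match.group(2)))
    return events


def gaps_in_minutes(events):
    """Minutes elapsed between each pair of consecutive events."""
    return [
        (later[0] - earlier[0]).total_seconds() / 60
        for earlier, later in zip(events, events[1:])
    ]


if __name__ == "__main__":
    events = parse_events(LOG)
    for (ts, limit), gap in zip(events[1:], gaps_in_minutes(events)):
        print(f"{ts:%H:%M:%S}  limit={limit}  +{gap:.1f} min since previous")
```

The gaps all come out within a few seconds of 30 minutes, and the limit column shows the cap moving from 1 GiB to 2 GiB after file_cache_size_in_mb was raised, while the error keeps recurring on the same schedule.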

I have two tables that are partitioned by hour, and the partitions are large. For example, here is their output from nodetool tablestats:

    Read Count: 4693453
    Read Latency: 0.36752741680805157 ms.
    Write Count: 561026
    Write Latency: 0.03742310516803143 ms.
    Pending Flushes: 0
        Table: raw_data
        SSTable count: 55
        Space used (live): 594395754275
        Space used (total): 594395754275
        Space used by snapshots (total): 0
        Off heap memory used (total): 360753372
        SSTable Compression Ratio: 0.20022598072758296
        Number of keys (estimate): 45163
        Memtable cell count: 90441
        Memtable data size: 685647925
        Memtable off heap memory used: 0
        Memtable switch count: 1
        Local read count: 0
        Local read latency: NaN ms
        Local write count: 126710
        Local write latency: 0.096 ms
        Pending flushes: 0
        Percent repaired: 52.99
        Bloom filter false positives: 167775
        Bloom filter false ratio: 0.16152
        Bloom filter space used: 264448
        Bloom filter off heap memory used: 264008
        Index summary off heap memory used: 31060
        Compression metadata off heap memory used: 360458304
        Compacted partition minimum bytes: 51
        **Compacted partition maximum bytes: 3449259151**
        Compacted partition mean bytes: 16642499
        Average live cells per slice (last five minutes): 1.0005435888450147
        Maximum live cells per slice (last five minutes): 42
        Average tombstones per slice (last five minutes): 1.0
        Maximum tombstones per slice (last five minutes): 1
        Dropped Mutations: 0



    Read Count: 4712814
    Read Latency: 0.3356051004771247 ms.
    Write Count: 643718
    Write Latency: 0.04168356951335834 ms.
    Pending Flushes: 0
        Table: customer_profile_history
        SSTable count: 20
        Space used (live): 9423364484
        Space used (total): 9423364484
        Space used by snapshots (total): 0
        Off heap memory used (total): 6560008
        SSTable Compression Ratio: 0.1744084338623116
        Number of keys (estimate): 69
        Memtable cell count: 35242
        Memtable data size: 789595302
        Memtable off heap memory used: 0
        Memtable switch count: 1
        Local read count: 2307
        Local read latency: NaN ms
        Local write count: 51772
        Local write latency: 0.076 ms
        Pending flushes: 0
        Percent repaired: 0.0
        Bloom filter false positives: 0
        Bloom filter false ratio: 0.00000
        Bloom filter space used: 384
        Bloom filter off heap memory used: 224
        Index summary off heap memory used: 400
        Compression metadata off heap memory used: 6559384
        Compacted partition minimum bytes: 20502
        **Compacted partition maximum bytes: 4139110981**
        Compacted partition mean bytes: 708736810
        Average live cells per slice (last five minutes): NaN
        Maximum live cells per slice (last five minutes): 0
        Average tombstones per slice (last five minutes): NaN
        Maximum tombstones per slice (last five minutes): 0
        Dropped Mutations: 0

And here are the histograms:

cdsdb/raw_data histograms
Percentile  SSTables     Write Latency      Read Latency    Partition Size        Cell Count
                              (micros)          (micros)           (bytes)                  
50%             0.00             61.21              0.00           1955666               642
75%             1.00             73.46              0.00          17436917              4768
95%             3.00            105.78              0.00         107964792             24601
98%             8.00            219.34              0.00         186563160             42510
99%            12.00            315.85              0.00         268650950             61214
Min             0.00              6.87              0.00                51                 0
Max            14.00           1358.10              0.00        3449259151           7007506

cdsdb/customer_profile_history histograms
Percentile  SSTables     Write Latency      Read Latency    Partition Size        Cell Count
                              (micros)          (micros)           (bytes)                  
50%             0.00             73.46              0.00         223875792             61214
75%             0.00             88.15              0.00         668489532            182785
95%             0.00            152.32              0.00        1996099046            654949
98%             0.00            785.94              0.00        3449259151           1358102
99%             0.00            943.13              0.00        3449259151           1358102
Min             0.00             24.60              0.00              5723                 4
Max             0.00           5839.59              0.00        5960319812           1955666

Can you suggest a way to mitigate this problem?

Answer 1

Based on the cfhistograms output you posted, the partitions are huge.

The 95th percentile partition size of the raw_data table is 107 MB, and the maximum is 3.44 GB. The 95th percentile partition size of customer_profile_history is 1.99 GB, and the maximum is 5.96 GB.

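This is clearly related to the problem you notice every half hour, as these huge partitions are written to SSTables. The data model has to be changed with partition size in mind; ideally, set the partition interval to "minute" instead of "hour". That way, a 2 GB hourly partition is split into 60 per-minute partitions of about 33 MB each.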
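The question does not show the table schema, so the following Python sketch rests on an assumption: that each table's partition key includes a timestamp truncated to the hour (a hypothetical "bucket" value computed by the client). It shows how that bucket would be computed under the current hourly scheme and the suggested per-minute scheme, plus the arithmetic behind the 33 MB figure, assuming writes are spread evenly across the hour. The function names are hypothetical.

```python
from datetime import datetime


def hour_bucket(ts):
    """Current scheme per the question: all rows from one hour share a partition."""
    return ts.replace(minute=0, second=0, microsecond=0)


def minute_bucket(ts):
    """Suggested scheme: all rows from one minute share a partition."""
    return ts.replace(second=0, microsecond=0)


def estimated_partition_bytes(hourly_bytes, buckets_per_hour):
    """Size of each smaller partition, assuming writes are spread evenly over the hour."""
    return hourly_bytes / buckets_per_hour


if __name__ == "__main__":
    ts = datetime(2017, 10, 17, 17, 6, 7)
    print("hour bucket:  ", hour_bucket(ts))    # 2017-10-17 17:00:00
    print("minute bucket:", minute_bucket(ts))  # 2017-10-17 17:06:00

    # A 2 GB hourly partition split into 60 per-minute partitions.
    mb = estimated_partition_bytes(2 * 1000**3, 60) / 1000**2
    print(f"{mb:.1f} MB per partition")  # 33.3 MB per partition
```

The trade-off is on the read side: a query that covers a full hour now has to read up to 60 partitions instead of one.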

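The recommended maximum partition size is around 100 MB. Although in theory you can store more than 100 MB in a partition, performance suffers. Keep in mind that every read of such a partition transfers more than 100 MB of data over the network. In your case the partitions exceed 2 GB, with all the performance implications that follow.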
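A quick way to see how far each table is from that guideline is to compare the histogram percentiles against it. The sketch below copies the "Partition Size (bytes)" column from the histogram output in the question and lists the percentiles above the limit; reading "100 MB" as 100 × 10^6 bytes is an assumption, and the helper name is illustrative only.

```python
# "Partition Size (bytes)" column from the histogram output in the question.
PARTITION_SIZES = {
    "raw_data": {
        "50%": 1955666, "75%": 17436917, "95%": 107964792,
        "98%": 186563160, "99%": 268650950, "Max": 3449259151,
    },
    "customer_profile_history": {
        "50%": 223875792, "75%": 668489532, "95%": 1996099046,
        "98%": 3449259151, "99%": 3449259151, "Max": 5960319812,
    },
}

# Rough guideline from the answer; treating "100 MB" as 100 * 10^6 bytes is an assumption.
LIMIT_BYTES = 100 * 1000**2


def oversized_percentiles(sizes, limit=LIMIT_BYTES):
    """Return the percentiles whose partition size exceeds the limit."""
    return [pct for pct, size in sizes.items() if size > limit]


if __name__ == "__main__":
    for table, sizes in PARTITION_SIZES.items():
        over = oversized_percentiles(sizes)
        print(f"{table}: over {LIMIT_BYTES / 1000**2:.0f} MB at {', '.join(over)}")
```

raw_data crosses the guideline from the 95th percentile upward, while customer_profile_history exceeds it even at the median.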


cassandra

cassandra-3.0