Cassandra 的最佳 JVM 设置

Question

我有一个 4 节点集群，每个盒子上有 16 个内核 CPU 和 100 GB RAM（每个机架上有 2 个节点）。

截至目前，所有都是运行 Cassandra (v2.1.4) 的默认 JVM 设置。使用此设置，每个节点使用 13GB RAM 和 30% CPU。它是一个写入繁重的集群，偶尔会进行删除或更新。

我是否需要调整 Cassandra 的 JVM 设置以利用更多内存？我应该注意哪些事项才能进行适当的设置？

Answer 1

Do I need to tune the JVM settings of Cassandra to utilize more memory?

DataStax Tuning Java Resources 文档实际上对此有一些非常合理的建议：

Many users new to Cassandra are tempted to turn up Java heap size too high, which consumes the majority of the underlying system's RAM. In most cases, increasing the Java heap size is actually detrimental for these reasons:

In most cases, the capability of Java to gracefully handle garbage collection above 8GB quickly diminishes.

Modern operating systems maintain the OS page cache for frequently accessed data and are very good at keeping this data in memory, but can be prevented from doing its job by an elevated Java heap size.

If you have more than 2GB of system memory, which is typical, keep the size of the Java heap relatively small to allow more memory for the page cache.

由于您的机器上有 100GB 的 RAM，（如果您确实运行ning 在 "default JVM settings" 下）您的 JVM 最大堆大小应该限制在 8192M。实际上，除非您遇到垃圾回收问题，否则我不会偏离这一点。

Cassandra 的 JVM 资源可以在 cassandra-env.sh 文件中设置。如果您好奇，请查看 cassandra-env.sh 的代码并查找 calculate_heap_sizes() 方法。这应该可以让您了解 Cassandra 如何计算您的默认 JVM 设置。

What all things should I be looking at to make appropriate settings?

如果您运行宁 OpsCenter（您应该是），请为 "Heap Used" 和 "Non Heap Used."

添加图表

这将使您能够轻松监控集群的 JVM 堆使用情况。另一件对我有帮助的事情是编写了一个 bash 脚本，我基本上从 cassandra-env.sh 劫持了 JVM 计算。这样我就可以在新机器上运行它，并立即知道我的 MAX_HEAP_SIZE 和 HEAP_NEWSIZE 会是什么：

#!/bin/bash
clear
echo "This is how Cassandra will determine its default Heap and GC Generation sizes."

system_memory_in_mb=`free -m | awk '/Mem:/ {print }'`
half_system_memory_in_mb=`expr $system_memory_in_mb / 2`
quarter_system_memory_in_mb=`expr $half_system_memory_in_mb / 2`

echo "   memory = $system_memory_in_mb"
echo "     half = $half_system_memory_in_mb"
echo "  quarter = $quarter_system_memory_in_mb"

echo "cpu cores = "`egrep -c 'processor([[:space:]]+):.*' /proc/cpuinfo`

#cassandra-env logic duped here
#this should help you to see how much memory is being allocated
#to the JVM
    if [ "$half_system_memory_in_mb" -gt "1024" ]
    then
        half_system_memory_in_mb="1024"
    fi
    if [ "$quarter_system_memory_in_mb" -gt "8192" ]
    then
        quarter_system_memory_in_mb="8192"
    fi
    if [ "$half_system_memory_in_mb" -gt "$quarter_system_memory_in_mb" ]
    then
        max_heap_size_in_mb="$half_system_memory_in_mb"
    else
        max_heap_size_in_mb="$quarter_system_memory_in_mb"
    fi
    MAX_HEAP_SIZE="${max_heap_size_in_mb}M"

    # Young gen: min(max_sensible_per_modern_cpu_core * num_cores, 1/4 * heap size)
    max_sensible_yg_per_core_in_mb="100"
    max_sensible_yg_in_mb=`expr ($max_sensible_yg_per_core_in_mb * $system_cpu_cores)`

    desired_yg_in_mb=`expr $max_heap_size_in_mb / 4`
    if [ "$desired_yg_in_mb" -gt "$max_sensible_yg_in_mb" ]
    then
        HEAP_NEWSIZE="${max_sensible_yg_in_mb}M"
    else
        HEAP_NEWSIZE="${desired_yg_in_mb}M"
    fi

echo "Max heap size = " $MAX_HEAP_SIZE
echo " New gen size = " $HEAP_NEWSIZE

更新20160212:

此外，请务必查看 Amy Tobey's 2.1 Cassandra Tuning Guide。她有一些很棒的技巧，可以帮助您以最佳方式运行获得集群。

Answer 2

system_cpu_cores 设置不正确。编辑了正确的执行。

#!/bin/bash
clear
echo "This is how Cassandra will determine its default Heap and GC Generation sizes."

system_memory_in_mb=`free -m | awk '/Mem:/ {print }'`
half_system_memory_in_mb=`expr $system_memory_in_mb / 2`
quarter_system_memory_in_mb=`expr $half_system_memory_in_mb / 2`
system_cpu_cores=`cat /proc/cpuinfo   | grep -i processor | wc -l`
echo "   memory = $system_memory_in_mb"
echo "     half = $half_system_memory_in_mb"
echo "  quarter = $quarter_system_memory_in_mb"

echo "cpu cores = `egrep -c 'processor([[:space:]]+):.*' /proc/cpuinfo`"

#cassandra-env logic duped here
#this should help you to see how much memory is being allocated
#to the JVM
if [ "$half_system_memory_in_mb" -gt "1024" ]
then
    half_system_memory_in_mb="1024"
fi
if [ "$quarter_system_memory_in_mb" -gt "8192" ]
then
    quarter_system_memory_in_mb="8192"
fi
if [ "$half_system_memory_in_mb" -gt "$quarter_system_memory_in_mb" ]
then
    max_heap_size_in_mb="$half_system_memory_in_mb"
else
    max_heap_size_in_mb="$quarter_system_memory_in_mb"
fi
MAX_HEAP_SIZE="${max_heap_size_in_mb}M"

# Young gen: min(max_sensible_per_modern_cpu_core * num_cores, 1/4 * heap size)
max_sensible_yg_per_core_in_mb="100"
max_sensible_yg_in_mb=`expr $max_sensible_yg_per_core_in_mb * $system_cpu_cores`
desired_yg_in_mb=`expr $max_heap_size_in_mb / 4`
if [ "$desired_yg_in_mb" -gt "$max_sensible_yg_in_mb" ]
then
    HEAP_NEWSIZE="${max_sensible_yg_in_mb}M"
else
    HEAP_NEWSIZE="${desired_yg_in_mb}M"
fi

echo "Max heap size = " $MAX_HEAP_SIZE
echo " New gen size = " $HEAP_NEWSIZE

Cassandra 的最佳 JVM 设置

Optimal JVM settings for Cassandra

jvm

cassandra

database-tuning

cassandra-2.1