Uneven data size on Cassandra nodes

I'm having a hard time understanding why the data sizes on my Cassandra nodes are uneven.

I have a cluster with three nodes. According to nodetool ring, each node owns 33.33%. Still, disk space usage is uneven:

Node1: 4.7 GB (DC: logg_2, RAC: RAC1)
Node2: 13.9 GB (DC: logg_2, RAC: RAC2)
Node3: 9.3 GB (DC: logg_2, RAC: RAC1)

There is only one keyspace:

keyspace_definition: |
 CREATE KEYSPACE stresscql_cass_logg WITH replication = { 'class': 'NetworkTopologyStrategy', 'logg_2' : 2, 'logg_1' : 1};

and only one table, named blogposts:

table_definition: |
  CREATE TABLE blogposts (
        domain text,
        published_date timeuuid,
        url text,
        author text,
        title text,
        body text,
        PRIMARY KEY(domain, published_date)
  ) WITH CLUSTERING ORDER BY (published_date DESC)
    AND compaction = { 'class':'LeveledCompactionStrategy' }
    AND comment='A table to hold blog posts'

Please help me understand why the data size is uneven across the nodes.

Ownership is how much of the data a node owns. From the DataStax documentation:

The percentage of the data owned by the node per datacenter times the replication factor. For example, a node can own 33% of the ring, but show 100% if the replication factor is 3.

Attention: If your cluster uses keyspaces having different replication strategies or replication factors, specify a keyspace when you run nodetool status (for example, nodetool status stresscql_cass_logg for this cluster) to get meaningful ownership information.

More information can be found here: https://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsStatus.html#toolsStatus__description
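
As a quick illustration of that formula (a minimal sketch using the 33% ring share and RF 3 from the quote above, not this cluster's numbers):

    # The per-datacenter "Owns" column reports ring share times the
    # replication factor, so the column can sum to RF * 100%.
    ring_share = 1 / 3       # node owns 33.33% of the ring
    replication_factor = 3   # example RF from the quote above
    print(f"{ring_share * replication_factor:.0%}")  # -> 100%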

NetworkTopologyStrategy places replicas in the same datacenter by walking the ring clockwise until reaching the first node in another rack.

NetworkTopologyStrategy attempts to place replicas on distinct racks because nodes in the same rack (or similar physical grouping) often fail at the same time due to power, cooling, or network issues.
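
Here is a minimal sketch of that walk for this cluster's topology (illustrative Python; one token per node and a ring size of 300 are assumptions, and real Cassandra uses vnodes and the actual NetworkTopologyStrategy code):

    # Illustrative sketch of NetworkTopologyStrategy's rack-aware walk.
    ring = [
        (0,   "Node1", "RAC1"),
        (100, "Node2", "RAC2"),
        (200, "Node3", "RAC1"),
    ]

    def replicas(token, rf=2):
        # Order the nodes clockwise starting from the token's position.
        ordered = sorted(ring, key=lambda n: (n[0] - token) % 300)
        chosen, racks = [], set()
        for _, node, rack in ordered:          # pass 1: distinct racks first
            if rack not in racks:
                chosen.append(node)
                racks.add(rack)
            if len(chosen) == rf:
                return chosen
        for _, node, rack in ordered:          # pass 2: fill from any rack
            if node not in chosen:
                chosen.append(node)
            if len(chosen) == rf:
                return chosen
        return chosen

    for t in (0, 100, 200):
        print(t, replicas(t))
    # 0   -> ['Node1', 'Node2']
    # 100 -> ['Node2', 'Node3']
    # 200 -> ['Node3', 'Node2']
    # Node2, the only RAC2 node, ends up in every replica set.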

Because you only have two racks (RAC1 and RAC2), and Node2 is the only node in RAC2, the second replica of every partition is placed on Node2. In other words, Node2 holds a copy of the data owned by Node1 and by Node3, which is why it is the largest: 4.7 GB + 9.3 GB ≈ 13.9 GB.

For more details on data distribution and replication: https://docs.datastax.com/en/cassandra/3.0/cassandra/architecture/archDataDistributeReplication.html