Cassandra 3.0 updated SSTable format
According to this question, Cassandra's storage format was updated in 3.0.
Previously I could use cassandra-cli to see how an SSTable was built, and get something like this:
[default@test] list phonelists;
-------------------
RowKey: scott
=> (column=, value=, timestamp=1374684062860000)
=> (column=phonenumbers:bill, value='555-7382', timestamp=1374684062860000)
=> (column=phonenumbers:jane, value='555-8743', timestamp=1374684062860000)
=> (column=phonenumbers:patricia, value='555-4326', timestamp=1374684062860000)
-------------------
RowKey: john
=> (column=, value=, timestamp=1374683971220000)
=> (column=phonenumbers:doug, value='555-1579', timestamp=1374683971220000)
=> (column=phonenumbers:patricia, value='555-4326', timestamp=137468397122
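What would the internal form look like in the latest version of Cassandra? Can you give an example?
What utility can I use to view the internal representation of a table in Cassandra in the manner listed above, but with the new SSTable format?
All I have found on the internet is that the partition header stores the column names, and that rows store the clustering values with no repeated values.
How can I look into it?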
Prior to 3.0, sstable2json was a useful utility for getting an understanding of how data is organized in SSTables. This feature is not currently present in Cassandra 3.0, but there will be an alternative eventually. Until then, Chris Lohfink and I have developed an alternative to sstable2json (sstable-tools) for Cassandra 3.0, which you can use to understand how data is organized. There is some talk about bringing this into Cassandra proper in CASSANDRA-7464.
A key difference between the storage format of older versions of Cassandra and that of Cassandra 3.0 is that an SSTable was previously a representation of partitions and their cells (identified by their clustering and column name), whereas with Cassandra 3.0 an SSTable now represents partitions and their rows.
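To make the cells-versus-rows distinction concrete, here is a minimal Python sketch that regroups 2.x-style cells into 3.0-style rows. It is only an illustration of the idea, not how the storage engine or either tool is implemented. The helper name `cells_to_rows` is hypothetical, and the sketch assumes a single text clustering column whose value contains no ':' and that every cell in a row shares the row marker's timestamp.

```python
def cells_to_rows(cells):
    """Group 2.x-style [name, value, timestamp] cells into 3.0-style rows.

    Assumption: cell names look like 'clustering:column', as sstable2json
    prints them for a table with one clustering column.
    """
    rows = {}
    for name, value, timestamp in cells:
        clustering, _, column = name.partition(":")
        row = rows.setdefault(
            clustering,
            {"clustering": [clustering], "liveness_info": None, "cells": []},
        )
        if column == "":
            # The empty-named cell is the row marker; its timestamp is
            # kept once at the row level.
            row["liveness_info"] = {"tstamp": timestamp}
        else:
            # Regular cell: clustering value and timestamp are not repeated
            # (assumes the cell timestamp equals the row timestamp).
            row["cells"].append({"name": column, "value": value})
    return list(rows.values())


scott_cells = [
    ["bill:", "", 1451767903101827],
    ["bill:phonenumbers", "555-7382", 1451767903101827],
    ["jane:", "", 1451767911293116],
    ["jane:phonenumbers", "555-8743", 1451767911293116],
]

for row in cells_to_rows(scott_cells):
    print(row)
```

Each clustering value and timestamp ends up appearing once per row rather than once per cell, which is the gist of the format change.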
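You can read about these changes in more detail in the blog post by the primary developer of these changes, who explains them very thoroughly.
The biggest benefit you will see is that, in the general case, your data size will shrink (in some cases by a large factor), as a lot of the overhead introduced by CQL has been eliminated by some key enhancements.
Here is an example showing the difference between C* 2 and 3.
Schema: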
create keyspace demo with replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
use demo;
create table phonelists (user text, person text, phonenumbers text, primary key (user, person));
insert into phonelists (user, person, phonenumbers) values ('scott', 'bill', '555-7382');
insert into phonelists (user, person, phonenumbers) values ('scott', 'jane', '555-8743');
insert into phonelists (user, person, phonenumbers) values ('scott', 'patricia', '555-4326');
insert into phonelists (user, person, phonenumbers) values ('john', 'doug', '555-1579');
insert into phonelists (user, person, phonenumbers) values ('john', 'patricia', '555-4326');
sstable2json C* 2.2 output:
[
{"key": "scott",
"cells": [["bill:","",1451767903101827],
["bill:phonenumbers","555-7382",1451767903101827],
["jane:","",1451767911293116],
["jane:phonenumbers","555-8743",1451767911293116],
["patricia:","",1451767920541450],
["patricia:phonenumbers","555-4326",1451767920541450]]},
{"key": "john",
"cells": [["doug:","",1451767936220932],
["doug:phonenumbers","555-1579",1451767936220932],
["patricia:","",1451767945748889],
["patricia:phonenumbers","555-4326",1451767945748889]]}
]
sstable-tools toJson C* 3.0 output:
[
{
"partition" : {
"key" : [ "scott" ]
},
"rows" : [
{
"type" : "row",
"clustering" : [ "bill" ],
"liveness_info" : { "tstamp" : 1451768259775428 },
"cells" : [
{ "name" : "phonenumbers", "value" : "555-7382" }
]
},
{
"type" : "row",
"clustering" : [ "jane" ],
"liveness_info" : { "tstamp" : 1451768259793653 },
"cells" : [
{ "name" : "phonenumbers", "value" : "555-8743" }
]
},
{
"type" : "row",
"clustering" : [ "patricia" ],
"liveness_info" : { "tstamp" : 1451768259796202 },
"cells" : [
{ "name" : "phonenumbers", "value" : "555-4326" }
]
}
]
},
{
"partition" : {
"key" : [ "john" ]
},
"rows" : [
{
"type" : "row",
"clustering" : [ "doug" ],
"liveness_info" : { "tstamp" : 1451768259798802 },
"cells" : [
{ "name" : "phonenumbers", "value" : "555-1579" }
]
},
{
"type" : "row",
"clustering" : [ "patricia" ],
"liveness_info" : { "tstamp" : 1451768259908016 },
"cells" : [
{ "name" : "phonenumbers", "value" : "555-4326" }
]
}
]
}
]
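While the output is larger (that is more a consequence of the tool), the key differences you can see are:
- Data is now a collection of partitions and their rows (which include cells), instead of a collection of partitions and their cells.
- Timestamps are now at the row level (liveness_info) instead of at the cell level. If some cells in a row have a different timestamp, the new storage engine uses delta encoding to save space and records the difference at the cell level. This also applies to TTLs. As you can imagine, this saves a lot of space if you have many non-key columns, as the timestamp does not need to be repeated.
- The clustering information (in this case we are clustered on 'person') is now present at the row level instead of the cell level, which saves a lot of overhead because the clustering column values no longer have to be stored with each cell.
I should note that in this particular example data the benefits of the new storage engine are not fully realized, since there is only 1 non-clustering column.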
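To give a rough feel for how the savings grow with the number of non-clustering columns, here is a small back-of-the-envelope sketch in Python. It only counts how many clustering values and timestamps appear in the two JSON dumps above for a given shape of partition; it says nothing about actual on-disk bytes. The function name `repeated_fields` is hypothetical, and the "rows" case assumes every cell shares its row's timestamp.

```python
def repeated_fields(rows, columns, layout):
    """Count clustering values and timestamps written for one partition.

    rows    -- number of CQL rows in the partition
    columns -- number of non-clustering columns per row
    layout  -- "cells" (2.x-style dump) or "rows" (3.0-style dump)
    """
    if layout == "cells":
        # One row-marker cell plus one cell per column, and every cell
        # carries both the clustering prefix and a timestamp.
        per_row = columns + 1
        return {"clustering": rows * per_row, "timestamps": rows * per_row}
    if layout == "rows":
        # Clustering and timestamp appear once per row (assumes no cell
        # has a timestamp that differs from the row's).
        return {"clustering": rows, "timestamps": rows}
    raise ValueError("layout must be 'cells' or 'rows'")


# The 'scott' partition above has 3 rows and 1 non-clustering column.
for columns in (1, 10):
    print(
        columns,
        repeated_fields(3, columns, "cells"),
        repeated_fields(3, columns, "rows"),
    )
```

With 1 column the cell-level dump carries 6 clustering values and 6 timestamps against 3 of each at the row level; with 10 columns it is 33 against 3.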
There are a number of other improvements not shown here (such as the ability to store row-level range tombstones).