How to reliably copy a Cassandra database between two servers?
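I have a test setup in which I would like to have a copy of the main data.
I am using the Cassandra package from DataStax, version 3.0.9.
I am using CQLSH to dump the data and restore it in the test setup.
To take a copy of the main data, I am using: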
COPY TO WITH DELIMITER = '\t' AND NULL = 'null' AND QUOTE = '"' AND HEADER = True
To populate the data, I am using:
COPY FROM WITH DELIMITER = '\t' AND NULL = 'null' AND QUOTE = '"' AND HEADER = True
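After the COPY FROM, CQLSH reports that it successfully copied all the rows from the file. But when I run a count(*) on the table, a few rows are missing.
There is no particular pattern to the missing rows. If I replay the command after truncating the table, a different set of rows goes missing. The number of missing rows is random.
The table structure contains lists/sets of user-defined types, and there may be 'null' values in the UDT content.
Is there any other reliable way to copy the data, apart from programmatically reading and writing individual rows between the two databases?
Schema of the table (field names changed):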
CREATE TYPE UDT1 (
field1 text,
field2 int,
field3 text
);
CREATE TYPE UDT2 (
field1 boolean,
field2 float
);
CREATE TABLE cypher.table1 (
id int PRIMARY KEY,
list1 list<frozen<UDT1>>,
data text,
set1 set<frozen<UDT2>>
) WITH bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';
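Apart from exporting/importing the data, you can also try copying the data files themselves.
- Take a snapshot of the data on the original cluster using "nodetool snapshot" (https://docs.datastax.com/en/cassandra/3.0/cassandra/tools/toolsSnapShot.html).
- Create the schema on the test cluster.
- Load the snapshot from the original cluster into the test cluster:
a. If every node in the test cluster holds all the data (a single node, or 3 nodes with rf=3), or the data volume is small, copy the files from the original cluster into the keyspace/column_family directory and run nodetool refresh (https://docs.datastax.com/en/cassandra/3.0/cassandra/tools/toolsRefresh.html). Make sure the file names do not overlap.
b. If the test cluster nodes do not each hold all the data, or the data volume is large, use sstableloader (https://docs.datastax.com/en/cassandra/3.0/cassandra/tools/toolsBulkloader.html) to stream the files from the snapshot to the test cluster.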
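The steps above can be sketched as a small Python helper that only builds and prints the commands, so they can be reviewed before anything is run. This is a rough sketch, not a tested procedure: the snapshot tag, host addresses and staging directory are hypothetical placeholders, and copying the snapshot files between machines (for example with scp or rsync) is left out.

```python
import shlex


def snapshot_command(keyspace, tag):
    """Build the nodetool call that snapshots one keyspace (run on each source node)."""
    return ["nodetool", "snapshot", "-t", tag, keyspace]


def refresh_command(keyspace, table):
    """Build the nodetool call that loads SSTables already copied into the table directory."""
    return ["nodetool", "refresh", keyspace, table]


def sstableloader_command(target_hosts, sstable_dir):
    """Build the sstableloader call; sstable_dir must end in <keyspace>/<table>."""
    return ["sstableloader", "-d", ",".join(target_hosts), sstable_dir]


def plan_steps(keyspace, table, tag, target_hosts, staged_dir, every_node_holds_all_data):
    """Return the commands for option a (refresh) or option b (sstableloader)."""
    steps = [snapshot_command(keyspace, tag)]
    if every_node_holds_all_data:
        steps.append(refresh_command(keyspace, table))
    else:
        steps.append(sstableloader_command(target_hosts, staged_dir))
    return steps


# Hypothetical values, for illustration only.
for step in plan_steps(
    keyspace="cypher",
    table="table1",
    tag="copy_to_test",
    target_hosts=["10.0.0.1", "10.0.0.2"],
    staged_dir="/tmp/staging/cypher/table1",
    every_node_holds_all_data=False,
):
    print(shlex.join(step))
```

Printing the commands instead of executing them is deliberate here: the snapshot has to be taken on the original cluster, while refresh or sstableloader runs against the test cluster, so the steps normally run on different machines.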
I tested your schema with a plain COPY TO and COPY FROM, without the delimiter options, and it works fine. I tested it several times and no rows went missing.
cassandra@cqlsh:cypher> INSERT INTO table1 (id, data, list1, set1 ) VALUES ( 1, 'cypher', ['a',1,'b'], {true}) ;
cassandra@cqlsh:cypher> SELECT * FROM table1 ;
id | data | list1 | set1
----+--------+-----------------------------------------------------------------------------------------------------------------------------------+--------------------------------
1 | cypher | [{field1: 'a', field2: null, field3: null}, {field1: '1', field2: null, field3: null}, {field1: 'b', field2: null, field3: null}] | {{field1: True, field2: null}}
cassandra@cqlsh:cypher> INSERT INTO table1 (id, data, list1, set1 ) VALUES ( 2, '2_cypher', ['amp','avd','ball'], {true, false}) ;
cassandra@cqlsh:cypher> SELECT * FROM table1 ;
id | data | list1 | set1
----+----------+------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------
1 | cypher | [{field1: 'a', field2: null, field3: null}, {field1: '1', field2: null, field3: null}, {field1: 'b', field2: null, field3: null}] | {{field1: True, field2: null}}
2 | 2_cypher | [{field1: 'amp', field2: null, field3: null}, {field1: 'avd', field2: null, field3: null}, {field1: 'ball', field2: null, field3: null}] | {{field1: False, field2: null}, {field1: True, field2: null}}
cassandra@cqlsh:cypher> COPY table1 TO 'table1.csv';
Using 1 child processes
Starting copy of cypher.table1 with columns [id, data, list1, set1].
Processed: 2 rows; Rate: 0 rows/s; Avg. rate: 0 rows/s
2 rows exported to 1 files in 4.358 seconds.
cassandra@cqlsh:cypher> TRUNCATE table table1 ;
cassandra@cqlsh:cypher> SELECT * FROM table1;
id | data | list1 | set1
----+------+-------+------
cassandra@cqlsh:cypher> COPY table1 FROM 'table1.csv';
Using 1 child processes
Starting copy of cypher.table1 with columns [id, data, list1, set1].
Processed: 2 rows; Rate: 2 rows/s; Avg. rate: 3 rows/s
2 rows imported from 1 files in 0.705 seconds (0 skipped).
cassandra@cqlsh:cypher> SELECT * FROM table1 ;
id | data | list1 | set1
----+----------+------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------
1 | cypher | [{field1: 'a', field2: null, field3: null}, {field1: '1', field2: null, field3: null}, {field1: 'b', field2: null, field3: null}] | {{field1: True, field2: null}}
2 | 2_cypher | [{field1: 'amp', field2: null, field3: null}, {field1: 'avd', field2: null, field3: null}, {field1: 'ball', field2: null, field3: null}] | {{field1: False, field2: null}, {field1: True, field2: null}}
(2 rows)
cassandra@cqlsh:cypher>
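If rows still go missing with your own options, one way to narrow it down is to run COPY TO on both clusters and compare the primary keys in the two files. Below is a minimal Python sketch of that comparison. It assumes both files were written with HEADER = True, a tab delimiter and an `id` key column as in your schema. The file names and sample rows are made up, and the escape settings are my assumption about how cqlsh writes the file.

```python
import csv
import os
import tempfile


def read_ids(path, id_column="id", delimiter="\t", quotechar='"'):
    """Collect the primary-key values from a COPY TO file written with HEADER = True."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(
            handle,
            delimiter=delimiter,
            quotechar=quotechar,
            escapechar="\\",    # assumption: cqlsh's default ESCAPE character
            doublequote=False,  # assumption: quotes are escaped rather than doubled
        )
        return {row[id_column] for row in reader}


def missing_ids(source_path, target_path, **options):
    """Return ids found in the source export but not in the target export."""
    missing = read_ids(source_path, **options) - read_ids(target_path, **options)
    return sorted(missing, key=int)  # id is an int column in the schema above


def write_lines(path, lines):
    """Helper for the demo below: write a small tab-separated file."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        handle.write("\n".join(lines) + "\n")


# Demo with made-up rows: the "test" export lacks the row with id 2.
with tempfile.TemporaryDirectory() as workdir:
    main_export = os.path.join(workdir, "main.tsv")
    test_export = os.path.join(workdir, "test.tsv")
    header = "id\tdata\tlist1\tset1"
    write_lines(main_export, [header, "1\tcypher\tnull\tnull", "2\t2_cypher\tnull\tnull"])
    write_lines(test_export, [header, "1\tcypher\tnull\tnull"])
    print(missing_ids(main_export, test_export))  # prints ['2']
```

Looking at the rows whose ids come back from `missing_ids` may show whether they share something, such as 'null' inside a UDT, even when the pattern is not obvious from the counts alone.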