How to reliably copy a Cassandra database between two servers?

Question

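I have a test setup on which I want a copy of the master data.

I am using the Cassandra package from DataStax, version 3.0.9.

I use CQLSH to dump the data and restore it on the test setup. To obtain a copy of the master data, I use: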

COPY TO WITH DELIMITER = '\t' AND NULL = 'null' AND QUOTE = '"' AND HEADER = True

To populate the test setup, I use:

COPY FROM WITH DELIMITER = '\t' AND NULL = 'null' AND QUOTE = '"' AND HEADER = True

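After COPY FROM, CQLSH reports that all rows in the file were copied successfully. But when I run count(*) on the table, several rows are missing. There is no particular pattern to the missing rows. If I replay the command after truncating the table, a different set of rows goes missing. The number of missing rows is random.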
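One way to pin down which rows go missing is to export the table from the test setup again and compare primary keys against the original export. The following is a minimal sketch, not a verified diagnostic: it assumes both files were written by COPY TO with the options shown above (tab delimiter, double-quote quoting, header row), that cqlsh's default backslash escape character is in effect, and that `id` is the integer key column. The helper names `read_ids` and `missing_ids` and the file names are hypothetical.

```python
import csv


def read_ids(path, id_column="id"):
    """Return the set of integer keys found in a cqlsh COPY TO export."""
    with open(path, newline="") as handle:
        # Mirror the COPY options from the question: DELIMITER = '\t',
        # QUOTE = '"', HEADER = True. The backslash escape is an assumption
        # based on cqlsh's default ESCAPE setting.
        reader = csv.DictReader(
            handle, delimiter="\t", quotechar='"', escapechar="\\"
        )
        return set(int(row[id_column]) for row in reader)


def missing_ids(source_csv, target_csv, id_column="id"):
    """Keys present in the source export but absent from the target export."""
    source = read_ids(source_csv, id_column)
    target = read_ids(target_csv, id_column)
    return sorted(source - target)


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python compare_exports.py master.csv test.csv
    if len(sys.argv) == 3:
        for key in missing_ids(sys.argv[1], sys.argv[2]):
            print(key)
```

Running it on exports taken after each import attempt would show whether the missing keys really change from run to run.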

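The table structure contains lists/sets of user-defined data types, and there may be 'null' values within the UDT content.

Is there any other reliable way to copy the data, apart from programmatically reading and writing individual rows between the two databases?

Schema of the table (field names changed):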

CREATE TYPE UDT1 (
   field1 text,
   field2 int,
   field3 text
);

CREATE TYPE UDT2 (
   field1 boolean,
   field2 float
);

CREATE TABLE cypher.table1 (
   id int PRIMARY KEY,
   list1 list<frozen<UDT1>>,
   data text,
   set1 set<frozen<UDT2>>
) WITH bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';

Answer 1

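Instead of exporting/importing the data, you can also try copying the data files themselves.

1. Take a snapshot with "nodetool snapshot" (https://docs.datastax.com/en/cassandra/3.0/cassandra/tools/toolsSnapShot.html).
2. Create the schema on the test cluster.
3. Load the snapshot from the original cluster into the test cluster:

a. If all nodes in the test cluster hold all the data (a single node, or 3 nodes with rf=3), or the data volume is small: copy the files from the original cluster into the keyspace/column_family directory and run nodetool refresh (https://docs.datastax.com/en/cassandra/3.0/cassandra/tools/toolsRefresh.html). Make sure the file names do not overlap.

b. If the test cluster nodes do not each hold all the data, or the data volume is large: use sstableloader (https://docs.datastax.com/en/cassandra/3.0/cassandra/tools/toolsBulkloader.html) to stream the files from the snapshot to the test cluster.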
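The commands involved can be sketched as a small dry-run script. This is a rough illustration under stated assumptions, not a tested procedure: the snapshot tag `copy1`, the host name `test-node1`, the staging path, and the helper function names are all hypothetical, and copying the snapshot files between machines is left out. As far as I know, sstableloader expects the directory it is given to end in `<keyspace>/<table>`, so the staging path follows that layout.

```python
import subprocess


def snapshot_cmd(keyspace, tag):
    """Snapshot step: snapshot every table of `keyspace` on a source node."""
    return ["nodetool", "snapshot", "-t", tag, keyspace]


def refresh_cmd(keyspace, table):
    """Option a: have a test node load SSTables copied into its data directory."""
    return ["nodetool", "refresh", keyspace, table]


def sstableloader_cmd(target_hosts, sstable_dir):
    """Option b: stream the SSTables in `sstable_dir` to the test cluster."""
    return ["sstableloader", "-d", ",".join(target_hosts), sstable_dir]


def run(cmd, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    print(" ".join(cmd))
    if not dry_run:
        subprocess.check_call(cmd)
    return cmd


if __name__ == "__main__":
    # Dry run only: prints the commands without touching any cluster.
    run(snapshot_cmd("cypher", "copy1"))
    run(refresh_cmd("cypher", "table1"))
    run(sstableloader_cmd(["test-node1"], "/tmp/staging/cypher/table1"))
```

Keeping `dry_run=True` by default means the script only prints the commands until they have been checked against the actual cluster layout.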

Answer 2

I tested your schema using plain COPY TO and COPY FROM, without the delimiter options, and it works fine. I ran the test several times and no rows went missing.

cassandra@cqlsh:cypher> INSERT INTO table1 (id, data, list1, set1 ) VALUES ( 1, 'cypher', ['a',1,'b'], {true}) ;
cassandra@cqlsh:cypher> SELECT * FROM table1 ; 

 id | data   | list1                                                                                                                             | set1
----+--------+-----------------------------------------------------------------------------------------------------------------------------------+--------------------------------
  1 | cypher | [{field1: 'a', field2: null, field3: null}, {field1: '1', field2: null, field3: null}, {field1: 'b', field2: null, field3: null}] | {{field1: True, field2: null}}

cassandra@cqlsh:cypher> INSERT INTO table1 (id, data, list1, set1 ) VALUES ( 2, '2_cypher', ['amp','avd','ball'], {true, false}) ;
cassandra@cqlsh:cypher> SELECT * FROM table1 ;

 id | data     | list1                                                                                                                                    | set1
----+----------+------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------
  1 |   cypher |        [{field1: 'a', field2: null, field3: null}, {field1: '1', field2: null, field3: null}, {field1: 'b', field2: null, field3: null}] |                                {{field1: True, field2: null}}
  2 | 2_cypher | [{field1: 'amp', field2: null, field3: null}, {field1: 'avd', field2: null, field3: null}, {field1: 'ball', field2: null, field3: null}] | {{field1: False, field2: null}, {field1: True, field2: null}}

cassandra@cqlsh:cypher> COPY table1 TO 'table1.csv';
Using 1 child processes

Starting copy of cypher.table1 with columns [id, data, list1, set1].
Processed: 2 rows; Rate:       0 rows/s; Avg. rate:       0 rows/s
2 rows exported to 1 files in 4.358 seconds.
cassandra@cqlsh:cypher> TRUNCATE table table1 ;
cassandra@cqlsh:cypher> SELECT * FROM table1;

 id | data | list1 | set1
----+------+-------+------

cassandra@cqlsh:cypher> COPY table1 FROM 'table1.csv';
Using 1 child processes

Starting copy of cypher.table1 with columns [id, data, list1, set1].
Processed: 2 rows; Rate:       2 rows/s; Avg. rate:       3 rows/s
2 rows imported from 1 files in 0.705 seconds (0 skipped).
cassandra@cqlsh:cypher> SELECT * FROM table1  ;

 id | data     | list1                                                                                                                                    | set1
----+----------+------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------
  1 |   cypher |        [{field1: 'a', field2: null, field3: null}, {field1: '1', field2: null, field3: null}, {field1: 'b', field2: null, field3: null}] |                                {{field1: True, field2: null}}
  2 | 2_cypher | [{field1: 'amp', field2: null, field3: null}, {field1: 'avd', field2: null, field3: null}, {field1: 'ball', field2: null, field3: null}] | {{field1: False, field2: null}, {field1: True, field2: null}}

(2 rows)
cassandra@cqlsh:cypher>

cassandra

python-2.7

cqlsh