无法将节点添加到 cockroachde 集群

Question

我正在质押将 CockroachDB 节点加入集群。我创建了第一个集群，然后尝试将第二个节点加入到第一个节点，但是第二个节点创建了新的集群，如下所示。有谁知道我下面的步骤有什么不对的，欢迎大家提出建议。

我已经启动了第一个节点如下：

cockroach start --insecure --advertise-host=163.172.156.111
* Check out how to secure your cluster: https://www.cockroachlabs.com/docs/v19.1/secure-a-cluster.html
*
CockroachDB node starting at 2019-05-11 01:11:15.45522036 +0000 UTC (took 2.5s)
build:               CCL v19.1.0 @ 2019/04/29 18:36:40 (go1.11.6)
webui:               http://163.172.156.111:8080
sql:                 postgresql://root@163.172.156.111:26257?sslmode=disable
client flags:        cockroach <client cmd> --host=163.172.156.111:26257 --insecure
logs:                /home/ueda/cockroach-data/logs
temp dir:            /home/ueda/cockroach-data/cockroach-temp449555924
external I/O path:   /home/ueda/cockroach-data/extern
store[0]:            path=/home/ueda/cockroach-data
status:              initialized new cluster
clusterID:           3e797faa-59a1-4b0d-83b5-36143ddbdd69
nodeID:              1

然后，启动辅助节点加入到163.172.156.111，但是无法加入：

cockroach start --insecure --advertise-addr=128.199.127.164 --join=163.172.156.111:26257
CockroachDB node starting at 2019-05-11 01:21:14.533097432 +0000 UTC (took 0.8s)
build:               CCL v19.1.0 @ 2019/04/29 18:36:40 (go1.11.6)
webui:               http://128.199.127.164:8080
sql:                 postgresql://root@128.199.127.164:26257?sslmode=disable
client flags:        cockroach <client cmd> --host=128.199.127.164:26257 --insecure
logs:                /home/ueda/cockroach-data/logs
temp dir:            /home/ueda/cockroach-data/cockroach-temp067740997
external I/O path:   /home/ueda/cockroach-data/extern
store[0]:            path=/home/ueda/cockroach-data
status:              restarted pre-existing node
clusterID:           a14e89a7-792d-44d3-89af-7037442eacbc
nodeID:              1

加入节点的cockroach.log出现一些八卦错误：

cat cockroach-data/logs/cockroach.log 
I190511 01:21:13.762309 1 util/log/clog.go:1199  [config] file created at: 2019/05/11 01:21:13
I190511 01:21:13.762309 1 util/log/clog.go:1199  [config] running on machine: amfortas
I190511 01:21:13.762309 1 util/log/clog.go:1199  [config] binary: CockroachDB CCL v19.1.0 (x86_64-unknown-linux-gnu, built 2019/04/29 18:36:40, go1.11.6)
I190511 01:21:13.762309 1 util/log/clog.go:1199  [config] arguments: [cockroach start --insecure --advertise-addr=128.199.127.164 --join=163.172.156.111:26257]
I190511 01:21:13.762309 1 util/log/clog.go:1199  line format: [IWEF]yymmdd hh:mm:ss.uuuuuu goid file:line msg utf8=✓
I190511 01:21:13.762307 1 cli/start.go:1033  logging to directory /home/ueda/cockroach-data/logs
W190511 01:21:13.763373 1 cli/start.go:1068  RUNNING IN INSECURE MODE!

- Your cluster is open for any client that can access <all your IP addresses>.
- Any user, even root, can log in without providing a password.
- Any user, connecting as root, can read or write any data in your cluster.
- There is no network encryption nor authentication, and thus no confidentiality.

Check out how to secure your cluster: https://www.cockroachlabs.com/docs/v19.1/secure-a-cluster.html
I190511 01:21:13.763675 1 server/status/recorder.go:610  available memory from cgroups (8.0 EiB) exceeds system memory 992 MiB, using system memory
W190511 01:21:13.763752 1 cli/start.go:944  Using the default setting for --cache (128 MiB).
  A significantly larger value is usually needed for good performance.
  If you have a dedicated server a reasonable setting is --cache=.25 (248 MiB).
I190511 01:21:13.764011 1 server/status/recorder.go:610  available memory from cgroups (8.0 EiB) exceeds system memory 992 MiB, using system memory
W190511 01:21:13.764047 1 cli/start.go:957  Using the default setting for --max-sql-memory (128 MiB).
  A significantly larger value is usually needed in production.
  If you have a dedicated server a reasonable setting is --max-sql-memory=.25 (248 MiB).
I190511 01:21:13.764239 1 server/status/recorder.go:610  available memory from cgroups (8.0 EiB) exceeds system memory 992 MiB, using system memory
I190511 01:21:13.764272 1 cli/start.go:1082  CockroachDB CCL v19.1.0 (x86_64-unknown-linux-gnu, built 2019/04/29 18:36:40, go1.11.6)
I190511 01:21:13.866977 1 server/status/recorder.go:610  available memory from cgroups (8.0 EiB) exceeds system memory 992 MiB, using system memory
I190511 01:21:13.867002 1 server/config.go:386  system total memory: 992 MiB
I190511 01:21:13.867063 1 server/config.go:388  server configuration:
max offset             500000000
cache size             128 MiB
SQL memory pool size   128 MiB
scan interval          10m0s
scan min idle time     10ms
scan max idle time     1s
event log enabled      true
I190511 01:21:13.867098 1 cli/start.go:929  process identity: uid 1000 euid 1000 gid 1000 egid 1000
I190511 01:21:13.867115 1 cli/start.go:554  starting cockroach node
I190511 01:21:13.868242 21 storage/engine/rocksdb.go:613  opening rocksdb instance at "/home/ueda/cockroach-data/cockroach-temp067740997"
I190511 01:21:13.894320 21 server/server.go:876  [n?] monitoring forward clock jumps based on server.clock.forward_jump_check_enabled
I190511 01:21:13.894813 21 storage/engine/rocksdb.go:613  opening rocksdb instance at "/home/ueda/cockroach-data"
W190511 01:21:13.896301 21 storage/engine/rocksdb.go:127  [rocksdb] [/go/src/github.com/cockroachdb/cockroach/c-deps/rocksdb/db/version_set.cc:2566] More existing levels in DB than needed. max_bytes_for_level_multiplier may not be guaranteed.
W190511 01:21:13.905666 21 storage/engine/rocksdb.go:127  [rocksdb] [/go/src/github.com/cockroachdb/cockroach/c-deps/rocksdb/db/version_set.cc:2566] More existing levels in DB than needed. max_bytes_for_level_multiplier may not be guaranteed.
I190511 01:21:13.911380 21 server/config.go:494  [n?] 1 storage engine initialized
I190511 01:21:13.911417 21 server/config.go:497  [n?] RocksDB cache size: 128 MiB
I190511 01:21:13.911427 21 server/config.go:497  [n?] store 0: RocksDB, max size 0 B, max open file limit 10000
W190511 01:21:13.912459 21 gossip/gossip.go:1496  [n?] no incoming or outgoing connections
I190511 01:21:13.913206 21 server/server.go:926  [n?] Sleeping till wall time 1557537673913178595 to catches up to 1557537674394265598 to ensure monotonicity. Delta: 481.087003ms
I190511 01:21:14.251655 65 vendor/github.com/cockroachdb/circuitbreaker/circuitbreaker.go:322  [n?] circuitbreaker: gossip [::]:26257->163.172.156.111:26257 tripped: initial connection heartbeat failed: rpc error: code = Unknown desc = client cluster ID "a14e89a7-792d-44d3-89af-7037442eacbc" doesn't match server cluster ID "3e797faa-59a1-4b0d-83b5-36143ddbdd69"
I190511 01:21:14.251695 65 vendor/github.com/cockroachdb/circuitbreaker/circuitbreaker.go:447  [n?] circuitbreaker: gossip [::]:26257->163.172.156.111:26257 event: BreakerTripped
W190511 01:21:14.251763 65 gossip/client.go:122  [n?] failed to start gossip client to 163.172.156.111:26257: initial connection heartbeat failed: rpc error: code = Unknown desc = client cluster ID "a14e89a7-792d-44d3-89af-7037442eacbc" doesn't match server cluster ID "3e797faa-59a1-4b0d-83b5-36143ddbdd69"
I190511 01:21:14.395848 21 gossip/gossip.go:392  [n1] NodeDescriptor set to node_id:1 address:<network_field:"tcp" address_field:"128.199.127.164:26257" > attrs:<> locality:<> ServerVersion:<major_val:19 minor_val:1 patch:0 unstable:0 > build_tag:"v19.1.0" started_at:1557537674395557548 
W190511 01:21:14.458176 21 storage/replica_range_lease.go:506  can't determine lease status due to node liveness error: node not in the liveness table
I190511 01:21:14.458465 21 server/node.go:461  [n1] initialized store [n1,s1]: disk (capacity=24 GiB, available=18 GiB, used=2.2 MiB, logicalBytes=41 MiB), ranges=20, leases=0, queries=0.00, writes=0.00, bytesPerReplica={p10=0.00 p25=0.00 p50=0.00 p75=6467.00 p90=26940.00 pMax=43017435.00}, writesPerReplica={p10=0.00 p25=0.00 p50=0.00 p75=0.00 p90=0.00 pMax=0.00}
I190511 01:21:14.458775 21 storage/stores.go:244  [n1] read 0 node addresses from persistent storage
I190511 01:21:14.459095 21 server/node.go:699  [n1] connecting to gossip network to verify cluster ID...
W190511 01:21:14.469842 96 storage/store.go:1525  [n1,s1,r6/1:/Table/{SystemCon…-11}] could not gossip system config: [NotLeaseHolderError] r6: replica (n1,s1):1 not lease holder; lease holder unknown
I190511 01:21:14.474785 21 server/node.go:719  [n1] node connected via gossip and verified as part of cluster "a14e89a7-792d-44d3-89af-7037442eacbc"
I190511 01:21:14.475033 21 server/node.go:542  [n1] node=1: started with [<no-attributes>=/home/ueda/cockroach-data] engine(s) and attributes []
I190511 01:21:14.475393 21 server/status/recorder.go:610  [n1] available memory from cgroups (8.0 EiB) exceeds system memory 992 MiB, using system memory
I190511 01:21:14.475514 21 server/server.go:1582  [n1] starting http server at [::]:8080 (use: 128.199.127.164:8080)
I190511 01:21:14.475572 21 server/server.go:1584  [n1] starting grpc/postgres server at [::]:26257
I190511 01:21:14.475605 21 server/server.go:1585  [n1] advertising CockroachDB node at 128.199.127.164:26257
W190511 01:21:14.475655 21 jobs/registry.go:341  [n1] unable to get node liveness: node not in the liveness table
I190511 01:21:14.532949 21 server/server.go:1650  [n1] done ensuring all necessary migrations have run
I190511 01:21:14.533020 21 server/server.go:1653  [n1] serving sql connections
I190511 01:21:14.533209 21 cli/start.go:689  [config] clusterID: a14e89a7-792d-44d3-89af-7037442eacbc
I190511 01:21:14.533257 21 cli/start.go:697  node startup completed:
CockroachDB node starting at 2019-05-11 01:21:14.533097432 +0000 UTC (took 0.8s)
build:               CCL v19.1.0 @ 2019/04/29 18:36:40 (go1.11.6)
webui:               http://128.199.127.164:8080
sql:                 postgresql://root@128.199.127.164:26257?sslmode=disable
client flags:        cockroach <client cmd> --host=128.199.127.164:26257 --insecure
logs:                /home/ueda/cockroach-data/logs
temp dir:            /home/ueda/cockroach-data/cockroach-temp067740997
external I/O path:   /home/ueda/cockroach-data/extern
store[0]:            path=/home/ueda/cockroach-data
status:              restarted pre-existing node
clusterID:           a14e89a7-792d-44d3-89af-7037442eacbc
nodeID:              1
I190511 01:21:14.541205 146 server/server_update.go:67  [n1] no need to upgrade, cluster already at the newest version
I190511 01:21:14.555557 149 sql/event_log.go:135  [n1] Event: "node_restart", target: 1, info: {Descriptor:{NodeID:1 Address:128.199.127.164:26257 Attrs: Locality: ServerVersion:19.1 BuildTag:v19.1.0 StartedAt:1557537674395557548 LocalityAddress:[] XXX_NoUnkeyedLiteral:{} XXX_sizecache:0} ClusterID:a14e89a7-792d-44d3-89af-7037442eacbc StartedAt:1557537674395557548 LastUp:1557537671113461486}
I190511 01:21:14.916458 59 gossip/gossip.go:1510  [n1] node has connected to cluster via gossip
I190511 01:21:14.916660 59 storage/stores.go:263  [n1] wrote 0 node addresses to persistent storage
I190511 01:21:24.480247 116 storage/store.go:4220  [n1,s1] sstables (read amplification = 2):
0 [ 51K 1 ]: 51K
6 [  1M 1 ]: 1M
I190511 01:21:24.480380 116 storage/store.go:4221  [n1,s1] 
** Compaction Stats [default] **
Level    Files   Size     Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop
----------------------------------------------------------------------------------------------------------------------------------------------------------
  L0      1/0   50.73 KB   0.5      0.0     0.0      0.0       0.0      0.0       0.0   1.0      0.0      8.0         0         1    0.006       0      0
  L6      1/0    1.26 MB   0.0      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0         0         0    0.000       0      0
 Sum      2/0    1.31 MB   0.0      0.0     0.0      0.0       0.0      0.0       0.0   1.0      0.0      8.0         0         1    0.006       0      0
 Int      0/0    0.00 KB   0.0      0.0     0.0      0.0       0.0      0.0       0.0   1.0      0.0      8.0         0         1    0.006       0      0
Uptime(secs): 10.6 total, 10.6 interval
Flush(GB): cumulative 0.000, interval 0.000
AddFile(GB): cumulative 0.000, interval 0.000
AddFile(Total Files): cumulative 0, interval 0
AddFile(L0 Files): cumulative 0, interval 0
AddFile(Keys): cumulative 0, interval 0
Cumulative compaction: 0.00 GB write, 0.00 MB/s write, 0.00 GB read, 0.00 MB/s read, 0.0 seconds
Interval compaction: 0.00 GB write, 0.00 MB/s write, 0.00 GB read, 0.00 MB/s read, 0.0 seconds
Stalls(count): 0 level0_slowdown, 0 level0_slowdown_with_compaction, 0 level0_numfiles, 0 level0_numfiles_with_compaction, 0 stop for pending_compaction_bytes, 0 slowdown for pending_compaction_bytes, 0 memtable_compaction, 0 memtable_slowdown, interval 0 total count
estimated_pending_compaction_bytes: 0 B
I190511 01:21:24.481565 121 server/status/runtime.go:500  [n1] runtime stats: 170 MiB RSS, 114 goroutines, 0 B/0 B/0 B GO alloc/idle/total, 14 MiB/16 MiB CGO alloc/total, 0.0 CGO/sec, 0.0/0.0 %(u/s)time, 0.0 %gc (7x), 50 KiB/1.5 MiB (r/w)net

阻止加入的可能原因是什么？感谢您的建议！

Answer 1

您之前似乎已经单独启动了第二个节点（128.199.127.164 上的运行），创建了自己的集群。

这个可以在报错信息中看到：

W190511 01:21:14.251763 65 gossip/client.go:122  [n?] failed to start gossip client to 163.172.156.111:26257: initial connection heartbeat failed: rpc error: code = Unknown desc = client cluster ID "a14e89a7-792d-44d3-89af-7037442eacbc" doesn't match server cluster ID "3e797faa-59a1-4b0d-83b5-36143ddbdd69"

要加入集群，加入节点的数据目录必须为空。您可以删除 cockroach-data 或使用 --store=/path/to/data-dir

指定备用目录

无法将节点添加到 cockroachde 集群

Can't add node to the cockroachde cluster

cockroachdb