Cassandra 节点无法完成加入操作
Cassandra node can't complete joining operation
尝试向现有 C* 2.1.11 集群添加新节点时,该节点似乎已完成 bootstrap 的流式处理阶段,但我无法找到其原因的解释未从 JOINING 状态移动;所有节点的 cassandra 日志在所有流式传输过程中都没有显示错误。
nodetool status
报告节点在所有节点中为UJ,负载量大于其余节点:
# nodetool status
Datacenter: us-east-vpc
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN xx.xx.xx.78 564.96 GB 256 ? xxxx-f3c7d9d40e92 1d
UN xx.xx.xx.110 534.63 GB 256 ? xxxx-9419faa478ca 1a
UN xx.xx.xx.171 557.13 GB 256 ? xxxx-7a5b2723e438 1a
UN xx.xx.xx.203 406.98 GB 256 ? xxxx-1331d9c44992 1a
UN xx.xx.xx.26 579.55 GB 256 ? xxxx-88b202a8cedc 1c
UN xx.xx.xx.122 603.39 GB 256 ? xxxx-b0b81ebabeb2 1d
UN xx.xx.xx.233 565.3 GB 256 ? xxxx-a2fa9ad67741 1c
UJ xx.xx.xx.56 881.91 GB 256 ? xxxx-9863c7799fad 1d
nodetool netstats
在其他节点上没有显示 activity 但在新节点上显示要传输的空文件列表:
# nodetool netstats
Mode: JOINING
Bootstrap xxxx-8d0c340f238b
/xx.xx.xx.233
/xx.xx.xx.122
/xx.xx.xx.171
/xx.xx.xx.78
Read Repair Statistics:
Attempted: 0
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool Name Active Pending Completed
Commands n/a 0 50
Responses n/a 0 64941
nodetool info
在尝试检索令牌范围信息时抛出错误:
# nodetool info
ID : xxxx-9863c7799fad
Gossip active : true
Thrift active : false
Native Transport active: false
Load : 881.91 GB
Generation No : 1475450119
Uptime (seconds) : 12081
Heap Memory (MB) : 1480.71 / 1996.00
Off Heap Memory (MB) : 204.47
Data Center : us-east-vpc
Rack : 1d
Exceptions : 2
Key Cache : entries 3262, size 788.43 KB, capacity 99 MB, 43 hits, 3249 requests, 0.013 recent hit rate, 14400 save period in seconds
Row Cache : entries 0, size 0 bytes, capacity 0 bytes, 0 hits, 0 requests, NaN recent hit rate, 0 save period in seconds
Counter Cache : entries 0, size 0 bytes, capacity 49 MB, 0 hits, 0 requests, NaN recent hit rate, 7200 save period in seconds
error: null
-- StackTrace --
java.lang.AssertionError
at org.apache.cassandra.locator.TokenMetadata.getTokens(TokenMetadata.java:474)
at org.apache.cassandra.service.StorageService.getTokens(StorageService.java:2263)
at org.apache.cassandra.service.StorageService.getTokens(StorageService.java:2252)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:71)
at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:275)
at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
at com.sun.jmx.mbeanserver.PerInterface.getAttribute(PerInterface.java:83)
at com.sun.jmx.mbeanserver.MBeanSupport.getAttribute(MBeanSupport.java:206)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:647)
at com.sun.jmx.mbeanserver.JmxMBeanServer.getAttribute(JmxMBeanServer.java:678)
at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1445)
at javax.management.remote.rmi.RMIConnectionImpl.access0(RMIConnectionImpl.java:76)
at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1309)
at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1401)
at javax.management.remote.rmi.RMIConnectionImpl.getAttribute(RMIConnectionImpl.java:639)
at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:324)
at sun.rmi.transport.Transport.run(Transport.java:200)
at sun.rmi.transport.Transport.run(Transport.java:197)
at java.security.AccessController.doPrivileged(Native Method)
at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:568)
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:826)
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run[=12=](TCPTransport.java:683)
at java.security.AccessController.doPrivileged(Native Method)
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:682)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
任何帮助将不胜感激。
编辑 10 月 3 日
发现实例是 运行 out of space,最后我们得到一个错误,没有足够的 space 来完成压缩。扩展分区并清除 /data
文件夹以从头开始 bootstrap;扩盘后,推流完成,但还是无法从UJ
移动到UN
;日志上没有错误,nodetool tpstats
显示没有待处理的任务,nodetool netstats
没有返回任何待处理的 activity,具有相同的 bootstrap UUID:
# nodetool netstats
Mode: JOINING
Bootstrap xxxx-8d0c340f238b
/xx.xx.xx.233
/xx.xx.xx.122
/xx.xx.xx.171
/xx.xx.xx.78
Read Repair Statistics:
Attempted: 0
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool Name Active Pending Completed
Commands n/a 0 130
Responses n/a 0 256088
还有一个问题,为什么那个节点的负载会增加
由于没有错误报告,并且流式传输过程已完成,我们假设节点已准备好加入集群。
我们在cassandra.yaml文件中添加了auto_bootstrap: False
指令,重启了节点中的服务,它加入了集群。
加入集群后,执行了全面修复和清理。
尝试向现有 C* 2.1.11 集群添加新节点时,该节点似乎已完成 bootstrap 的流式处理阶段,但我无法找到其原因的解释未从 JOINING 状态移动;所有节点的 cassandra 日志在所有流式传输过程中都没有显示错误。
nodetool status
报告节点在所有节点中为UJ,负载量大于其余节点:
# nodetool status
Datacenter: us-east-vpc
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN xx.xx.xx.78 564.96 GB 256 ? xxxx-f3c7d9d40e92 1d
UN xx.xx.xx.110 534.63 GB 256 ? xxxx-9419faa478ca 1a
UN xx.xx.xx.171 557.13 GB 256 ? xxxx-7a5b2723e438 1a
UN xx.xx.xx.203 406.98 GB 256 ? xxxx-1331d9c44992 1a
UN xx.xx.xx.26 579.55 GB 256 ? xxxx-88b202a8cedc 1c
UN xx.xx.xx.122 603.39 GB 256 ? xxxx-b0b81ebabeb2 1d
UN xx.xx.xx.233 565.3 GB 256 ? xxxx-a2fa9ad67741 1c
UJ xx.xx.xx.56 881.91 GB 256 ? xxxx-9863c7799fad 1d
nodetool netstats
在其他节点上没有显示 activity 但在新节点上显示要传输的空文件列表:
# nodetool netstats
Mode: JOINING
Bootstrap xxxx-8d0c340f238b
/xx.xx.xx.233
/xx.xx.xx.122
/xx.xx.xx.171
/xx.xx.xx.78
Read Repair Statistics:
Attempted: 0
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool Name Active Pending Completed
Commands n/a 0 50
Responses n/a 0 64941
nodetool info
在尝试检索令牌范围信息时抛出错误:
# nodetool info
ID : xxxx-9863c7799fad
Gossip active : true
Thrift active : false
Native Transport active: false
Load : 881.91 GB
Generation No : 1475450119
Uptime (seconds) : 12081
Heap Memory (MB) : 1480.71 / 1996.00
Off Heap Memory (MB) : 204.47
Data Center : us-east-vpc
Rack : 1d
Exceptions : 2
Key Cache : entries 3262, size 788.43 KB, capacity 99 MB, 43 hits, 3249 requests, 0.013 recent hit rate, 14400 save period in seconds
Row Cache : entries 0, size 0 bytes, capacity 0 bytes, 0 hits, 0 requests, NaN recent hit rate, 0 save period in seconds
Counter Cache : entries 0, size 0 bytes, capacity 49 MB, 0 hits, 0 requests, NaN recent hit rate, 7200 save period in seconds
error: null
-- StackTrace --
java.lang.AssertionError
at org.apache.cassandra.locator.TokenMetadata.getTokens(TokenMetadata.java:474)
at org.apache.cassandra.service.StorageService.getTokens(StorageService.java:2263)
at org.apache.cassandra.service.StorageService.getTokens(StorageService.java:2252)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:71)
at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:275)
at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
at com.sun.jmx.mbeanserver.PerInterface.getAttribute(PerInterface.java:83)
at com.sun.jmx.mbeanserver.MBeanSupport.getAttribute(MBeanSupport.java:206)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:647)
at com.sun.jmx.mbeanserver.JmxMBeanServer.getAttribute(JmxMBeanServer.java:678)
at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1445)
at javax.management.remote.rmi.RMIConnectionImpl.access0(RMIConnectionImpl.java:76)
at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1309)
at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1401)
at javax.management.remote.rmi.RMIConnectionImpl.getAttribute(RMIConnectionImpl.java:639)
at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:324)
at sun.rmi.transport.Transport.run(Transport.java:200)
at sun.rmi.transport.Transport.run(Transport.java:197)
at java.security.AccessController.doPrivileged(Native Method)
at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:568)
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:826)
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run[=12=](TCPTransport.java:683)
at java.security.AccessController.doPrivileged(Native Method)
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:682)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
任何帮助将不胜感激。
编辑 10 月 3 日
发现实例是 运行 out of space,最后我们得到一个错误,没有足够的 space 来完成压缩。扩展分区并清除 /data
文件夹以从头开始 bootstrap;扩盘后,推流完成,但还是无法从UJ
移动到UN
;日志上没有错误,nodetool tpstats
显示没有待处理的任务,nodetool netstats
没有返回任何待处理的 activity,具有相同的 bootstrap UUID:
# nodetool netstats
Mode: JOINING
Bootstrap xxxx-8d0c340f238b
/xx.xx.xx.233
/xx.xx.xx.122
/xx.xx.xx.171
/xx.xx.xx.78
Read Repair Statistics:
Attempted: 0
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool Name Active Pending Completed
Commands n/a 0 130
Responses n/a 0 256088
还有一个问题,为什么那个节点的负载会增加
由于没有错误报告,并且流式传输过程已完成,我们假设节点已准备好加入集群。
我们在cassandra.yaml文件中添加了auto_bootstrap: False
指令,重启了节点中的服务,它加入了集群。
加入集群后,执行了全面修复和清理。