Akka (.net) cluster with remote nodes: Disassociated exception
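Using Akka (.net), I am trying to implement a simple cluster use case:
- Cluster: node up/down events.
- Remote: sending messages to a specific node.
There are two participants: a master node that listens for cluster events, and a slave node that joins the cluster like this: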
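// on the slave: get the cluster extension and join via the master's address
var cluster = Cluster.Get(system);
Address address = new Address("akka.tcp", "ClusterSystem", "master", 8080);
cluster.Join(address);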
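As an aside, the same join can usually be declared in configuration instead of calling `cluster.Join` in code. A minimal sketch, assuming the master address from the snippet above is reachable from every node (`seed-nodes` is the standard Akka.NET cluster setting; the address is the question's own). Listing the master first on both nodes lets it form the cluster when it starts:

```hocon
akka {
  cluster {
    # each node contacts the listed seed node(s) on startup and joins
    # automatically, replacing the explicit cluster.Join(address) call
    seed-nodes = ["akka.tcp://ClusterSystem@master:8080"]
  }
}
```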
When the master receives a ClusterEvent.MemberUp message, it creates an actor link (an ActorSelection) to the slave's actor:
ClusterEvent.MemberUp up = message as ClusterEvent.MemberUp;
ActorSelection nodeActor = system.ActorSelection(up.Member.Address + "/user/slave_0");
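Sending a message to that actor results in this error:
Association with remote system akka.tcp://ClusterSystem@slave:8090 has failed; address is now gated for 5000 ms. Reason is: [Disassociated]
Master configuration: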
akka {
  actor {
    provider = "Akka.Cluster.ClusterActorRefProvider, Akka.Cluster"
  }
  remote {
    helios.tcp {
      port = 8080
      hostname = master
      bind-hostname = master
      bind-port = 8080
      send-buffer-size = 512000b
      receive-buffer-size = 512000b
      maximum-frame-size = 1024000b
      tcp-keepalive = on
    }
  }
  cluster {
    failure-detector {
      heartbeat-interval = 10 s
    }
    auto-down-unreachable-after = 10s
    gossip-interval = 5s
  }
  stdout-loglevel = DEBUG
  loglevel = DEBUG
  debug {
    receive = on
    autoreceive = on
    lifecycle = on
    event-stream = on
    unhandled = on
  }
}
Slave configuration:
akka {
  actor {
    provider = "Akka.Cluster.ClusterActorRefProvider, Akka.Cluster"
  }
  remote {
    helios.tcp {
      port = 8090
      hostname = slave
      bind-hostname = slave
      bind-port = 8090
      send-buffer-size = 512000b
      receive-buffer-size = 512000b
      maximum-frame-size = 1024000b
      tcp-keepalive = on
    }
  }
  cluster {
    failure-detector {
      heartbeat-interval = 10 s
    }
    auto-down-unreachable-after = 10s
    gossip-interval = 5s
  }
  stdout-loglevel = DEBUG
  loglevel = DEBUG
  debug {
    receive = on
    autoreceive = on
    lifecycle = on
    event-stream = on
    unhandled = on
  }
}
Here's your problem:
cluster {
  failure-detector {
    heartbeat-interval = 10 s
  }
  auto-down-unreachable-after = 10s
  gossip-interval = 5s
}
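Your heartbeat-interval and auto-down-unreachable-after have the same duration (10 seconds), so your nodes are almost always going to be auto-downed, and therefore disassociated, after 10 seconds: you're effectively betting that the failure detector will never misfire.
auto-down-unreachable-after is a dangerous setting; don't use it. You'll end up with a split brain or worse.
And make sure your failure detector interval is always lower than your auto-down interval.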
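A minimal sketch of a cluster section that follows this advice, assuming Akka.NET's standard `akka.cluster.failure-detector` keys. The values (a 1-second heartbeat and a 3-second acceptable pause) are illustrative assumptions; check them against the reference configuration of the Akka.NET version you run:

```hocon
akka {
  cluster {
    failure-detector {
      # assumed value: heartbeats every second, far more often than any
      # downing decision would be made
      heartbeat-interval = 1 s
      # assumed value: how late a heartbeat may be before the node is
      # considered unreachable
      acceptable-heartbeat-pause = 3 s
    }
    # leave automatic downing disabled, as advised above
    auto-down-unreachable-after = off
    # if you must experiment with it, keep it well above the failure
    # detector window instead of equal to the heartbeat interval, e.g.:
    # auto-down-unreachable-after = 60 s
  }
}
```

The commented-out line only illustrates the intended relationship: the auto-down delay sits far above the heartbeat interval and pause, rather than equal to them.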