Akka (.net) cluster with remote nodes: Disassociated exception

Using akka (.net), I am trying to implement a simple cluster use case:

  1. Cluster: node up/down events.
  2. Remote: sending messages to a specific node.

There are two actors: a master, which listens for cluster events, and a slave, which joins the cluster:

Cluster cluster = Cluster.Get(system);
Address address = new Address("akka.tcp", "ClusterSystem", "master", 8080);
cluster.Join(address);
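As an alternative to calling `cluster.Join` from code, Akka.Cluster can also join through the standard `akka.cluster.seed-nodes` setting. A minimal sketch, assuming the master at `master:8080` is the only seed node. The address must match the master's actor-system name, `hostname` and `port` exactly. If the same list is placed in both nodes' configuration, the master (the first entry) is allowed to join itself and bootstrap the cluster:

```hocon
akka.cluster {
    # Sketch: join via configuration instead of cluster.Join(...) in code.
    # Must match the master's actor-system name, hostname and port exactly.
    seed-nodes = ["akka.tcp://ClusterSystem@master:8080"]
}
```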

When the master receives a ClusterEvent.MemberUp message, it looks up the slave's actor with an ActorSelection:

ClusterEvent.MemberUp up = message as ClusterEvent.MemberUp;
ActorSelection nodeActor = system.ActorSelection(up.Member.Address + "/user/slave_0");

Sending a message to this actor results in an error:


Association with remote system akka.tcp://ClusterSystem@slave:8090 has failed; address is now gated for 5000 ms. Reason is: [Disassociated]


Master configuration:

    akka {
        actor {
            provider = "Akka.Cluster.ClusterActorRefProvider, Akka.Cluster"

            debug {
                receive = on
                autoreceive = on
                lifecycle = on
                event-stream = on
                unhandled = on
            }
        }

        remote {
            helios.tcp {
                port = 8080
                hostname = master
                bind-hostname = master
                bind-port = 8080
                send-buffer-size = 512000b
                receive-buffer-size = 512000b
                maximum-frame-size = 1024000b
                tcp-keepalive = on
            }
        }
        cluster {
            failure-detector {
                heartbeat-interval = 10s
            }
            auto-down-unreachable-after = 10s
            gossip-interval = 5s
        }
        stdout-loglevel = DEBUG
        loglevel = DEBUG
    }

Slave configuration:

akka {
    actor {
        provider = "Akka.Cluster.ClusterActorRefProvider, Akka.Cluster"

        debug {
            receive = on
            autoreceive = on
            lifecycle = on
            event-stream = on
            unhandled = on
        }
    }

    remote {
        helios.tcp {
            port = 8090
            hostname = slave
            bind-hostname = slave
            bind-port = 8090
            send-buffer-size = 512000b
            receive-buffer-size = 512000b
            maximum-frame-size = 1024000b
            tcp-keepalive = on
        }
    }
    cluster {
        failure-detector {
            heartbeat-interval = 10s
        }
        auto-down-unreachable-after = 10s
        gossip-interval = 5s
    }
    stdout-loglevel = DEBUG
    loglevel = DEBUG
}

Here is your problem:

cluster {
    failure-detector {
        heartbeat-interval = 10s
    }
    auto-down-unreachable-after = 10s
    gossip-interval = 5s
}

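heartbeat-interval and auto-down-unreachable-after are set to the same duration (10 seconds), so your nodes will almost always be auto-downed and disassociated after about 10 seconds. With heartbeats that far apart you are essentially betting against the failure detector: as soon as it flags a node as unreachable, auto-down removes that node from the cluster and the association to it is dropped.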

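auto-down-unreachable-after is a dangerous setting; don't use it. During a network partition each side can down the other and carry on as a separate cluster, so you will end up with a split brain, or worse.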
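Deleting the line is enough, since automatic downing is disabled by default; spelling it out just makes the intent explicit. A minimal sketch of the `cluster` block with auto-down turned off. Unreachable members then remain in the cluster until they become reachable again or are downed explicitly (for example with `Cluster.Get(system).Down(address)`):

```hocon
akka.cluster {
    # Disable automatic downing of unreachable members ("off" is also the default).
    auto-down-unreachable-after = off
}
```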

And make sure your failure-detector heartbeat interval is always well below your auto-down interval.
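A sketch of a failure-detector block that follows this rule. The values are illustrative assumptions, not tuned recommendations. Heartbeats are sent every second, and `acceptable-heartbeat-pause` (a standard `akka.cluster.failure-detector` setting that does not appear in the original config) gives the detector a few seconds of slack before it marks a node unreachable:

```hocon
akka.cluster {
    failure-detector {
        # Send heartbeats often, so a single late heartbeat is a small gap.
        heartbeat-interval = 1s
        # Tolerate a few seconds without heartbeats before flagging a node unreachable.
        acceptable-heartbeat-pause = 5s
    }
}
```

If auto-down is kept anyway, its delay should be considerably longer than the heartbeat interval plus the acceptable pause, so that it only acts on nodes that have really been gone for a while.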