Akka.NET cluster intermittent dead letters

Question

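We have a cluster running locally (for now), and everything seems to be configured correctly. Our prime-calculation messages are distributed across our seed nodes. However, we intermittently lose messages. You can see the behavior of two runs in the screenshots. Which messages end up as dead letters is not consistent at all.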

Our messages are always sent the same way and look like this. The last parameter is n, the index of the n-th prime number to find.

new PrimeCalculationEntry(id, 1, 100000),
new PrimeCalculationEntry(id, 2, 150000),
new PrimeCalculationEntry(id, 3, 200000),
new PrimeCalculationEntry(id, 4, 250000),
new PrimeCalculationEntry(id, 5, 300000),
new PrimeCalculationEntry(id, 6, 350000),
new PrimeCalculationEntry(id, 7, 400000),
new PrimeCalculationEntry(id, 8, 450000)

Our cluster is set up as follows: one non-seed node hosts a group router that sends messages to the two seed nodes, each of which runs a pool router.

Non-seed node: localhost:0 (random port)

akka {
    actor {
        provider = cluster
        deployment {
            /commander {
                router = round-robin-group # routing strategy
                routees.paths = ["/user/cluster"] # path of the routee on each node
                cluster {
                    enabled = on
                    allow-local-routees = on
                }
            }
        }
    }
    remote {
        dot-netty.tcp {
            port = 0 # let the OS pick a random port
            hostname = localhost
        }
    }
    cluster {
        seed-nodes = ["akka.tcp://ClusterSystem@localhost:8081", "akka.tcp://ClusterSystem@localhost:8082"]
    }
}

Seed node 1: localhost:8081 (leader)

akka {
    actor {
        provider = cluster
        deployment {
            /cluster {
                router = round-robin-pool
                nr-of-instances = 10
            }
        }
    }
    remote {
        dot-netty.tcp {
            port = 8081
            hostname = localhost
        }
    }
    cluster {
        seed-nodes = ["akka.tcp://ClusterSystem@localhost:8081"]
    }
}

Seed node 2: localhost:8082

akka {
    actor {
        provider = cluster
        deployment {
            /cluster {
                router = round-robin-pool
                nr-of-instances = 10
            }
        }
    }
    remote {
        dot-netty.tcp {
            port = 8082
            hostname = localhost
        }
    }
    cluster {
        seed-nodes = ["akka.tcp://ClusterSystem@localhost:8081"]
    }
}

Can anyone point us in the right direction? Is something wrong with our configuration? Thanks in advance.

Answer 1

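I think I know what the problem is here: you haven't defined any akka.cluster.roles, and your /commander router has no use-role setting. As a result, every Nth message is dropped, because the router tries to route it to its own node, and there is no /user/cluster actor there to receive it.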

To fix this properly, you should do the following:

1. Have every node that can process PrimeCalculationEntry declare akka.cluster.roles = [prime].
2. Change the HOCON on the node hosting the /commander router to:

    /commander {
        router = round-robin-group # routing strategy
        routees.paths = ["/user/cluster"] # path of the routee on each node
        cluster {
            enabled = on
            allow-local-routees = on
            use-role = "prime"
        }
    }

This will eliminate the dead letters, because the /commander node will no longer send every Nth message to itself.
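Step 1 is not shown in HOCON above. A minimal sketch, assuming the seed-node configuration from the question is otherwise unchanged: the cluster section of seed node 1 gains a roles entry, and seed node 2 gets the same line in its own cluster section.

```hocon
cluster {
    seed-nodes = ["akka.tcp://ClusterSystem@localhost:8081"]
    # This node hosts /user/cluster, so it advertises the "prime" role
    roles = [prime]
}
```

The commander node does not declare the prime role, so use-role = "prime" leaves it out of the routee list even though allow-local-routees is still on.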

Answer 2

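I saw @Aaronontheweb's answer too late. We "fixed" it by setting allow-local-routees to off in the commander's HOCON. But I guess the better solution is to set up roles properly, as described in that answer.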
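For reference, a sketch of that workaround, assuming the commander deployment from the question with only allow-local-routees changed:

```hocon
/commander {
    router = round-robin-group # routing strategy
    routees.paths = ["/user/cluster"] # path of the routee on each node
    cluster {
        enabled = on
        # Never route to the local node, which has no /user/cluster actor
        allow-local-routees = off
    }
}
```

This only excludes the commander's own node. If another node without a /user/cluster actor joins the cluster, the group router would still pick it as a routee and produce dead letters again, which is why declaring roles and setting use-role is the more robust fix.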

