In Gossip GenServer processes dying before exit condition

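I'm creating multiple GenServers that gossip by sending messages to each other. I've set up an exit condition so that each process dies after it has received 10 messages. Every GenServer is created at the start of the gossip, in the launch function.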

defmodule Gossip do
    use GenServer

    # starting gossip
    def start_link(watcher \\ nil), do: GenServer.start_link(__MODULE__, watcher)
    def init(watcher), do: {:ok, {[],0,watcher}}
    def launch(n, watcher \\ nil) do
        crew = (for _ <- 0..n, do: elem(Gossip.start_link(watcher),1))
        Enum.map(crew, &(add_crew(&1,crew--[&1])))
        crew
            |> hd()
            |> Gossip.send_msg()
    end 


    # client side
    def add_crew(pid, crew), do: GenServer.cast(pid, {:add_crew, crew})
    def rcv_msg(pid, msg \\ ""), do: GenServer.cast(pid, {:rcv_msg, msg})
    def send_msg(pid, msg \\ ""), do: GenServer.cast(pid, {:send_msg, msg})


    # server side  
    def handle_cast({:add_crew, crew}, {_, msg_counter, watcher}), do:
        {:noreply, {crew, msg_counter, watcher}}

    def handle_cast({:rcv_msg, _msg}, {crew, msg_counter, watcher}) do
        if msg_counter < 10 do
            send_msg(self())
        else
            GossipWatcher.increase(watcher)
            IO.inspect(self(), label: "exit of:") |> Process.exit(:normal)
        end
        {:noreply, {crew, msg_counter+1, watcher}}
    end

    def handle_cast({:send_msg,_},{[],_,_}), do: Process.exit(self(),"crew empty")
    def handle_cast({:send_msg, _msg}, {crew, msg_counter, watcher}=state) do
        rcpt = Enum.random(crew) ## recipient of the msg
        if Process.alive?(rcpt) do
            IO.inspect({self(),rcpt}, label: "send message from/to")
            rcv_msg(rcpt, "ChitChat")
            send_msg(self())
            {:noreply, state}
        else
        IO.inspect(rcpt, label: "recipient is dead:")
        {:noreply, {crew -- [rcpt], msg_counter, watcher}}
        end
    end
end


defmodule GossipWatcher do
    use GenServer

    def start_link(opt \\ []), do: GenServer.start_link(__MODULE__, opt)
    def init(_opt), do: {:ok, {0}}
    def increase(pid), do: GenServer.cast(pid, {:increase})  
    def handle_cast({:increase}, {counter}), do:
        IO.inspect({:noreply, {counter+1}}, label: "toll of dead")

end

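I use the GossipWatcher module to count how many GenServers die after receiving 10 messages. The problem is that the iex prompt returns while some GenServers are still alive. For example, out of 1000 GenServers, only ~964 have died by the time the gossip ends.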

iex(15)> {:ok, watcher} = GossipWatcher.start_link
{:ok, #PID<0.11163.0>}
iex(16)> Gossip.launch 100, watcher            
send message from/to: {#PID<0.11165.0>, #PID<0.11246.0>}
:ok     
send message from/to: {#PID<0.11165.0>, #PID<0.11167.0>}
send message from/to: {#PID<0.11246.0>, #PID<0.11182.0>}
send message from/to: {#PID<0.11165.0>, #PID<0.11217.0>}
...
toll of dead: {:noreply, {960}}
toll of dead: {:noreply, {961}}
toll of dead: {:noreply, {962}}
toll of dead: {:noreply, {963}}
toll of dead: {:noreply, {964}}
iex(17)>

Am I missing something? Are the processes timing out? Any help would be appreciated.
TIA.

The part of your code that trips you up is here:

def handle_cast({:send_msg, _msg}, {crew, msg_counter, watcher} = state) do
    ...
    if Process.alive?(rcpt) do
        ...
    else
        IO.inspect(rcpt, label: "recipient is dead:")
        {:noreply, {crew -- [rcpt], msg_counter, watcher}}
    end
end

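In this else branch you let the GenServer stop working: since it sends no message to a neighbour or to itself, no further "action" is triggered, and it simply stops doing anything.
In the worst (and very unlikely) case: if you start with 2000 GenServers, kick off the gossip from one of them, and the first only ever talks to the second while the second only ever talks to the first, then only one GenServer will die. You will be back at the command prompt with 1999 GenServers still alive but doing nothing (1998 of them never received a single message).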

Even though this case is far-fetched, it shows that the gossip can end early, before every GenServer has received 10 messages. Hence the behavior you describe.
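One way to close that gap, shown here as a sketch rather than a drop-in replacement, is to keep the sending loop running in the else branch after dropping the dead recipient, and to stop only when the crew is empty. The module name `GossipFixed`, the map-based state, the `active` flag and the `{:gossip_exit, pid, reason}` message sent to the watcher are all hypothetical choices, not taken from the question or the answer. The sketch also returns `{:stop, :normal, state}` instead of calling `Process.exit/2` from inside a callback, starts only one sending loop per process, and treats the 10th received message as the exit condition (the question's code actually exits on the 11th, because it checks the counter before incrementing it).

```elixir
defmodule GossipFixed do
  # Hypothetical rework of the question's Gossip module (sketch only).
  use GenServer

  @max_msgs 10

  # GenServer.start/2 (not start_link) so a gossiper's exit is never
  # propagated to the caller, e.g. the iex shell.
  def start(watcher), do: GenServer.start(__MODULE__, watcher)

  def launch(n, watcher) do
    crew = for _ <- 1..n, do: elem(start(watcher), 1)
    Enum.each(crew, &add_crew(&1, crew -- [&1]))
    send_msg(hd(crew))
    crew
  end

  def add_crew(pid, crew), do: GenServer.cast(pid, {:add_crew, crew})
  def rcv_msg(pid), do: GenServer.cast(pid, :rcv_msg)
  def send_msg(pid), do: GenServer.cast(pid, :send_msg)

  @impl true
  def init(watcher), do: {:ok, %{crew: [], count: 0, active: false, watcher: watcher}}

  @impl true
  def handle_cast({:add_crew, crew}, state), do: {:noreply, %{state | crew: crew}}

  def handle_cast(:rcv_msg, state) do
    state = %{state | count: state.count + 1}

    cond do
      state.count >= @max_msgs ->
        # Exit condition reached: report it, then stop cleanly.
        send(state.watcher, {:gossip_exit, self(), :heard_enough})
        {:stop, :normal, state}

      state.active ->
        {:noreply, state}

      true ->
        # First message heard: start exactly one sending loop.
        send_msg(self())
        {:noreply, %{state | active: true}}
    end
  end

  def handle_cast(:send_msg, %{crew: []} = state) do
    # Every neighbour has been seen dead: nobody is left to gossip with.
    send(state.watcher, {:gossip_exit, self(), :crew_empty})
    {:stop, :normal, state}
  end

  def handle_cast(:send_msg, state) do
    rcpt = Enum.random(state.crew)

    if Process.alive?(rcpt) do
      rcv_msg(rcpt)
      send_msg(self())
      {:noreply, %{state | active: true}}
    else
      # The fix: drop the dead recipient AND keep looping, instead of
      # going idle as the original else branch does.
      send_msg(self())
      {:noreply, %{state | crew: List.delete(state.crew, rcpt), active: true}}
    end
  end
end
```

In iex, `GossipFixed.launch(1000, self())` followed by `flush()` should show one `{:gossip_exit, ...}` message per process. Even with this change, not every process is guaranteed to hear 10 messages: the last one standing has nobody left to send it anything, so it typically ends with `:crew_empty` rather than `:heard_enough`.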


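I ran some tests, rewriting your code and using a second kind of GenServer to track how many GenServers were killed and how many survived. It turned out that, out of 1000 GenServers, on average about 40 were still alive once I was back at the iex prompt.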
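The answer does not show that second GenServer. Below is a minimal sketch of one, assuming it relies on `Process.monitor/1`, which delivers one `:DOWN` message per terminated process (including `:noproc` for a pid that is already dead when it is watched). The names `GossipCensus`, `watch/2` and `stats/1` are hypothetical.

```elixir
defmodule GossipCensus do
  # Hypothetical helper: counts how many watched processes have
  # terminated and how many are still alive.
  use GenServer

  def start_link, do: GenServer.start_link(__MODULE__, :ok)

  # Monitor every pid in the list. A pid that is already dead still
  # produces a :DOWN message (reason :noproc), so it is counted as dead.
  def watch(census, pids), do: GenServer.call(census, {:watch, pids})

  def stats(census), do: GenServer.call(census, :stats)

  @impl true
  def init(:ok), do: {:ok, %{watched: 0, dead: 0}}

  @impl true
  def handle_call({:watch, pids}, _from, state) do
    Enum.each(pids, &Process.monitor/1)
    {:reply, :ok, %{state | watched: state.watched + length(pids)}}
  end

  def handle_call(:stats, _from, %{watched: w, dead: d} = state) do
    {:reply, %{dead: d, alive: w - d}, state}
  end

  @impl true
  def handle_info({:DOWN, _ref, :process, _pid, _reason}, state) do
    {:noreply, %{state | dead: state.dead + 1}}
  end
end
```

With the question's code, `Gossip.launch/2` returns the result of `send_msg/1` rather than the crew. It would first have to return `crew` (a hypothetical change) before you could call `GossipCensus.watch(census, crew)` and, some time later, `GossipCensus.stats(census)`.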