Erlang: monitor multiple processes
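I need to monitor a bunch of worker processes. Currently I can monitor 1 process with 1 monitor. How do I scale this to monitor N worker processes? Do I need to spawn N monitors as well? If so, what happens if one of those spawned monitors fails or crashes?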
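Don't spawn and then monitor; that has caused problems in production in the past. Use spawn_monitor instead, which creates the process and sets up the monitor in a single atomic step.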
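The comment does not say what went wrong in production, so the following is only one possible illustration, not the commenter's actual case. If the process dies before erlang:monitor/2 is called, the monitor still delivers a 'DOWN' message, but with the reason noproc instead of the real exit reason. The module name spawn_vs_monitor and the function demo/0 are made up for this sketch.

```erlang
-module(spawn_vs_monitor).
-export([demo/0]).

demo() ->
    %% Two steps: spawn, then monitor. The sleep makes sure the process
    %% is already dead by the time the monitor is set up.
    Pid1 = spawn(fun() -> exit(boom) end),
    timer:sleep(100),
    Ref1 = erlang:monitor(process, Pid1),
    Reason1 = receive {'DOWN', Ref1, process, Pid1, R1} -> R1 end,

    %% One atomic step: the monitor exists before the process can exit.
    {Pid2, Ref2} = spawn_monitor(fun() -> exit(boom) end),
    Reason2 = receive {'DOWN', Ref2, process, Pid2, R2} -> R2 end,

    %% The first reason is noproc (the real reason was lost),
    %% the second is the actual exit reason.
    {Reason1, Reason2}.
```

Calling spawn_vs_monitor:demo() should return {noproc, boom}.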
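You can start and monitor multiple processes from a single supervising process. If you look at the documentation for monitor, you will see that each time a monitored process dies, a message of the form
{'DOWN', MonitorRef, Type, Object, Info}
is sent to the process that was monitoring the one that just died.
Then you can decide what to do. MonitorRef is the reference you got when you started monitoring the process, and Object is the pid of the process that died, or {RegisteredName, Node} if you monitored it by its registered name. Info is the exit reason.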
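As a small sketch of what the Object field looks like when a process is monitored by name rather than by pid (the module name down_demo and the registered name demo_worker are invented for this example):

```erlang
-module(down_demo).
-export([demo/0]).

demo() ->
    %% A process that waits for a stop message and then exits normally.
    Pid = spawn(fun() -> receive stop -> ok end end),
    true = register(demo_worker, Pid),

    %% Monitor by registered name instead of by pid.
    Ref = erlang:monitor(process, demo_worker),
    Pid ! stop,

    receive
        {'DOWN', Ref, process, Object, Info} ->
            %% Object is {demo_worker, Node} here, not the pid,
            %% and Info is the exit reason (normal).
            {Object, Info}
    end.
```

On a node that is not distributed, Node is nonode@nohost. Matching on Ref in the receive pattern is what ties the message to this particular monitor.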
Writing some sample code with monitors is a good exercise, but try to stick to the OTP libraries and OTP supervisors.
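For comparison, here is a rough sketch of what the OTP route might look like: a one_for_one supervisor that starts N workers and restarts any that die. The module name mo_sup, the restart intensity values, and the use of proc_lib:spawn_link/3 for a bare worker loop are all assumptions made for this example; a real worker would more typically be a gen_server.

```erlang
-module(mo_sup).
-behaviour(supervisor).
-export([start_link/1, init/1, start_worker/1, worker/1]).

%% Start a supervisor that owns N workers.
start_link(N) ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, N).

%% Supervisor callback: one child spec per worker.
init(N) ->
    %% Give up if more than 5 restarts happen within 10 seconds.
    SupFlags = #{strategy => one_for_one, intensity => 5, period => 10},
    Children = [#{id => {worker, Id},
                  start => {?MODULE, start_worker, [Id]},
                  restart => permanent,
                  shutdown => brutal_kill,
                  type => worker}
                || Id <- lists:seq(1, N)],
    {ok, {SupFlags, Children}}.

%% The start function must link the child to the supervisor
%% and return {ok, Pid}.
start_worker(Id) ->
    {ok, proc_lib:spawn_link(?MODULE, worker, [Id])}.

worker(Id) ->
    timer:sleep(1000 * rand:uniform(5)),
    io:format("Worker~w: I'm still alive~n", [Id]),
    worker(Id).
```

After mo_sup:start_link(4), killing a worker from observer should cause the supervisor to start a replacement with the same Id, without any hand-written receive loop.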
Do I need to spawn N monitors as well?
No:
-module(mo).
-compile(export_all).

%% Each worker prints a heartbeat every 1 to 5 seconds, forever.
worker(Id) ->
    timer:sleep(1000 * rand:uniform(5)),
    io:format("Worker~w: I'm still alive~n", [Id]),
    worker(Id).

%% Spawn N workers, all monitored by the calling process.
create_workers(N) ->
    Workers = [   % { {Pid, Ref}, Id }
        { spawn_monitor(?MODULE, worker, [Id]), Id }
        || Id <- lists:seq(1, N)
    ],
    monitor_workers(Workers).

%% One process receives the 'DOWN' messages for every worker.
monitor_workers(Workers) ->
    receive
        {'DOWN', Ref, process, Pid, Why} ->
            Worker = {Pid, Ref},
            case is_my_worker(Worker, Workers) of
                true ->
                    NewWorkers = replace_worker(Worker, Workers, Why),
                    io:format("Old Workers:~n~p~n", [Workers]),
                    io:format("New Workers:~n~p~n", [NewWorkers]),
                    monitor_workers(NewWorkers);
                false ->
                    monitor_workers(Workers)
            end;
        _Other ->
            monitor_workers(Workers)
    end.

is_my_worker(Worker, Workers) ->
    lists:keymember(Worker, 1, Workers).

%% Drop the dead worker and start a new one with the same Id.
replace_worker(Worker, Workers, Why) ->
    {{Pid, _}, Id} = lists:keyfind(Worker, 1, Workers),
    %% ~p rather than ~s: the exit reason can be any term, not just an atom.
    io:format("Worker~w (~w) went down: ~p~n", [Id, Pid, Why]),
    NewWorkers = lists:keydelete(Worker, 1, Workers),
    NewWorker = spawn_monitor(?MODULE, worker, [Id]),
    [{NewWorker, Id}|NewWorkers].

start() ->
    observer:start(),  %% In the Processes tab, you can right click on a worker and kill it.
    create_workers(4).
In the shell:
$ ./run
Erlang/OTP 19 [erts-8.2] [source] [64-bit] [smp:4:4] [async-threads:10] [hipe] [kernel-poll:false]
Eshell V8.2 (abort with ^G)
1> Worker3: I'm still alive
Worker1: I'm still alive
Worker2: I'm still alive
Worker4: I'm still alive
Worker3: I'm still alive
Worker1: I'm still alive
Worker4: I'm still alive
Worker2: I'm still alive
Worker3: I'm still alive
Worker1: I'm still alive
Worker4: I'm still alive
Worker3 (<0.87.0>) went down: killed
Old Workers:
[{{<0.85.0>,#Ref<0.0.4.292>},1},
{{<0.86.0>,#Ref<0.0.4.293>},2},
{{<0.87.0>,#Ref<0.0.4.294>},3},
{{<0.88.0>,#Ref<0.0.4.295>},4}]
New Workers:
[{{<0.2386.0>,#Ref<0.0.1.416>},3},
{{<0.85.0>,#Ref<0.0.4.292>},1},
{{<0.86.0>,#Ref<0.0.4.293>},2},
{{<0.88.0>,#Ref<0.0.4.295>},4}]
Worker2: I'm still alive
Worker1: I'm still alive
Worker2: I'm still alive
Worker1: I'm still alive
Worker1: I'm still alive
Worker4: I'm still alive
Worker3: I'm still alive
Worker2: I'm still alive
Worker1: I'm still alive
Worker3: I'm still alive
Worker4: I'm still alive
Worker1: I'm still alive
Worker4 (<0.88.0>) went down: killed
Old Workers:
[{{<0.2386.0>,#Ref<0.0.1.416>},3},
{{<0.85.0>,#Ref<0.0.4.292>},1},
{{<0.86.0>,#Ref<0.0.4.293>},2},
{{<0.88.0>,#Ref<0.0.4.295>},4}]
New Workers:
[{{<0.5322.0>,#Ref<0.0.1.9248>},4},
{{<0.2386.0>,#Ref<0.0.1.416>},3},
{{<0.85.0>,#Ref<0.0.4.292>},1},
{{<0.86.0>,#Ref<0.0.4.293>},2}]
Worker3: I'm still alive
Worker2: I'm still alive
Worker4: I'm still alive
Worker1: I'm still alive
Worker3: I'm still alive
Worker3: I'm still alive
Worker2: I'm still alive
Worker1 (<0.85.0>) went down: killed
Old Workers:
[{{<0.5322.0>,#Ref<0.0.1.9248>},4},
{{<0.2386.0>,#Ref<0.0.1.416>},3},
{{<0.85.0>,#Ref<0.0.4.292>},1},
{{<0.86.0>,#Ref<0.0.4.293>},2}]
New Workers:
[{{<0.5710.0>,#Ref<0.0.1.10430>},1},
{{<0.5322.0>,#Ref<0.0.1.9248>},4},
{{<0.2386.0>,#Ref<0.0.1.416>},3},
{{<0.86.0>,#Ref<0.0.4.293>},2}]
Worker2: I'm still alive
Worker3: I'm still alive
Worker4: I'm still alive
Worker3: I'm still alive
I think the version below may be more efficient: it uses lists:map() to search for and replace the crashed worker, so it traverses the list of workers only once:
-module(mo).
-compile(export_all).

worker(Id) ->
    timer:sleep(1000 * rand:uniform(5)),
    io:format("Worker~w: I'm still alive~n", [Id]),
    worker(Id).

create_workers(N) ->
    Workers = [   % { {Pid, Ref}, Id }
        { spawn_monitor(?MODULE, worker, [Id]), Id }
        || Id <- lists:seq(1, N)
    ],
    monitor_workers(Workers).

monitor_workers(Workers) ->
    receive
        {'DOWN', Ref, process, Pid, Why} ->
            CrashedWorker = {Pid, Ref},
            NewWorkers = replace(CrashedWorker, Workers, Why),
            io:format("Old Workers:~n~p~n", [Workers]),
            io:format("New Workers:~n~p~n", [NewWorkers]),
            monitor_workers(NewWorkers);
        _Other ->
            monitor_workers(Workers)
    end.

%% Single pass over the list: the crashed worker is swapped in place,
%% every other entry is returned unchanged.
replace(CrashedWorker, Workers, Why) ->
    lists:map(fun(PidRefId) ->
                  { {Pid,_Ref}=Worker, Id} = PidRefId,
                  case Worker =:= CrashedWorker of
                      true ->   % replace worker
                          %% ~p rather than ~s: the exit reason can be any term.
                          io:format("Worker~w (~w) went down: ~p~n",
                                    [Id, Pid, Why]),
                          {spawn_monitor(?MODULE, worker, [Id]), Id};  % => { {Pid,Ref}, Id }
                      false ->  % leave worker alone
                          PidRefId
                  end
              end,
              Workers).

start() ->
    observer:start(),  %% In the Processes tab, you can right click on a worker and kill it.
    create_workers(4).
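If the list traversal is a concern, one further variation (not from the answer above, just a sketch) is to keep the workers in a map keyed by the monitor reference, so a 'DOWN' message can be matched to its worker with a direct lookup instead of a scan. The module name mo_map and the function handle_down/4 are hypothetical, and no measurements back the idea that this matters for a handful of workers.

```erlang
-module(mo_map).
-export([start/1, worker/1, handle_down/4]).

worker(Id) ->
    timer:sleep(1000 * rand:uniform(5)),
    io:format("Worker~w: I'm still alive~n", [Id]),
    worker(Id).

%% Spawn N monitored workers and remember them as #{Ref => Id}.
start(N) ->
    Workers = maps:from_list(
                [begin
                     {_Pid, Ref} = spawn_monitor(?MODULE, worker, [Id]),
                     {Ref, Id}
                 end || Id <- lists:seq(1, N)]),
    loop(Workers).

loop(Workers) ->
    receive
        {'DOWN', Ref, process, Pid, Why} ->
            loop(handle_down(Ref, Pid, Why, Workers));
        _Other ->
            loop(Workers)
    end.

%% Look the dead worker up by its monitor reference and replace it.
%% Unknown references leave the map unchanged.
handle_down(Ref, Pid, Why, Workers) ->
    case maps:find(Ref, Workers) of
        {ok, Id} ->
            io:format("Worker~w (~w) went down: ~p~n", [Id, Pid, Why]),
            {_NewPid, NewRef} = spawn_monitor(?MODULE, worker, [Id]),
            maps:put(NewRef, Id, maps:remove(Ref, Workers));
        error ->
            Workers
    end.
```

Keying on the reference alone is enough because each call to spawn_monitor returns a fresh, unique reference.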
If so then what happens if one of those spawned monitors failed/crashed?
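Erlang has multiple server farms in different countries, and Erlang has secured multiple redundant power grids, so Erlang will restart everything in a fault-tolerant distributed system that never fails. It's all built in. You don't have to worry about a thing. :)
Actually... anywhere you can imagine a failure happening, you have to back it up yourself, for example with another monitoring process on another computer.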
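A minimal sketch of that idea on a single node, assuming a hypothetical watchdog module: one extra process monitors the monitoring process and restarts it if it dies abnormally. StartFun is an invented parameter, a zero-arity fun that runs the monitoring loop.

```erlang
-module(watchdog).
-export([start/1, watch/1]).

%% StartFun is a zero-arity fun that runs the monitoring loop,
%% e.g. fun() -> mo:create_workers(4) end.
start(StartFun) ->
    spawn(?MODULE, watch, [StartFun]).

watch(StartFun) ->
    {Pid, Ref} = spawn_monitor(StartFun),
    receive
        {'DOWN', Ref, process, Pid, normal} ->
            %% The monitor finished on its own; nothing to restart.
            ok;
        {'DOWN', Ref, process, Pid, Reason} ->
            io:format("Monitor ~w went down: ~p; restarting~n",
                      [Pid, Reason]),
            watch(StartFun)
    end.
```

Two caveats apply. Monitors are one-directional, so the workers started by the crashed monitor keep running and the restarted monitor spawns a fresh set; linking the workers to their monitor, or using an OTP supervision tree, avoids those orphans. And the watchdog itself can crash too, which is the point of the answer above: every layer you add needs something watching it in turn.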