repmgrd and supervisord on docker - losing parent?
I created a Docker image with PostgreSQL and repmgrd, all started by supervisor.
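My problem now is that when it starts, the repmgrd spawned by supervisor seems to die and another one takes its place. As a result I can't control it with supervisorctl and have to resort to pkill or similar to manage it.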
Dockerfile
FROM postgres:10
RUN apt-get -qq update && \
apt-get -qq install -y \
apt-transport-https \
lsb-release \
openssh-server \
postgresql-10-repmgr \
rsync \
supervisor > /dev/null && \
apt-get -qq autoremove -y && \
apt-get -qq clean && \
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
# public keys configuration for passwordless login
COPY ssh/ /var/lib/postgresql/.ssh/
# postgres, sshd, supervisor and repmgr configuration
COPY etc/ /etc/
# helper scripts and entrypoint
COPY helpers/ /usr/local/bin/
ENTRYPOINT ["/usr/local/bin/pg-docker-entrypoint.sh"]
pg-docker-entrypoint.sh simply starts supervisord -c /etc/supervisor/supervisord.conf.
supervisord.conf
[unix_http_server]
file = /var/run/supervisor.sock
chmod = 0770
chown = root:postgres
[rpcinterface:supervisor]
supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface
[supervisorctl]
serverurl = unix:///var/run/supervisor.sock
[supervisord]
logfile = /var/log/supervisor/supervisor.log
childlogdir = /var/log/supervisor
pidfile = /var/run/supervisord.pid
nodaemon = true
[program:sshd]
command = /usr/sbin/sshd -D -e
stdout_logfile = /var/log/supervisor/sshd-stdout.log
stderr_logfile = /var/log/supervisor/sshd-stderr.log
[program:postgres]
command = /docker-entrypoint.sh postgres -c config_file=/etc/postgresql/10/main/postgresql.conf
stdout_logfile = /var/log/supervisor/postgres-stdout.log
stderr_logfile = /var/log/supervisor/postgres-stderr.log
[program:repmgrd]
command = bash -c "sleep 10 && /usr/local/bin/repmgr_helper.sh"
user = postgres
stdout_logfile = /var/log/supervisor/repmgr-stdout.log
stderr_logfile = /var/log/supervisor/repmgr-stderr.log
[group:jm]
programs = sshd, postgres, repmgrd
repmgr_helper.sh is little more than /usr/lib/postgresql/10/bin/repmgrd --verbose.
repmgr.conf
node_id=1
node_name='pg-dock-1'
conninfo='host=pg-dock-1 port=5432 user=repmgr dbname=repmgr connect_timeout=60'
data_directory='/var/lib/postgresql/data/'
use_replication_slots=1
pg_bindir='/usr/lib/postgresql/10/bin/'
failover='automatic'
promote_command='/usr/bin/repmgr standby promote --log-to-file'
follow_command='/usr/bin/repmgr standby follow --log-to-file -W --upstream-node-id=%n'
ps output
root@9f39cb085506:/# ps -ef
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 11:54 ? 00:00:00 bash /usr/local/bin/pg-docker-entrypoint.sh
root 10 1 0 11:54 ? 00:00:01 /usr/bin/python /usr/bin/supervisord -c /etc/supervisor/supervisord.conf
root 13 10 0 11:54 ? 00:00:00 /usr/sbin/sshd -D -e
postgres 15 10 0 11:54 ? 00:00:07 postgres -c config_file=/etc/postgresql/10/main/postgresql.conf
postgres 36 15 0 11:54 ? 00:00:00 postgres: checkpointer process
postgres 37 15 0 11:54 ? 00:00:00 postgres: writer process
postgres 38 15 0 11:54 ? 00:00:00 postgres: wal writer process
postgres 39 15 0 11:54 ? 00:00:00 postgres: autovacuum launcher process
postgres 40 15 0 11:54 ? 00:00:00 postgres: archiver process
postgres 41 15 0 11:54 ? 00:00:01 postgres: stats collector process
postgres 42 15 0 11:54 ? 00:00:00 postgres: bgworker: logical replication launcher
postgres 51 15 0 11:54 ? 00:00:00 postgres: wal sender process repmgr 10.0.14.4(33812) streaming 0/4002110
postgres 55 15 0 11:54 ? 00:00:00 postgres: repmgr repmgr 10.0.14.4(33824) idle
postgres 88 15 0 11:54 ? 00:00:01 postgres: repmgr repmgr 10.0.14.5(33496) idle
postgres 90 1 0 11:54 ? 00:00:03 /usr/lib/postgresql/10/bin/repmgrd --verbose
root 107 0 0 11:54 pts/0 00:00:00 bash
root 9323 107 0 12:50 pts/0 00:00:00 ps -ef
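As you can see, the repmgrd process is now a child of the entrypoint (PID 1) rather than of supervisor, as sshd and postgres are. I've tried launching the command directly (without the "helper"), I've tried using bash -c, and I've tried specifying /usr/bin/repmgrd as the executable, but whatever I do, I always end up with this result.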
So my question is two-fold: why does this happen, and what can I do to keep the repmgrd process under supervisor's control?
Edit: as suggested, I tried using --daemonize=false when starting repmgrd.
This helps somewhat, but not completely. See the output:
root@6ab09e13f425:/# ps -ef
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 17:06 ? 00:00:00 bash /usr/local/bin/pg-docker-entrypoint.sh
root 11 1 2 17:06 ? 00:00:00 /usr/bin/python /usr/bin/supervisord -c /etc/supervisor/supervisord.conf
root 14 11 0 17:06 ? 00:00:00 /usr/sbin/sshd -D -e
postgres 15 11 0 17:06 ? 00:00:00 bash /usr/local/bin/repmgr_helper.sh
postgres 16 11 1 17:06 ? 00:00:00 postgres -c config_file=/etc/postgresql/10/main/postgresql.conf
postgres 37 16 0 17:06 ? 00:00:00 postgres: checkpointer process
postgres 38 16 0 17:06 ? 00:00:00 postgres: writer process
postgres 39 16 0 17:06 ? 00:00:00 postgres: wal writer process
postgres 40 16 0 17:06 ? 00:00:00 postgres: autovacuum launcher process
postgres 41 16 0 17:06 ? 00:00:00 postgres: archiver process
postgres 42 16 0 17:06 ? 00:00:00 postgres: stats collector process
postgres 43 16 0 17:06 ? 00:00:00 postgres: bgworker: logical replication launcher
postgres 44 16 0 17:06 ? 00:00:00 postgres: wal sender process repmgr 10.0.23.136(47132) streaming 0/4008E28
root 45 0 0 17:06 pts/0 00:00:00 bash
postgres 77 15 1 17:06 ? 00:00:00 /usr/lib/postgresql/10/bin/repmgrd --daemonize=false --verbose
postgres 78 16 0 17:06 ? 00:00:00 postgres: repmgr repmgr 10.0.23.136(47150) idle
postgres 79 16 0 17:06 ? 00:00:00 postgres: repmgr repmgr 10.0.23.134(43476) idle
root 86 45 0 17:06 pts/0 00:00:00 ps -ef
root@6ab09e13f425:/# supervisorctl stop jm:repmgrd
jm:repmgrd: stopped
root@6ab09e13f425:/# ps -ef
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 17:06 ? 00:00:00 bash /usr/local/bin/pg-docker-entrypoint.sh
root 11 1 1 17:06 ? 00:00:00 /usr/bin/python /usr/bin/supervisord -c /etc/supervisor/supervisord.conf
root 14 11 0 17:06 ? 00:00:00 /usr/sbin/sshd -D -e
postgres 16 11 0 17:06 ? 00:00:00 postgres -c config_file=/etc/postgresql/10/main/postgresql.conf
postgres 37 16 0 17:06 ? 00:00:00 postgres: checkpointer process
postgres 38 16 0 17:06 ? 00:00:00 postgres: writer process
postgres 39 16 0 17:06 ? 00:00:00 postgres: wal writer process
postgres 40 16 0 17:06 ? 00:00:00 postgres: autovacuum launcher process
postgres 41 16 0 17:06 ? 00:00:00 postgres: archiver process
postgres 42 16 0 17:06 ? 00:00:00 postgres: stats collector process
postgres 43 16 0 17:06 ? 00:00:00 postgres: bgworker: logical replication launcher
postgres 44 16 0 17:06 ? 00:00:00 postgres: wal sender process repmgr 10.0.23.136(47132) streaming 0/4008E60
root 45 0 0 17:06 pts/0 00:00:00 bash
postgres 77 1 0 17:06 ? 00:00:00 /usr/lib/postgresql/10/bin/repmgrd --daemonize=false --verbose
postgres 78 16 0 17:06 ? 00:00:00 postgres: repmgr repmgr 10.0.23.136(47150) idle
postgres 79 16 0 17:06 ? 00:00:00 postgres: repmgr repmgr 10.0.23.134(43476) idle
root 106 45 0 17:07 pts/0 00:00:00 ps -ef
On startup the process stays under supervisor, but stopping it only kills repmgr_helper.sh, so the "real" process stays alive and is reparented to PID 1.
This isn't ideal, because I now end up in an odd situation where the process is alive but supervisor thinks it isn't. As a result, running supervisorctl start jm:repmgrd fails with:
[ERROR] PID file "/tmp/repmgrd.pid" exists and seems to contain a valid PID
[HINT] if repmgrd is no longer alive, remove the file and restart repmgrd
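The stop behaviour described in this edit can be reproduced without repmgrd. The sketch below is a hypothetical stand-in, not the poster's setup: `sleep 30` plays repmgrd, an `sh -c` wrapper plays repmgr_helper.sh, and sending SIGTERM to the wrapper approximates what supervisorctl stop does by default (supervisord signals only the process it started). The wrapper backgrounds the child and waits for it only so the child's PID can be recorded; the function name and temp file are illustrative assumptions.

```shell
#!/bin/sh
# Hypothetical reproduction of "supervisorctl stop" on a wrapper script.
# 'sleep 30' stands in for repmgrd; the inner 'sh -c' stands in for repmgr_helper.sh.

stop_wrapper_and_check_child() {
  pidfile=$(mktemp)

  # Wrapper without exec: start the child, record its PID, then wait for it.
  sh -c 'sleep 30 >/dev/null 2>&1 & echo "$!" > "$1"; wait' _ "$pidfile" >/dev/null 2>&1 &
  wrapper=$!

  # Give the wrapper up to about 5 seconds to record the child's PID.
  i=0
  while [ ! -s "$pidfile" ] && [ "$i" -lt 50 ]; do
    sleep 0.1
    i=$((i + 1))
  done
  child=$(cat "$pidfile")

  # Signal only the wrapper, which is what supervisord does by default on "stop".
  kill -TERM "$wrapper"
  wait "$wrapper" 2>/dev/null || true

  if kill -0 "$child" 2>/dev/null; then
    echo "child still running"
  else
    echo "child stopped"
  fi

  # Clean up the orphaned stand-in and the temp file.
  kill "$child" 2>/dev/null || true
  rm -f "$pidfile"
}

stop_wrapper_and_check_child
```

On Linux this is expected to report that the child is still running: the stand-in outlives its wrapper and is adopted by another process, just like PID 77 in the second ps listing above.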
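Answer updated based on the discussion in the comments:
These are the problems with the current setup:
The original command used to start repmgrd,
command = bash -c "sleep 10 && /usr/local/bin/repmgr_helper.sh"
runs bash, which executes another bash script (i.e. a second bash instance), which in turn runs repmgrd. That is more processes than needed, and most of them serve no purpose.
supervisord expects the command it invokes to stay in the foreground, but by default repmgrd daemonizes itself.
The pid file that repmgrd creates also caused problems during troubleshooting.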
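The daemonizing problem (supervisord expects a foreground process, while repmgrd daemonizes by default) can be illustrated with a small stand-in. This is a hypothetical sketch: `sleep 30` plays the daemonized repmgrd, and the launcher forks it into the background and exits, roughly what a self-daemonizing program does. Once the launcher exits, the kernel reparents the worker to PID 1 of the PID namespace (or to a subreaper, if one is set up). The PPID lookup reads /proc and therefore assumes Linux; all function names are illustrative.

```shell
#!/bin/sh
# Hypothetical stand-in for a self-daemonizing program (like repmgrd without
# --daemonize=false): the launched process forks a worker and exits at once.

# Prints "<launcher_pid> <worker_pid>"; returns once the launcher has exited.
start_daemonizing_program() {
  sh -c 'sleep 30 >/dev/null 2>&1 & echo "$$ $!"'
}

# Linux-only: print a process's parent PID by reading /proc.
parent_pid_of() {
  awk '/^PPid:/ { print $2 }' "/proc/$1/status"
}

demo_daemonize() {
  pids=$(start_daemonizing_program)
  launcher=${pids% *}
  worker=${pids#* }

  # The PID a supervisor would be tracking is already gone...
  if kill -0 "$launcher" 2>/dev/null; then echo "launcher running"; else echo "launcher exited"; fi

  # ...while the worker keeps running under a new parent (PID 1 or a subreaper).
  parent=$(parent_pid_of "$worker")
  if [ "$parent" != "$launcher" ]; then echo "worker reparented to $parent"; fi

  # Clean up the stand-in worker.
  kill "$worker" 2>/dev/null || true
}

demo_daemonize
```

In a container whose PID 1 is a shell entrypoint, the worker is adopted by that shell, which matches what the first ps listing in the question shows for repmgrd (PPID 1).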
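These problems can be fixed with the following changes:
Use this command instead:
command = /usr/local/bin/repmgr_helper.sh
Update /usr/local/bin/repmgr_helper.sh so that its first step is sleep 10.
As its last step, /usr/local/bin/repmgr_helper.sh should invoke repmgrd like this:
exec /path/to/repmgrd --daemonize=false --no-pid-file
This way, (a) because of exec, repmgrd replaces the script that started it, (b) it does not daemonize itself, and (c) it does not create a pid file.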
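Putting these changes together, repmgr_helper.sh might look like the sketch below. It is an illustration under assumptions, not the poster's actual script: the repmgrd path comes from the question, --verbose is kept from the original helper, and the STARTUP_DELAY and REPMGRD_BIN variables, the start_repmgrd function and the script-name check at the bottom are hypothetical additions that let the script be tested without a real repmgrd. Any extra arguments given to the script are passed through to repmgrd unchanged.

```shell
#!/bin/bash
# Hypothetical sketch of /usr/local/bin/repmgr_helper.sh, started by supervisord via
#   command = /usr/local/bin/repmgr_helper.sh

start_repmgrd() {
  # First step: wait for PostgreSQL (replaces the "sleep 10" formerly in supervisord.conf).
  # STARTUP_DELAY is a hypothetical override; the default matches the original 10 seconds.
  sleep "${STARTUP_DELAY:-10}"

  # Last step: exec replaces this shell, so the PID supervisord tracks becomes repmgrd itself.
  # --daemonize=false keeps repmgrd in the foreground; --no-pid-file avoids stale pid files.
  # REPMGRD_BIN is a hypothetical override for the binary path.
  exec "${REPMGRD_BIN:-/usr/lib/postgresql/10/bin/repmgrd}" \
    --daemonize=false --no-pid-file --verbose "$@"
}

# Run only when executed as repmgr_helper.sh, so the function can also be sourced for testing.
if [ "${0##*/}" = "repmgr_helper.sh" ]; then
  start_repmgrd "$@"
fi
```

Because the script ends in exec, the process supervisord started is repmgrd itself, so supervisorctl stop jm:repmgrd signals repmgrd directly, and with --no-pid-file no stale /tmp/repmgrd.pid is left behind to block the next start.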
Original answer (before the update):
Try passing --daemonize=false to repmgrd in the startup command.