How to monitor async streaming replica delay from the slave?

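We have a system running PostgreSQL 12.x where all changes are written to the master database server, and two read-only streaming async replicas take load off the master by serving read-only transactions that can tolerate a slight delay.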

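Because the async replicas may lag behind the master in some cases, we need a way to query the replication delay (lag). We do not want to contact the master server to do this, so one obvious approach is to query the delay from the replica server: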

select
(extract(epoch from now()) - extract(epoch from last_msg_send_time)) * 1000 as delay_ms
from pg_stat_wal_receiver;

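However, pg_stat_wal_receiver does not seem to have data on our slave machines. It does have one row, but only the pid column is populated; every other column is empty. The documentation is not clear on the details, but could it be that pg_stat_wal_receiver only has data for sync streaming replicas?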

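Is there a way to work out the streaming delay of an async replica? I am hoping this is just some kind of misconfiguration rather than "not supported".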

All server machines are running PostgreSQL 12.2, but the client machines are still running the PostgreSQL 9.5 client library, in case that makes a difference.

I do not understand why the view pg_stat_wal_receiver has no data, but here is a workaround for the missing delay data:

select now() - pg_last_xact_replay_timestamp() as replication_lag;

Or, if you want the delay in milliseconds (as a plain number):

select round(extract(epoch from (now() - pg_last_xact_replay_timestamp())*1000)) as replication_lag_ms;

Note that this uses the function pg_last_xact_replay_timestamp() (emphasis mine):

Get time stamp of last transaction replayed during recovery. This is the time at which the commit or abort WAL record for that transaction was generated on the primary. If no transactions have been replayed during recovery, this function returns NULL. Otherwise, if recovery is still in progress this will increase monotonically. If recovery has completed then this value will remain static at the value of the last transaction applied during that recovery. When the server has been started normally without recovery the function returns NULL.

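However, it seems that async streaming replication does keep advancing this timestamp while the system is under normal load (with active writes on the master). It is still unclear whether the timestamp stops advancing when the master has no changes but streaming replication is active.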
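Going by the quoted documentation, the timestamp is the commit time of the last replayed transaction, so an idle master would plausibly leave it unchanged and make the computed lag grow. A commonly used way to guard against that, sketched here under the assumption that "all received WAL has been replayed" can be treated as zero lag, is to compare the received and replayed WAL positions first:

```sql
-- Run on the replica.
-- Assumption: if everything received so far has been replayed, report 0;
-- otherwise fall back to the timestamp-based estimate from above.
select
    case
        when pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn() then 0
        else round(extract(epoch from (now() - pg_last_xact_replay_timestamp())) * 1000)
    end as replication_lag_ms,
    -- WAL received but not yet replayed, in bytes
    pg_wal_lsn_diff(pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn()) as replay_backlog_bytes;
```

One caveat with this sketch: if the replica has lost its connection to the master, the two positions can still match, so it would report 0 even though the replica is falling behind. It is worth combining with a check that the WAL receiver is actually running.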

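I think I can answer the question about the missing columns of pg_stat_wal_receiver. To read the remaining columns, you need to log in as a superuser or as a login role that has been granted the pg_read_all_stats role.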

This behavior is documented in the source code of walreceiver.c, in the implementation of pg_stat_get_wal_receiver, which says:

...
/*
 * Only superusers and members of pg_read_all_stats can see details.
 * Other users only get the pid value to know whether it is a WAL
 * receiver, but no details.
 */
...
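A minimal sketch of how that could be applied, assuming the monitoring queries connect as a dedicated login role (the name monitoring_user is hypothetical). Since the replicas are read-only, the grant has to be issued on the master and reaches the replicas through replication:

```sql
-- On the master. "monitoring_user" is a hypothetical role name.
grant pg_read_all_stats to monitoring_user;

-- On the replica, connected as that role: confirm the membership is visible.
select pg_has_role(current_user, 'pg_read_all_stats', 'member') as can_read_stats;

-- With the membership in place, the query from the question should
-- return values in the columns beyond pid.
select
    (extract(epoch from now()) - extract(epoch from last_msg_send_time)) * 1000 as delay_ms
from pg_stat_wal_receiver;
```

Note that last_msg_send_time reflects when the master last sent a message to the WAL receiver, so this figure is closer to "time since the last message" than to replay lag.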