GitLab encountering issues with PostgreSQL after OS upgrade to RHEL 7.6

Question

We recently upgraded the OS:

$ cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.6 (Maipo)

Since the upgrade, we have been facing a lot of issues with GitLab (predominantly with Postgres).

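Our GitLab is dockerized, i.e. GitLab (and all of its internal services, including PostgreSQL) runs inside a single container. The container does not have its own glibc, so it uses the one from the OS.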

ERROR: canceling statement due to statement timeout

STATEMENT:
SELECT relnamespace::regnamespace as schemaname, relname as relname, pg_total_relation_size(oid) bytes FROM pg_class WHERE relkind = 'r';

The timeout message keeps appearing, and it causes users to get 502 errors when they access GitLab.

I checked the statement timeout set on the database.

gitlabhq_production=# show statement_timeout;
 statement_timeout
-------------------
 1min
(1 row)

I am not sure what to do. This is probably the default setting. Is this a Postgres problem? What does it mean? What can I do to fix it?

Edit:

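I checked pg_stat_activity and did not see any locks, because the server had been restarted beforehand. The same query now runs fine, but we keep seeing this problem intermittently.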

I ran \d pg_class to check whether the table uses any indexes and to look at its string columns.

gitlabhq_production=# \d pg_class
         Table "pg_catalog.pg_class"
       Column        |   Type    | Modifiers
---------------------+-----------+-----------
 relname             | name      | not null
 relnamespace        | oid       | not null
 reltype             | oid       | not null
 reloftype           | oid       | not null
 relowner            | oid       | not null
 relam               | oid       | not null
 relfilenode         | oid       | not null
 reltablespace       | oid       | not null
 relpages            | integer   | not null
 reltuples           | real      | not null
 relallvisible       | integer   | not null
 reltoastrelid       | oid       | not null
 relhasindex         | boolean   | not null
 relisshared         | boolean   | not null
 relpersistence      | "char"    | not null
 relkind             | "char"    | not null
 relnatts            | smallint  | not null
 relchecks           | smallint  | not null
 relhasoids          | boolean   | not null
 relhaspkey          | boolean   | not null
 relhasrules         | boolean   | not null
 relhastriggers      | boolean   | not null
 relhassubclass      | boolean   | not null
 relrowsecurity      | boolean   | not null
 relforcerowsecurity | boolean   | not null
 relispopulated      | boolean   | not null
 relreplident        | "char"    | not null
 relfrozenxid        | xid       | not null
 relminmxid          | xid       | not null
 relacl              | aclitem[] |
 reloptions          | text[]    |
Indexes:
    "pg_class_oid_index" UNIQUE, btree (oid)
    "pg_class_relname_nsp_index" UNIQUE, btree (relname, relnamespace)
    "pg_class_tblspc_relfilenode_index" btree (reltablespace, relfilenode)

Would reindexing all tables, and possibly an ALTER TABLE, help?

Answer 1

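You should check whether the query is blocked by a database lock while it runs. You can see this in the backend's row in pg_stat_activity, which shows whether the query is waiting for a lock (state = active, with wait_event_type and wait_event indicating a lock).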

If it is a lock, get rid of the transaction that holds it. It could be a prepared transaction, so check for those too.
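As a rough sketch of the checks described above, the following queries could be run in psql while the timeouts are occurring. They assume PostgreSQL 9.6 or later, since wait_event_type, wait_event and pg_blocking_pids() do not exist in older releases; the column selection is only a suggestion.

```sql
-- Active backends that are currently waiting for a heavyweight lock,
-- together with the PIDs of the sessions blocking them.
-- Assumption: PostgreSQL 9.6+ (wait_event_type, pg_blocking_pids).
SELECT pid,
       state,
       wait_event_type,
       wait_event,
       pg_blocking_pids(pid) AS blocked_by,
       query_start,
       query
FROM pg_stat_activity
WHERE state = 'active'
  AND wait_event_type = 'Lock';

-- Prepared transactions keep their locks until they are committed or
-- rolled back, even across a server restart.
SELECT gid, prepared, owner, database
FROM pg_prepared_xacts;
```

If the first query returns no rows during a timeout, lock waits are probably not the cause. An orphaned prepared transaction found by the second query can be ended with ROLLBACK PREPARED followed by its gid.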

If locks are not the problem, your indexes may have been corrupted by the operating system upgrade:

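PostgreSQL uses the operating system's collations, so database indexes on string columns are sorted in collation order. An operating system upgrade can (and often does) change collations because of bug fixes in the C library, so after such an upgrade you should rebuild all indexes on string columns.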
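One possible way to find and rebuild the affected indexes is sketched below. It assumes you are connected to gitlabhq_production as a superuser and can tolerate the locks that REINDEX takes, which block writes to the table being reindexed; the catalog query is an illustration, not an official GitLab procedure.

```sql
-- Which collation and ctype does each database use?
-- "C" or "POSIX" do not depend on glibc locale data.
SELECT datname, datcollate, datctype
FROM pg_database;

-- Indexes in the current database with at least one column whose
-- collation is not C or POSIX (these depend on the C library's ordering).
SELECT DISTINCT
       s.indrelid::regclass   AS table_name,
       s.indexrelid::regclass AS index_name,
       c.collname
FROM (SELECT indexrelid, indrelid, indcollation[i] AS coll
      FROM pg_index, generate_subscripts(indcollation, 1) AS g(i)) AS s
JOIN pg_collation c ON c.oid = s.coll
WHERE c.collname NOT IN ('C', 'POSIX');

-- Simplest option: rebuild every index in the database.
-- The name must match the database you are connected to.
REINDEX DATABASE gitlabhq_production;
```

Individual indexes from the second query can instead be rebuilt one at a time with REINDEX INDEX, which keeps each lock shorter.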

The statement you show does not use an index scan, so it should not be affected, but other statements may be.
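To confirm this on your own server, you could ask the planner how it executes the statement from the log. This is only a sketch; the plan you get depends on your PostgreSQL version and catalog size.

```sql
-- Show the plan for the statement that timed out, without executing it.
-- A "Seq Scan on pg_class" node means no index is involved.
EXPLAIN
SELECT relnamespace::regnamespace AS schemaname,
       relname,
       pg_total_relation_size(oid) AS bytes
FROM pg_class
WHERE relkind = 'r';
```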

Also, if you are using Docker, your container may use its own glibc that was not upgraded, in which case you are not affected.

postgresql

gitlab

gitlab-ce