Invalid resource manager ID in primary checkpoint record
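I upgraded my Airbyte image from 0.35.2-alpha to 0.35.37-alpha [running in Kubernetes].
When the system rolled out, the db pod wouldn't terminate, and I [a terrible mistake] deleted the pod.
When it came back up, I got this error: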
PostgreSQL Database directory appears to contain a database; Skipping initialization
2022-02-24 20:19:44.065 UTC [1] LOG: starting PostgreSQL 13.6 on x86_64-pc-linux-musl, compiled by gcc (Alpine 10.3.1_git20211027) 10.3.1 20211027, 64-bit
2022-02-24 20:19:44.065 UTC [1] LOG: listening on IPv4 address "0.0.0.0", port 5432
2022-02-24 20:19:44.065 UTC [1] LOG: listening on IPv6 address "::", port 5432
2022-02-24 20:19:44.071 UTC [1] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2022-02-24 20:19:44.079 UTC [21] LOG: database system was shut down at 2022-02-24 20:12:55 UTC
2022-02-24 20:19:44.079 UTC [21] LOG: invalid resource manager ID in primary checkpoint record
2022-02-24 20:19:44.079 UTC [21] PANIC: could not locate a valid checkpoint record
2022-02-24 20:19:44.530 UTC [1] LOG: startup process (PID 21) was terminated by signal 6: Aborted
2022-02-24 20:19:44.530 UTC [1] LOG: aborting startup due to startup process failure
2022-02-24 20:19:44.566 UTC [1] LOG: database system is shut down
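I'm pretty sure the WAL file is corrupted, but I'm not sure how to fix this.
WARNING - potential data loss
This is a test system, so I wasn't concerned with keeping the latest transactions, and I had no backup.
First, I overrode the container command so the container keeps running without trying to start postgres.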
...
spec:
  containers:
    - name: airbyte-db-container
      image: airbyte/db
      command: ["sh"]
      args: ["-c", "while true; do echo $(date -u) >> /tmp/run.log; sleep 5; done"]
...
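The same override can also be applied from the command line instead of editing the manifest. This is only a minimal sketch: it assumes the database runs as a Deployment named airbyte-db in the airbyte namespace and that the db container is the first one in the pod spec. Those names are guesses based on the pod name, not something stated above, and the helper functions are made up for illustration.

```shell
# Hypothetical helper: prints a JSON patch that swaps the container
# entrypoint for an idle loop, so postgres is never started.
idle_patch() {
  cat <<'EOF'
[
  {"op": "add", "path": "/spec/template/spec/containers/0/command", "value": ["sh"]},
  {"op": "add", "path": "/spec/template/spec/containers/0/args",
   "value": ["-c", "while true; do sleep 5; done"]}
]
EOF
}

# Hypothetical helper: applies the patch.
# $1 = namespace (assumed default: airbyte)
# $2 = deployment name (assumed default: airbyte-db)
apply_idle_patch() {
  kubectl -n "${1:-airbyte}" patch deployment "${2:-airbyte-db}" \
    --type=json -p "$(idle_patch)"
}
```

Calling apply_idle_patch should trigger a new rollout with the idle command. If the original manifest did not set command or args, a matching patch using "op": "remove" on the same two paths should put the default entrypoint back afterwards.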
Then I spawned a shell on the pod:
kubectl exec -it -n airbyte airbyte-db-xxxx -- sh
Ran pg_resetwal:
# dry-run first
pg_resetwal --dry-run /var/lib/postgresql/data/pgdata
Success!
pg_resetwal /var/lib/postgresql/data/pgdata
Write-ahead log reset
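The two steps above can be wrapped so that the real reset only runs if the dry run succeeds. A minimal sketch; reset_wal is a made-up helper name, not part of PostgreSQL, and it only uses the pg_resetwal invocations already shown:

```shell
# reset_wal is a hypothetical helper, not a PostgreSQL command.
# Usage: reset_wal /var/lib/postgresql/data/pgdata
reset_wal() {
  pgdata="$1"
  # Refuse to run without an explicit data directory.
  [ -n "$pgdata" ] || { echo "usage: reset_wal PGDATA" >&2; return 2; }
  # The dry run reports what would be written without changing anything.
  pg_resetwal --dry-run "$pgdata" || {
    echo "dry run failed, not resetting" >&2
    return 1
  }
  # Destructive step: discards the existing WAL, so recent transactions may be lost.
  pg_resetwal "$pgdata"
}
```

Keeping the destructive call behind the dry run means a wrong path or a permissions problem shows up before anything is modified.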
Then I removed the temporary command from the container, and postgres started up normally!