Kubernetes TimescaleDB StatefulSet: changes lost on pod recreation

Question

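I have a TimescaleDB server running as a StatefulSet in AKS. When I delete and recreate the timescaledb pod, changes are lost even though the pod is attached to the same PV (persistent volume) it was originally attached to. Any help is appreciated.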

Below is the PV and PVC configuration of the StatefulSet, extracted by running kubectl get statefulset timescaledb -o yaml:

  template:
    metadata:
      creationTimestamp: null
      labels:
        app: timescaledb
    spec:
      containers:
      - args:
        - -c
        - config_file=/etc/postgresql/postgresql.conf
        env:
        - name: POSTGRES_PASSWORD
          valueFrom:
            secretKeyRef:
              key: password
              name: timescaledb-secret
        image: docker.io/timescale/timescaledb:latest-pg9.6
        name: timescaledb-backend
        ports:
        - containerPort: 5432
          name: server
          protocol: TCP
        resources:
          requests:
            cpu: "3"
            memory: 6Gi
        volumeMounts:
        - mountPath: /var/lib/postgresql
          name: timescaledbdata
        - mountPath: /etc/postgresql
          name: timescaledb-config
      volumes:
      - configMap:
          defaultMode: 420
          name: timescaledb-config
        name: timescaledb-config
  volumeClaimTemplates:
  - metadata:
      annotations:
        volume.alpha.kubernetes.io/storage-class: standard
      creationTimestamp: null
      name: timescaledbdata
    spec:
      accessModes:
      - ReadWriteOnce
      dataSource: null
      resources:
        requests:
          storage: 200Gi
    status:
      phase: Pending

The following demonstrates that a temporary database, test_db, is lost after the pod is recreated, while the pod stays attached to the same PV/disk on Azure throughout.

root@e70a91715239:~/keys# k get pvc -l app=timescaledb
NAME                            STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
timescaledbdata-timescaledb-0   Bound    pvc-c7eb99cf-6a6b-11e9-b661-be660567cc75   200Gi      RWO            default        83d

root@e70a91715239:~/keys# k exec -ti timescaledb-0 bash
bash-4.4# psql -U postgres;
psql (9.6.13)
Type "help" for help.

postgres=# create database test_db;
CREATE DATABASE
postgres=# \l
                                 List of databases
   Name    |  Owner   | Encoding |  Collate   |   Ctype    |   Access privileges   
-----------+----------+----------+------------+------------+-----------------------
 postgres  | postgres | UTF8     | en_US.utf8 | en_US.utf8 | 
 template0 | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +
           |          |          |            |            | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +
           |          |          |            |            | postgres=CTc/postgres
 test_db   | postgres | UTF8     | en_US.utf8 | en_US.utf8 | 
(4 rows)


root@e70a91715239:~/keys# k get pods | grep timescale
timescaledb-0                         1/1     Running   0          12m
root@e70a91715239:~/keys# k delete pod/timescaledb-0
pod "timescaledb-0" deleted
root@e70a91715239:~/keys# k get pods | grep timescale
timescaledb-0                         1/1     Running   0          14s

root@e70a91715239:~/keys# k exec -ti timescaledb-0 bash
bash-4.4# psql -U postgres
psql (9.6.13)
Type "help" for help.

postgres=# \l
                                 List of databases
   Name    |  Owner   | Encoding |  Collate   |   Ctype    |   Access privileges
-----------+----------+----------+------------+------------+-----------------------
 postgres  | postgres | UTF8     | en_US.utf8 | en_US.utf8 |
 template0 | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +
           |          |          |            |            | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +
           |          |          |            |            | postgres=CTc/postgres
(3 rows)

root@e70a91715239:~/keys# k get pvc -l app=timescaledb
NAME                            STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
timescaledbdata-timescaledb-0   Bound    pvc-c7eb99cf-6a6b-11e9-b661-be660567cc75   200Gi      RWO            default        83d

It is probably being re-initialized on startup, as hinted. See the logs. Any pointers on why this would happen are welcome.

Update 1: I looked at the mounts inside the timescale pod, and /var/lib/postgresql and /var/lib/postgresql/data appear to be on different partitions. I don't understand why.

Filesystem                Size      Used Available Use% Mounted on
overlay                  96.9G     22.1G     74.8G  23% /
tmpfs                    64.0M         0     64.0M   0% /dev
tmpfs                     7.8G         0      7.8G   0% /sys/fs/cgroup
/dev/sda1                96.9G     22.1G     74.8G  23% /docker-entrypoint-initdb.d
/dev/sda1                96.9G     22.1G     74.8G  23% /dev/termination-log
shm                      64.0M      4.0K     64.0M   0% /dev/shm
/dev/sda1                96.9G     22.1G     74.8G  23% /etc/resolv.conf
/dev/sda1                96.9G     22.1G     74.8G  23% /etc/hostname
/dev/sda1                96.9G     22.1G     74.8G  23% /etc/hosts
/dev/sdc                196.7G     59.3M    196.7G   0% /var/lib/postgresql
/dev/sda1                96.9G     22.1G     74.8G  23% /var/lib/postgresql/data

I don't understand how the mounts above result from the configuration below:

        volumeMounts:
        - mountPath: /var/lib/postgresql
          name: timescaledbdata
        - mountPath: /etc/postgresql
          name: timescaledb-config

Answer 1

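The problem is that the postgres:9.6 Dockerfile declares a VOLUME at /var/lib/postgresql/data, which creates an extra mount in the container. With our volume mounted at /var/lib/postgresql, that extra mount sits on top of it at /var/lib/postgresql/data and is ephemeral, so the database files never reach the persistent disk. However, we could not mount the AKS volume directly at /var/lib/postgresql/data and use it as the data directory, because the volume comes with a lost+found subdirectory and Postgres requires an empty directory for its database files.

The fix is to mount the volume at /var/lib/postgresql/data and use the PGDATA environment variable to tell Postgres to store its files in a subdirectory under /var/lib/postgresql/data.

Here is the relevant part of the fixed k8s StatefulSet configuration: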

env:
- name: PGDATA
  value: "/var/lib/postgresql/data/dbfiles"
...
volumeMounts:
- mountPath: /var/lib/postgresql/data
  name: timescaledbdata  # must match the volumeClaimTemplates name
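
For context, this is a rough sketch of how the fix might look when merged into the container spec from the question. It is an illustration rather than the exact manifest that was deployed: the volume name is assumed to be timescaledbdata, matching the volumeClaimTemplates entry in the question, and the dbfiles subdirectory name is arbitrary.

```yaml
# Sketch only: container spec from the question with the fix applied.
spec:
  template:
    spec:
      containers:
      - name: timescaledb-backend
        image: docker.io/timescale/timescaledb:latest-pg9.6
        env:
        - name: POSTGRES_PASSWORD
          valueFrom:
            secretKeyRef:
              name: timescaledb-secret
              key: password
        # Keep the database files in a subdirectory so that the
        # lost+found directory at the volume root does not get in the way.
        - name: PGDATA
          value: /var/lib/postgresql/data/dbfiles
        volumeMounts:
        # Mount the persistent volume exactly where the image declares
        # its VOLUME, so no anonymous ephemeral mount shadows it.
        # Assumed to match the volumeClaimTemplates name in the question.
        - name: timescaledbdata
          mountPath: /var/lib/postgresql/data
        - name: timescaledb-config
          mountPath: /etc/postgresql
```

With this layout, the df output inside the pod should show /var/lib/postgresql/data on the persistent disk rather than on the node's /dev/sda1.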
