Azure AKS backup using Velero

I have noticed that Velero can only back up AKS PVCs when those PVCs are disks, not Azure file shares. To work around this, I tried using restic to back up through the file shares themselves, but that gave me some strange logs.

This is what my actual pod looks like:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
    backup.velero.io/backup-volumes: grafana-data
    deployment.kubernetes.io/revision: "17"
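For context, the backup in the logs below would have been created with something along these lines (the backup name and the grafana namespace are taken from the log fields, so this is a reconstruction rather than the literal command):

velero backup create grafana-test-volume --include-namespaces grafana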

And here are the logs of my backup:

time="2020-05-26T13:51:54Z" level=info msg="Adding pvc grafana-data to additionalItems" backup=velero/grafana-test-volume cmd=/velero logSource="pkg/backup/pod_action.go:67" pluginName=velero
time="2020-05-26T13:51:54Z" level=info msg="Backing up item" backup=velero/grafana-test-volume group=v1 logSource="pkg/backup/item_backupper.go:169" name=grafana-data namespace=grafana resource=persistentvolumeclaims
time="2020-05-26T13:51:54Z" level=info msg="Executing custom action" backup=velero/grafana-test-volume group=v1 logSource="pkg/backup/item_backupper.go:330" name=grafana-data namespace=grafana resource=persistentvolumeclaims
time="2020-05-26T13:51:54Z" level=info msg="Skipping item because it's already been backed up." backup=velero/grafana-test-volume group=v1 logSource="pkg/backup/item_backupper.go:163" name=grafana-data namespace=grafana resource=persistentvolumeclaims

As you can see, it did not back up the grafana-data volume, because it claims the volume has already been backed up (which it actually has not).
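For reference, the contents of a finished backup can be inspected with the Velero CLI; using the backup name from the logs above, these commands list the resources and restic volume backups it actually contains:

velero backup describe grafana-test-volume --details
velero backup logs grafana-test-volume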

My azurefile StorageClass contains the following:

allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"allowVolumeExpansion":true,"apiVersion":"storage.k8s.io/v1beta1","kind":"StorageClass","metadata":{"annotations":{},"labels":{"kubernetes.io/cluster-service":"true"},"name":"azurefile"},"parameters":{"skuName":"Standard_LRS"},"provisioner":"kubernetes.io/azure-file"}
  creationTimestamp: "2020-05-18T15:18:18Z"
  labels:
    kubernetes.io/cluster-service: "true"
  name: azurefile
  resourceVersion: "1421202"
  selfLink: /apis/storage.k8s.io/v1/storageclasses/azurefile
  uid: e3cc4e52-c647-412a-bfad-81ab6eb222b1
mountOptions:
- nouser_xattr
parameters:
  skuName: Standard_LRS
provisioner: kubernetes.io/azure-file
reclaimPolicy: Delete
volumeBindingMode: Immediate

As you can see, I have actually patched the StorageClass to include the nouser_xattr mount option that was suggested earlier.
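Since mount options only take effect when a volume is mounted, an existing pod keeps the old options until it is recreated. Whether the option is actually active can be verified from inside the pod after a restart (assuming the deployment is simply named grafana; <GRAFANA_POD_NAME> is a placeholder for the actual pod name):

kubectl -n grafana rollout restart deployment grafana
kubectl -n grafana exec <GRAFANA_POD_NAME> -- mount | grep nouser_xattr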

And when I check the restic pod logs, I see the following:

E0524 10:22:08.908190       1 reflector.go:156] github.com/vmware-tanzu/velero/pkg/generated/informers/externalversions/factory.go:117: Failed to list *v1.PodVolumeBackup: Get https://10.0.0.1:443/apis/velero.io/v1/namespaces/velero/podvolumebackups?limit=500&resourceVersion=1212830: dial tcp 10.0.0.1:443: i/o timeout
I0524 10:22:08.909577       1 trace.go:116] Trace[1946538740]: "Reflector ListAndWatch" name:github.com/vmware-tanzu/velero/pkg/generated/informers/externalversions/factory.go:117 (started: 2020-05-24 10:21:38.908988405 +0000 UTC m=+487217.942875118) (total time: 30.000554209s):
Trace[1946538740]: [30.000554209s] [30.000554209s] END
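That i/o timeout against 10.0.0.1:443 suggests the restic pods intermittently fail to reach the Kubernetes API server. A basic health check of the restic DaemonSet looks like this (name=restic is, as far as I can tell, the label Velero applies to its restic pods):

kubectl -n velero get daemonset restic
kubectl -n velero get pods -l name=restic -o wide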

When I check the PodVolumeBackup resources, I see the following. I am not sure what to expect here:

➜  ~ kubectl -n velero get podvolumebackups -o yaml              
apiVersion: v1
items: []
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

To sum up, this is how I installed Velero:

velero install \
  --provider azure \
  --plugins velero/velero-plugin-for-microsoft-azure:v1.0.1 \
  --bucket $BLOB_CONTAINER \
  --secret-file ./credentials-velero \
  --backup-location-config resourceGroup=$AZURE_BACKUP_RESOURCE_GROUP,storageAccount=$AZURE_STORAGE_ACCOUNT_ID \
  --snapshot-location-config apiTimeout=5m,resourceGroup=$AZURE_BACKUP_RESOURCE_GROUP \
  --use-restic \
  --wait
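After the install, a quick sanity check that the server components are running and the backup location is usable (both commands are part of the standard tooling):

kubectl -n velero get pods
velero backup-location get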

The end result is the Deployment described below:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
    backup.velero.io/backup-volumes: app-upload
    deployment.kubernetes.io/revision: "18"
  creationTimestamp: "2020-05-18T16:55:38Z"
  generation: 10
  labels:
    app: app
    velero.io/backup-name: mekompas-tenant-production-20200518020012
    velero.io/restore-name: mekompas-tenant-production-20200518020012-20200518185536
  name: app
  namespace: mekompas-tenant-production
  resourceVersion: "427893"
  selfLink: /apis/extensions/v1beta1/namespaces/mekompas-tenant-production/deployments/app
  uid: c1961ec3-b7b1-4f81-9aae-b609fa3d31fc
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: app
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      annotations:
        kubectl.kubernetes.io/restartedAt: "2020-05-18T20:24:19+02:00"
      creationTimestamp: null
      labels:
        app: app
    spec:
      containers:
      - image: nginx:1.17-alpine
        imagePullPolicy: IfNotPresent
        name: app-nginx
        ports:
        - containerPort: 80
          name: http
          protocol: TCP
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /var/www/html
          name: app-files
        - mountPath: /etc/nginx/conf.d
          name: nginx-vhost
      - env:
        - name: CONF_DB_HOST
          value: db.mekompas-tenant-production
        - name: CONF_DB
          value: mekompas
        - name: CONF_DB_USER
          value: mekompas
        - name: CONF_DB_PASS
          valueFrom:
            secretKeyRef:
              key: DATABASE_PASSWORD
              name: secret
        - name: CONF_EMAIL_FROM_ADDRESS
          value: noreply@mekompas.nl
        - name: CONF_EMAIL_FROM_NAME
          value: mekompas
        - name: CONF_EMAIL_REPLYTO_ADDRESS
          value: slc@mekompas.nl
        - name: CONF_UPLOAD_PATH
          value: /uploads
        - name: CONF_SMTP_HOST
          value: smtp.sendgrid.net
        - name: CONF_SMTP_PORT
          value: "587"
        - name: CONF_SMTP_USER
          value: apikey
        - name: CONF_SMTP_PASSWORD
          valueFrom:
            secretKeyRef:
              key: MAIL_PASSWORD
              name: secret
        image: me.azurecr.io/mekompas/php-fpm-alpine:1.12.0
        imagePullPolicy: Always
        lifecycle:
          postStart:
            exec:
              command:
              - /bin/sh
              - -c
              - cp -r /app/. /var/www/html && chmod -R 777 /var/www/html/templates_c
                && chmod -R 777 /var/www/html/core/lib/htmlpurifier-4.9.3/library/HTMLPurifier/DefinitionCache
        name: app-php
        ports:
        - containerPort: 9000
          name: upstream-php
          protocol: TCP
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /var/www/html
          name: app-files
        - mountPath: /uploads
          name: app-upload
      dnsPolicy: ClusterFirst
      imagePullSecrets:
      - name: registrypullsecret
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes:
      - name: app-upload
        persistentVolumeClaim:
          claimName: upload
      - emptyDir: {}
        name: app-files
      - configMap:
          defaultMode: 420
          name: nginx-vhost
        name: nginx-vhost
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: "2020-05-18T18:12:20Z"
    lastUpdateTime: "2020-05-18T18:12:20Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  - lastTransitionTime: "2020-05-18T16:55:38Z"
    lastUpdateTime: "2020-05-20T16:03:48Z"
    message: ReplicaSet "app-688699c5fb" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  observedGeneration: 10
  readyReplicas: 1
  replicas: 1
  updatedReplicas: 1

Best, Pim

Have you added nouser_xattr to the mountOptions list of your StorageClass?

This requirement is documented in GitHub issue 1800.

It is also mentioned on the restic integration page (look under the Azure section), which provides this snippet for patching your StorageClass resource:

kubectl patch storageclass/<YOUR_AZURE_FILE_STORAGE_CLASS_NAME> \
  --type json \
  --patch '[{"op":"add","path":"/mountOptions/-","value":"nouser_xattr"}]'

If you don't have an existing mountOptions list, you can try:

kubectl patch storageclass azurefile \
  --type merge \
  --patch '{"mountOptions": ["nouser_xattr"]}'

Ensure the pod template of the Deployment resource includes the annotation backup.velero.io/backup-volumes. Annotations set on the Deployment resource are propagated to its ReplicaSet resources, but not down to the Pod resources.

Specifically, in your example the annotation backup.velero.io/backup-volumes: app-upload should be a child of spec.template.metadata.annotations, not of metadata.annotations.

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
    # *** move velero annotation from here ***
  labels:
    app: app
  name: app
  namespace: mekompas-tenant-production
spec:
  template:
    metadata:
      annotations:
        # *** velero annotation goes here in order to end up on the pod ***
        backup.velero.io/backup-volumes: app-upload
      labels:
        app: app
    spec:
      containers:
      - image: nginx:1.17-alpine
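As an alternative to editing the manifest by hand, a merge patch along these lines (a sketch using the namespace and deployment name from your output) moves the annotation into the pod template:

kubectl -n mekompas-tenant-production patch deployment app \
  --type merge \
  --patch '{"spec":{"template":{"metadata":{"annotations":{"backup.velero.io/backup-volumes":"app-upload"}}}}}'

Changing the pod template triggers a new rollout, and the replacement pod carries the annotation, so restic should pick up the app-upload volume on the next backup.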