Authorization errors when starting a Loki pod

This morning I noticed that Loki had stopped working in our EKS cluster.

In the Loki pod logs I see the following:

level=error ts=2022-04-07T10:44:43.298418416Z caller=table_manager.go:233 msg="error syncing tables" err="WebIdentityErr: failed to retrieve credentials\ncaused by: RequestError: send request failed\ncaused by: Post \"https://sts.us-east-2.amazonaws.com/\": dial tcp XXX.XXX.XXX.XXX:443: i/o timeout"
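The `i/o timeout` when dialing `sts.us-east-2.amazonaws.com` means the pod could not open a TCP connection to the regional STS endpoint at all, so the web-identity credentials were never retrieved. A minimal sketch for checking that from inside the cluster, assuming a Python 3 interpreter is available in a debug pod on the same nodes (`can_reach` is a hypothetical helper name). It tests network reachability only, not IAM permissions.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    A False result for sts.<region>.amazonaws.com:443 corresponds to the
    "dial tcp ...:443: i/o timeout" symptom in the Loki log.
    """
    try:
        # create_connection resolves the name and tries each returned address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # DNS errors, refused connections and timeouts are all OSError subclasses.
        return False
```

For example, `can_reach("sts.us-east-2.amazonaws.com")` returning `False` from inside the cluster would point at the network path to STS rather than at the role's permissions.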

Sample values file (deployed with Flux):

apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: loki
  namespace: loki
spec:
  values:
    extraArgs:
      target: all,table-manager
    serviceAccount: 
      create: true
      name: lokiaccess
      annotations: 
        eks.amazonaws.com/role-arn: arn:aws:iam::<PRIVATE_ID>:role/LokiAccess
        eks.amazonaws.com/sts-regional-endpoints: "true"
    config: 
      storage_config: 
        boltdb_shipper:
          shared_store: s3
        aws: 
          s3: s3://us-east-2/<PRIVATE_STORAGE>
          dynamodb: 
            dynamodb_url: dynamodb://us-east-2
      schema_config:
        configs:
          - from: "2022-04-04"
            store: aws
            object_store: s3
            schema: v11
            index:
              prefix: loki_
              period: 24h
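With `store: aws` and `period: 24h`, the table manager works with one DynamoDB index table per day, named with the `loki_` prefix followed by a period number. The sketch below assumes that period number is the Unix time in seconds divided by the period length in seconds (assumed here to match how Loki names periodic tables); it helps when working out which table ARNs the IAM role has to cover. `periodic_table_name` is a hypothetical helper, not part of Loki.

```python
from datetime import datetime, timedelta, timezone


def periodic_table_name(prefix: str, period: timedelta, at: datetime) -> str:
    """Name of the periodic index table covering the instant `at`.

    Assumed scheme: prefix + (unix seconds // period seconds).
    """
    period_secs = int(period.total_seconds())
    unix_secs = int(at.timestamp())  # `at` should be timezone-aware (UTC)
    return f"{prefix}{unix_secs // period_secs}"


if __name__ == "__main__":
    cfg_prefix = "loki_"               # index.prefix from schema_config
    cfg_period = timedelta(hours=24)   # index.period from schema_config
    start = datetime(2022, 4, 4, tzinfo=timezone.utc)  # the "from" date
    error_time = datetime(2022, 4, 7, 10, 44, 43, tzinfo=timezone.utc)
    print(periodic_table_name(cfg_prefix, cfg_period, start))       # loki_19086
    print(periodic_table_name(cfg_prefix, cfg_period, error_time))  # loki_19089
```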
 

So the problem turned out to be the IAM role in AWS. I changed its settings and added

    "arn:aws:dynamodb:us-east-2:${data.aws_caller_identity.current.account_id}:table/loki_/index/*",
    "arn:aws:dynamodb:us-east-2:${data.aws_caller_identity.current.account_id}:table/loki_",
    "arn:aws:s3:::${var.aws_s3_loki_storage}/*",
    "arn:aws:s3:::${var.aws_s3_loki_storage}"

to our Terraform script.
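To check whether a list of IAM resource patterns covers a concrete ARN, IAM's `*` and `?` wildcards can be approximated with Python's `fnmatch.fnmatchcase`. This is a rough sketch, not AWS's actual policy evaluator; `covered_by`, the account ID `123456789012` and the table name `loki_19089` are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Iterable


def covered_by(resource_arn: str, patterns: Iterable[str]) -> bool:
    """Approximate IAM resource matching: '*' and '?' wildcards, case-sensitive.

    fnmatch also treats '[...]' as a character class, which IAM does not;
    that is irrelevant for ARNs without brackets.
    """
    return any(fnmatchcase(resource_arn, p) for p in patterns)


if __name__ == "__main__":
    account = "123456789012"  # hypothetical account ID
    table_prefix = f"arn:aws:dynamodb:us-east-2:{account}:table/"
    patterns = [
        table_prefix + "loki_/index/*",  # same shapes as the Terraform ARNs
        table_prefix + "loki_",
    ]
    # Hypothetical table name for 2022-04-07 with prefix loki_ and a 24h period.
    table = table_prefix + "loki_19089"
    print(covered_by(table, patterns))                               # False
    print(covered_by(table, patterns + [table_prefix + "loki_*"]))  # True
```

Note that `table/loki_` matches only a table literally named `loki_`, so if the index tables carry a period suffix, a trailing wildcard such as `table/loki_*` may be needed as well.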