Authorization errors when starting a Loki pod
This morning I noticed that Loki had stopped working in our EKS cluster.
In the Loki pod logs I see the following:
level=error ts=2022-04-07T10:44:43.298418416Z caller=table_manager.go:233 msg="error syncing tables" err="WebIdentityErr: failed to retrieve credentials\ncaused by: RequestError: send request failed\ncaused by: Post \"https://sts.us-east-2.amazonaws.com/\": dial tcp XXX.XXX.XXX.XXX:443: i/o timeout"
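- The ServiceAccount exists and has not changed
- The IAM role exists and has not changed
- The dashboards for Loki and Prometheus stopped working (Prometheus runs together with Thanos)
Example values file (deployed via Flux):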
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: loki
  namespace: loki
spec:
  values:
    extraArgs:
      target: all,table-manager
    serviceAccount:
      create: true
      name: lokiaccess
      annotations:
        eks.amazonaws.com/role-arn: arn:aws:iam::<PRIVATE_ID>:role/LokiAccess
        eks.amazonaws.com/sts-regional-endpoints: "true"
    config:
      storage_config:
        boltdb_shipper:
          shared_store: s3
        aws:
          s3: s3://us-east-2/<PRIVATE_STORAGE>
          dynamodb:
            dynamodb_url: dynamodb://us-east-2
      schema_config:
        configs:
          - from: "2022-04-04"
            store: aws
            object_store: s3
            schema: v11
            index:
              prefix: loki_
              period: 24h
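For the `eks.amazonaws.com/role-arn` annotation in the values file to work, the `LokiAccess` role's trust policy has to allow the `lokiaccess` ServiceAccount in the `loki` namespace to assume it through the cluster's OIDC provider (IAM Roles for Service Accounts). A minimal Terraform sketch of such a trust policy, assuming two hypothetical input variables, `oidc_provider_arn` and `oidc_provider_url`; this is an illustration, not the poster's actual role definition.

```hcl
# Hypothetical inputs: the cluster's IAM OIDC provider ARN and its issuer URL
# without the "https://" prefix (e.g. "oidc.eks.us-east-2.amazonaws.com/id/EXAMPLE").
variable "oidc_provider_arn" {
  type = string
}

variable "oidc_provider_url" {
  type = string
}

data "aws_iam_policy_document" "loki_assume_role" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRoleWithWebIdentity"]

    principals {
      type        = "Federated"
      identifiers = [var.oidc_provider_arn]
    }

    # Only the lokiaccess ServiceAccount in the loki namespace may assume the role
    condition {
      test     = "StringEquals"
      variable = "${var.oidc_provider_url}:sub"
      values   = ["system:serviceaccount:loki:lokiaccess"]
    }

    # Tokens must be issued for the STS audience
    condition {
      test     = "StringEquals"
      variable = "${var.oidc_provider_url}:aud"
      values   = ["sts.amazonaws.com"]
    }
  }
}

resource "aws_iam_role" "loki_access" {
  name               = "LokiAccess"
  assume_role_policy = data.aws_iam_policy_document.loki_assume_role.json
}
```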
So the problem turned out to be in the AWS IAM role.
I changed the settings and added the following resources to our Terraform script:
"arn:aws:dynamodb:us-east-2:${data.aws_caller_identity.current.account_id}:table/loki_/index/*",
"arn:aws:dynamodb:us-east-2:${data.aws_caller_identity.current.account_id}:table/loki_",
"arn:aws:s3:::${var.aws_s3_loki_storage}/*",
"arn:aws:s3:::${var.aws_s3_loki_storage}"
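For context, here is one way those resource ARNs might sit inside a complete role policy. This is a sketch, not the original script: the action lists, the `aws_iam_role_policy` resource and the policy name `loki-access` are assumptions, loosely following the S3 and DynamoDB permissions Loki's AWS storage documentation describes. Also note that with `period: 24h` Loki creates periodic index tables named with the `loki_` prefix followed by a period number, so the exact `table/loki_` ARN may not match them; the `table/loki_*` wildcard below is an added assumption meant to cover that case.

```hcl
data "aws_caller_identity" "current" {}

variable "aws_s3_loki_storage" {
  type        = string
  description = "Name of the S3 bucket Loki stores chunks in"
}

data "aws_iam_policy_document" "loki_access" {
  # DynamoDB index tables managed by the table manager (action list is an assumption)
  statement {
    effect = "Allow"
    actions = [
      "dynamodb:BatchGetItem",
      "dynamodb:BatchWriteItem",
      "dynamodb:CreateTable",
      "dynamodb:DeleteItem",
      "dynamodb:DeleteTable",
      "dynamodb:DescribeTable",
      "dynamodb:GetItem",
      "dynamodb:ListTagsOfResource",
      "dynamodb:PutItem",
      "dynamodb:Query",
      "dynamodb:TagResource",
      "dynamodb:UntagResource",
      "dynamodb:UpdateItem",
      "dynamodb:UpdateTable",
    ]
    resources = [
      "arn:aws:dynamodb:us-east-2:${data.aws_caller_identity.current.account_id}:table/loki_/index/*",
      "arn:aws:dynamodb:us-east-2:${data.aws_caller_identity.current.account_id}:table/loki_",
      # Assumption: wildcard for periodic tables such as loki_<period number>
      "arn:aws:dynamodb:us-east-2:${data.aws_caller_identity.current.account_id}:table/loki_*",
    ]
  }

  # ListTables does not support table-level resources, so it needs "*"
  statement {
    effect    = "Allow"
    actions   = ["dynamodb:ListTables"]
    resources = ["*"]
  }

  # Chunk storage in S3: bucket-level ARN for ListBucket, object ARN for the rest
  statement {
    effect  = "Allow"
    actions = ["s3:ListBucket", "s3:GetObject", "s3:PutObject", "s3:DeleteObject"]
    resources = [
      "arn:aws:s3:::${var.aws_s3_loki_storage}/*",
      "arn:aws:s3:::${var.aws_s3_loki_storage}",
    ]
  }
}

# Attach the policy inline to the existing LokiAccess role
resource "aws_iam_role_policy" "loki_access" {
  name   = "loki-access"
  role   = "LokiAccess"
  policy = data.aws_iam_policy_document.loki_access.json
}
```

Using `aws_iam_policy_document` instead of a raw JSON string lets Terraform validate the policy structure and interpolate the account ID and bucket name cleanly.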