Setting up CloudWatch exporter for Prometheus on a Mesosphere DC/OS cluster

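I have set up the CloudWatch exporter for Prometheus on my Mesosphere DC/OS cluster on AWS, and I have attached the "CloudWatchFullAccess" policy. However, the metric 'cloudwatch_exporter_scrape_error' shows a non-zero value. I would like to know why the scrape is failing.

Where can I see the logs, or how can I debug this?

The config file I am using is: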

{
   "region": "ap-southeast-1",
   "metrics": [
        {"aws_namespace": "AWS/ELB", "aws_metric_name": "HealthyHostCount",
         "aws_dimensions": ["AvailabilityZone", "LoadBalancerName"],
         "aws_dimension_select": {"LoadBalancerName": ["name of my LB"]},
         "aws_statistics": ["Sum"]
        }
      ]
}

But no metrics are exposed to Prometheus other than cloudwatch_requests_total, cloudwatch_exporter_scrape_duration_seconds and cloudwatch_exporter_scrape_error.

How can I get additional metrics out of cloudwatch_exporter?

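It looks like you are trying to use an IAM instance profile, but you cannot reach http://169.254.169.254. That points to some problem with your network setup, since this should work out of the box on EC2.

You have two options.

  1. Fix your network setup so that 169.254.169.254 is reachable again.
  2. Or create an IAM user with the cloudwatch:ListMetrics and cloudwatch:GetMetricStatistics IAM permissions, generate access keys, and put them in the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables or in ~/.aws/credentials.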

https://github.com/prometheus/cloudwatch_exporter#credentials-and-permissions
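
To check option 1 quickly, something like the following sketch might help. It only tests whether the metadata endpoint answers at all. The function name `metadata_reachable`, the `/latest/meta-data/` path and the 2-second timeout are my own choices for illustration, not anything the exporter provides. Run it from the same place the exporter runs (for example inside the same container), since that is where the network path matters.

```python
import urllib.error
import urllib.request

# Link-local address of the EC2 instance metadata service mentioned above.
# The path below is an assumption chosen for this check.
METADATA_URL = "http://169.254.169.254/latest/meta-data/"


def metadata_reachable(url: str = METADATA_URL, timeout: float = 2.0) -> bool:
    """Return True if any HTTP response comes back from url (hypothetical helper)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The service answered, even if with an error status,
        # so the network path itself is fine.
        return True
    except (urllib.error.URLError, OSError):
        # Timeout, connection refused, no route: the endpoint is not reachable.
        return False


# Usage: print("metadata endpoint reachable:", metadata_reachable())
```

If this reports False from where the exporter runs, the instance profile credentials cannot be fetched there, and option 2 is probably the easier route.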

Below is my config file:

{
    "region": "us-west-2",
    "metrics": [
    {"aws_namespace": "AWS/ELB", "aws_metric_name": "HealthyHostCount",
     "aws_dimensions": ["us-west-2a", "test"], "aws_statistics": ["Average"]},
    {"aws_namespace": "AWS/ELB", "aws_metric_name": "UnHealthyHostCount",
     "aws_dimensions": ["us-west-2a", "test"], "aws_statistics": ["Average"]},
    {"aws_namespace": "AWS/ELB", "aws_metric_name": "RequestCount",
     "aws_dimensions": ["us-west-2a", "test"], "aws_statistics": ["Sum"]},
    {"aws_namespace": "AWS/ELB", "aws_metric_name": "Latency",
     "aws_dimensions": ["us-west-2a", "test"], "aws_statistics": ["Average"]},
    {"aws_namespace": "AWS/ELB", "aws_metric_name": "SurgeQueueLength",
     "aws_dimensions": ["us-west-2a", "test"], "aws_statistics": ["Maximum",    "Sum"]},
    ]
}

I can see the following output:

cloudwatch_requests_total 10.0

cloudwatch_exporter_scrape_duration_seconds 2.571412647

cloudwatch_exporter_scrape_error 0.0

But why are the other metrics not showing up?
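
One thing that may be worth checking, going by the first config in this thread: there, aws_dimensions lists dimension names ("AvailabilityZone", "LoadBalancerName") and the concrete load balancer goes into aws_dimension_select, whereas the config above puts what look like values ("us-west-2a", "test") into aws_dimensions. That is a guess, not a confirmed diagnosis. A rough sketch for spotting this, assuming the classic ELB dimension names and using a hypothetical helper `suspicious_dimensions`:

```python
# Dimension names for classic load balancers in the AWS/ELB namespace.
# Assumption: only these two matter for the configs in this thread.
ELB_DIMENSION_NAMES = {"AvailabilityZone", "LoadBalancerName"}


def suspicious_dimensions(config: dict) -> list:
    """Return (metric name, entry) pairs whose aws_dimensions entry is not a known name."""
    findings = []
    for metric in config.get("metrics", []):
        if metric.get("aws_namespace") != "AWS/ELB":
            continue  # this sketch only knows about AWS/ELB
        for dim in metric.get("aws_dimensions", []):
            if dim not in ELB_DIMENSION_NAMES:
                findings.append((metric.get("aws_metric_name"), dim))
    return findings


# First metric entry from the config above, written as a Python dict.
config = {
    "region": "us-west-2",
    "metrics": [
        {"aws_namespace": "AWS/ELB", "aws_metric_name": "HealthyHostCount",
         "aws_dimensions": ["us-west-2a", "test"], "aws_statistics": ["Average"]},
    ],
}

print(suspicious_dimensions(config))
# prints [('HealthyHostCount', 'us-west-2a'), ('HealthyHostCount', 'test')]
```

If the entries are flagged like this, replacing them with the dimension names and moving the specific values into aws_dimension_select, as in the first config, would be the thing to try.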