Prometheus UI always returns 1 even though blackbox_exporter returns 0 when queried manually
I installed Prometheus and the blackbox exporter. Here is the configuration.
root@monitor-1:~# cat /etc/prometheus/prometheus.yml
global:
  scrape_interval: 15s # By default, scrape targets every 15 seconds.
  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    monitor: 'codelab-monitor'
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'blackbox'
    metrics_path: /probe
    params:
      module: [http_2xx]
    scrape_interval: 5s
    static_configs:
      - targets:
        - http://wiki.itsmwork.com
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 192.168.20.202:9115
root@monitor-1:~# cat /etc/prometheus/blackbox.yaml | more
modules:
  http_2xx:
    prober: http
    timeout: 5s
    http:
      preferred_ip_protocol: "ip4"
      no_follow_redirects: false
      fail_if_ssl: false
      tls_config:
        insecure_skip_verify: true
I checked the HTTP site manually, and it returned 0 as expected.
root@monitor-1:~# curl "http://localhost:9115/probe?target=wiki.itsmwork.com&module=http_2xx" | grep -v '^#'
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 2013 100 2013 0 0 294k 0 --:--:-- --:--:-- --:--:-- 327k
probe_dns_lookup_time_seconds 0.002698265
probe_duration_seconds 0.00308218
probe_failed_due_to_regex 0
probe_http_content_length 0
probe_http_duration_seconds{phase="connect"} 0
probe_http_duration_seconds{phase="processing"} 0
probe_http_duration_seconds{phase="resolve"} 0
probe_http_duration_seconds{phase="tls"} 0
probe_http_duration_seconds{phase="transfer"} 0
probe_http_redirects 0
probe_http_ssl 0
probe_http_status_code 0
probe_http_uncompressed_body_length 0
probe_http_version 0
probe_ip_addr_hash 0
probe_ip_protocol 0
probe_success 0
But when I check the same target in the Prometheus UI, up{instance="http://wiki.itsmwork.com",job="blackbox"} is always 1.
How can I find out where the problem is?
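When working with the blackbox exporter, be careful not to confuse up with probe_success. The first metric indicates that the exporter itself is reachable; the second is about the target that the blackbox exporter probes. So the combination you are seeing means:
- The blackbox exporter is working correctly
- The system you want to monitor does not respond as expected when probed from the blackbox exporter
This also matches your manual test: the request to the blackbox_exporter instance (your curl command) works, but it results in a failed probe (as shown in the payload). So for your dashboards, if you want to draw conclusions about the probed system, you should always use the up metric together with probe_success, because it is also possible that the system you want to monitor is running correctly while the blackbox exporter job is not. You can detect that case by the up metric switching to 0.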
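To make the distinction concrete, here is a minimal Python sketch of how the two metrics could be combined into a single diagnosis. The helper names parse_metric and diagnose are hypothetical and are not part of Prometheus or blackbox_exporter; the sample text is a shortened copy of the curl output from the question, and the up value is assumed to be read from the Prometheus UI.

```python
def parse_metric(exposition_text, name):
    """Return the value of an unlabelled metric from Prometheus text output, or None."""
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        parts = line.split()
        if len(parts) >= 2 and parts[0] == name:
            return float(parts[1])
    return None


def diagnose(up, probe_success):
    """Combine up (exporter scrape) and probe_success (target probe) into one state."""
    if up != 1:
        # Prometheus could not scrape blackbox_exporter, so nothing is known about the target.
        return "exporter unreachable: probe result unknown"
    if probe_success is None:
        return "exporter reachable: no probe_success sample found"
    if probe_success == 1:
        return "target healthy: exporter reachable and probe succeeded"
    return "target failing: exporter reachable but probe failed"


# Shortened copy of the /probe output shown in the question.
sample = """\
probe_duration_seconds 0.00308218
probe_http_status_code 0
probe_success 0
"""

# up is assumed to be 1, as reported by the Prometheus UI in the question.
print(diagnose(up=1, probe_success=parse_metric(sample, "probe_success")))
```

With the values from the question this reports a failing target behind a reachable exporter, which is the situation described above.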