Prometheus JMX exporter with context deadline exceeded

I have successfully enabled monitoring for node_exporter, but the JMX exporter target fails.

With curl, the jmx_exporter metrics endpoint (http://localhost:55555/testsvr2/jmx_exporter/metrics) returns its output in under a second (the output is attached below), but Prometheus shows the target's state as "DOWN" with the message "context deadline exceeded".

This is the Prometheus configuration I use to monitor the server:

global:
  scrape_interval: 15s

scrape_configs:
  - job_name: testsvr2_node
    scrape_interval: 5s
    metrics_path: /testsvr2/node_exporter/metrics
    static_configs:
      - targets: ['localhost:55555']
  - job_name: testsvr2_jmx
    scrape_interval: 20s
    metrics_path: /testsvr2/jmx_exporter/metrics
    static_configs:
      - targets: ['localhost:55555']

JMX exporter curl output:

# HELP jvm_buffer_pool_used_bytes Used bytes of a given JVM buffer pool.
# TYPE jvm_buffer_pool_used_bytes gauge
jvm_buffer_pool_used_bytes{pool="direct",} 246744.0
jvm_buffer_pool_used_bytes{pool="mapped",} 0.0
# HELP jvm_buffer_pool_capacity_bytes Bytes capacity of a given JVM buffer pool.
# TYPE jvm_buffer_pool_capacity_bytes gauge
jvm_buffer_pool_capacity_bytes{pool="direct",} 246744.0
jvm_buffer_pool_capacity_bytes{pool="mapped",} 0.0
# HELP jvm_buffer_pool_used_buffers Used buffers of a given JVM buffer pool.
# TYPE jvm_buffer_pool_used_buffers gauge
jvm_buffer_pool_used_buffers{pool="direct",} 30.0
jvm_buffer_pool_used_buffers{pool="mapped",} 0.0
# HELP jvm_memory_bytes_used Used bytes of a given JVM memory area.
# TYPE jvm_memory_bytes_used gauge
jvm_memory_bytes_used{area="heap",} 4.98246352E8
jvm_memory_bytes_used{area="nonheap",} 2.76580424E8
# HELP jvm_memory_bytes_committed Committed (bytes) of a given JVM memory area.
# TYPE jvm_memory_bytes_committed gauge
jvm_memory_bytes_committed{area="heap",} 6.33339904E8
jvm_memory_bytes_committed{area="nonheap",} 3.96230656E8
# HELP jvm_memory_bytes_max Max (bytes) of a given JVM memory area.
# TYPE jvm_memory_bytes_max gauge
jvm_memory_bytes_max{area="heap",} 3.817865216E9
jvm_memory_bytes_max{area="nonheap",} 1.124073472E9
# HELP jvm_memory_bytes_init Initial bytes of a given JVM memory area.
# TYPE jvm_memory_bytes_init gauge
jvm_memory_bytes_init{area="heap",} 2.59995072E8
jvm_memory_bytes_init{area="nonheap",} 2.4576E7
# HELP jvm_memory_pool_bytes_used Used bytes of a given JVM memory pool.
# TYPE jvm_memory_pool_bytes_used gauge
jvm_memory_pool_bytes_used{pool="Code Cache",} 2.1598784E7
jvm_memory_pool_bytes_used{pool="PS Eden Space",} 8.0618168E7
jvm_memory_pool_bytes_used{pool="PS Survivor Space",} 2097152.0
jvm_memory_pool_bytes_used{pool="PS Old Gen",} 4.15531032E8
jvm_memory_pool_bytes_used{pool="PS Perm Gen",} 2.5498164E8
# HELP jvm_memory_pool_bytes_committed Committed bytes of a given JVM memory pool.
# TYPE jvm_memory_pool_bytes_committed gauge
jvm_memory_pool_bytes_committed{pool="Code Cache",} 2.1889024E7
jvm_memory_pool_bytes_committed{pool="PS Eden Space",} 8.8604672E7
jvm_memory_pool_bytes_committed{pool="PS Survivor Space",} 2097152.0
jvm_memory_pool_bytes_committed{pool="PS Old Gen",} 5.4263808E8
jvm_memory_pool_bytes_committed{pool="PS Perm Gen",} 3.74341632E8
# HELP jvm_memory_pool_bytes_max Max bytes of a given JVM memory pool.
# TYPE jvm_memory_pool_bytes_max gauge
jvm_memory_pool_bytes_max{pool="Code Cache",} 5.0331648E7
jvm_memory_pool_bytes_max{pool="PS Eden Space",} 1.42606336E9
jvm_memory_pool_bytes_max{pool="PS Survivor Space",} 2097152.0
jvm_memory_pool_bytes_max{pool="PS Old Gen",} 2.863136768E9
jvm_memory_pool_bytes_max{pool="PS Perm Gen",} 1.073741824E9
# HELP jvm_memory_pool_bytes_init Initial bytes of a given JVM memory pool.
# TYPE jvm_memory_pool_bytes_init gauge
jvm_memory_pool_bytes_init{pool="Code Cache",} 2555904.0
jvm_memory_pool_bytes_init{pool="PS Eden Space",} 6.6060288E7
jvm_memory_pool_bytes_init{pool="PS Survivor Space",} 1.048576E7
jvm_memory_pool_bytes_init{pool="PS Old Gen",} 1.7301504E8
jvm_memory_pool_bytes_init{pool="PS Perm Gen",} 2.2020096E7
# HELP tomcat_errorcount_total Tomcat global errorCount
# TYPE tomcat_errorcount_total counter
tomcat_errorcount_total{port="8009",protocol="ajp-bio",} 0.0
tomcat_errorcount_total{port="8080",protocol="http-nio",} 792.0
# HELP tomcat_threadpool_connectioncount Tomcat threadpool connectionCount
# TYPE tomcat_threadpool_connectioncount gauge
tomcat_threadpool_connectioncount{port="8009",protocol="ajp-bio",} 1.0
tomcat_threadpool_connectioncount{port="8080",protocol="http-nio",} 1.0
# HELP tomcat_threadpool_pollerthreadcount Tomcat threadpool pollerThreadCount
# TYPE tomcat_threadpool_pollerthreadcount gauge
tomcat_threadpool_pollerthreadcount{port="8080",protocol="http-nio",} 2.0
# HELP tomcat_processingtime_total Tomcat global processingTime
# TYPE tomcat_processingtime_total counter
tomcat_processingtime_total{port="8009",protocol="ajp-bio",} 0.0
tomcat_processingtime_total{port="8080",protocol="http-nio",} 11878.0
# HELP tomcat_bytessent_total Tomcat global bytesSent
# TYPE tomcat_bytessent_total counter
tomcat_bytessent_total{port="8009",protocol="ajp-bio",} 0.0
tomcat_bytessent_total{port="8080",protocol="http-nio",} 8548511.0
# HELP tomcat_maxtime_total Tomcat global maxTime
# TYPE tomcat_maxtime_total counter
tomcat_maxtime_total{port="8009",protocol="ajp-bio",} 0.0
tomcat_maxtime_total{port="8080",protocol="http-nio",} 1583.0
# HELP tomcat_bytesreceived_total Tomcat global bytesReceived
# TYPE tomcat_bytesreceived_total counter
tomcat_bytesreceived_total{port="8009",protocol="ajp-bio",} 0.0
tomcat_bytesreceived_total{port="8080",protocol="http-nio",} 43847.0
# HELP tomcat_threadpool_currentthreadsbusy Tomcat threadpool currentThreadsBusy
# TYPE tomcat_threadpool_currentthreadsbusy gauge
tomcat_threadpool_currentthreadsbusy{port="8009",protocol="ajp-bio",} 0.0
tomcat_threadpool_currentthreadsbusy{port="8080",protocol="http-nio",} 0.0
# HELP tomcat_requestcount_total Tomcat global requestCount
# TYPE tomcat_requestcount_total counter
tomcat_requestcount_total{port="8009",protocol="ajp-bio",} 0.0
tomcat_requestcount_total{port="8080",protocol="http-nio",} 862.0
# HELP tomcat_threadpool_currentthreadcount Tomcat threadpool currentThreadCount
# TYPE tomcat_threadpool_currentthreadcount gauge
tomcat_threadpool_currentthreadcount{port="8009",protocol="ajp-bio",} 0.0
tomcat_threadpool_currentthreadcount{port="8080",protocol="http-nio",} 25.0
# HELP tomcat_threadpool_keepalivecount Tomcat threadpool keepAliveCount
# TYPE tomcat_threadpool_keepalivecount gauge
tomcat_threadpool_keepalivecount{port="8080",protocol="http-nio",} 0.0
# HELP jmx_scrape_duration_seconds Time this JMX scrape took, in seconds.
# TYPE jmx_scrape_duration_seconds gauge
jmx_scrape_duration_seconds 0.201767373
# HELP jmx_scrape_error Non-zero if this scrape failed.
# TYPE jmx_scrape_error gauge
jmx_scrape_error 0.0
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 329.21
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1.540210335811E9
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 202.0
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 4096.0
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 7.924580352E9
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 9.93017856E8
# HELP jmx_config_reload_success_total Number of times configuration have successfully been reloaded.
# TYPE jmx_config_reload_success_total counter
jmx_config_reload_success_total 0.0
# HELP jvm_threads_current Current thread count of a JVM
# TYPE jvm_threads_current gauge
jvm_threads_current 118.0
# HELP jvm_threads_daemon Daemon thread count of a JVM
# TYPE jvm_threads_daemon gauge
jvm_threads_daemon 61.0
# HELP jvm_threads_peak Peak thread count of a JVM
# TYPE jvm_threads_peak gauge
jvm_threads_peak 119.0
# HELP jvm_threads_started_total Started thread count of a JVM
# TYPE jvm_threads_started_total counter
jvm_threads_started_total 130.0
# HELP jvm_threads_deadlocked Cycles of JVM-threads that are in deadlock waiting to acquire object monitors or ownable synchronizers
# TYPE jvm_threads_deadlocked gauge
jvm_threads_deadlocked 0.0
# HELP jvm_threads_deadlocked_monitor Cycles of JVM-threads that are in deadlock waiting to acquire object monitors
# TYPE jvm_threads_deadlocked_monitor gauge
jvm_threads_deadlocked_monitor 0.0
# HELP jmx_config_reload_failure_total Number of times configuration have failed to be reloaded.
# TYPE jmx_config_reload_failure_total counter
jmx_config_reload_failure_total 0.0
# HELP jvm_info JVM version info
# TYPE jvm_info gauge
jvm_info{version="1.7.0_80-b15",vendor="Oracle Corporation",runtime="Java(TM) SE Runtime Environment",} 1.0
# HELP jvm_gc_collection_seconds Time spent in a given JVM garbage collector in seconds.
# TYPE jvm_gc_collection_seconds summary
jvm_gc_collection_seconds_count{gc="PS Scavenge",} 458.0
jvm_gc_collection_seconds_sum{gc="PS Scavenge",} 5.806
jvm_gc_collection_seconds_count{gc="PS MarkSweep",} 3.0
jvm_gc_collection_seconds_sum{gc="PS MarkSweep",} 1.192
# HELP jvm_classes_loaded The number of classes that are currently loaded in the JVM
# TYPE jvm_classes_loaded gauge
jvm_classes_loaded 37664.0
# HELP jvm_classes_loaded_total The total number of classes that have been loaded since the JVM has started execution
# TYPE jvm_classes_loaded_total counter
jvm_classes_loaded_total 37664.0
# HELP jvm_classes_unloaded_total The total number of classes that have been unloaded since the JVM has started execution
# TYPE jvm_classes_unloaded_total counter
jvm_classes_unloaded_total 0.0

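P.S.: As suggested by other sources, I tried increasing the scrape interval to 30 seconds, but that did not help. The same error appeared, even though curl returns the metrics endpoint's output within a second.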

Solution first: the problem was solved by updating nginx at the destination (i.e. the client) to use proxy_http_version 1.1.
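
For illustration, here is a minimal sketch of what that change might look like in the client's nginx configuration. The location path and the upstream address are assumptions inferred from the setup described in this answer (a jmx_exporter entry that forwards to port 9117), not a copy of the actual config. `proxy_http_version` is the standard nginx directive; nginx speaks HTTP/1.0 to the proxied server unless it is set.

```nginx
# Client-side nginx (sketch; path and upstream address are assumed)
location /jmx_exporter/ {
    # Forward to the JMX exporter listening locally on port 9117.
    # The trailing slash makes nginx replace the /jmx_exporter/ prefix with /.
    proxy_pass http://127.0.0.1:9117/;

    # Speak HTTP/1.1 to the exporter instead of nginx's default HTTP/1.0.
    proxy_http_version 1.1;
}
```

After editing, the configuration can be checked with `nginx -t` and applied with `nginx -s reload`.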

Let me explain my setup, so that it is clear why nginx was needed in the first place and how I arrived at the solution.

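Notes:

  • The client is not directly visible/accessible to Prometheus
  • The client has only a limited set of ports open: 22, 443, 80
  • The bridge can only be reached over SSH
  • The bridge hosts nginx on port 80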

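Regarding the Prometheus scrape configuration:

  • Port 55555 forwards traffic from Prometheus to the bridge through an SSH tunnel
  • testsvr2 is the name that the bridge's nginx resolves to one of the clients
  • jmx_exporter is an entry in the client's nginx that forwards to port 9117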
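
The bridge-side routing in these bullets could be sketched roughly as follows. The upstream host name `testsvr2.internal` is hypothetical, and plain HTTP on the client's port 80 is an assumption; only the `/testsvr2/` prefix and the bridge's nginx listening on port 80 come from the setup described here.

```nginx
# Bridge-side nginx (sketch; upstream host name is hypothetical)
server {
    listen 80;

    location /testsvr2/ {
        # Strip the /testsvr2/ prefix and hand the request to the client's nginx,
        # so /testsvr2/jmx_exporter/metrics arrives there as /jmx_exporter/metrics.
        proxy_pass http://testsvr2.internal/;
    }
}
```

With this layout a scrape passes through two nginx proxies (bridge, then client) before it reaches the exporter, so a proxy setting on either hop can affect the response Prometheus sees.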

How I worked out that nginx was the source of the problem

I added a client that Prometheus could reach directly (as a quick test) and changed the Prometheus scrape configuration to:

global:
  scrape_interval: 15s

scrape_configs:
  - job_name: testsvr2_node
    scrape_interval: 5s
    metrics_path: /node_exporter/metrics
    static_configs:
      - targets: ['testsvr2']
  - job_name: testsvr2_jmx
    scrape_interval: 20s
    metrics_path: /jmx_exporter/metrics
    static_configs:
      - targets: ['testsvr2']

After this configuration change I started getting a different error, "unexpected EOF". I began researching that error, which led me to the final fix.

Only once the error message changed from "context deadline exceeded" to "unexpected EOF" was I able to find the solution.

I hope this helps anyone with a similar architecture and a similarly unhelpful error message.