Why does Flink use the Pushgateway instead of Prometheus's usual pull model for general metrics collection?

Question

We can see that when exposing Flink metrics to external systems such as Prometheus, Flink uses the Pushgateway rather than Prometheus's usual pull model for general metrics collection.

@Override
public void report() {
    try {
        pushGateway.push(CollectorRegistry.defaultRegistry, jobName);
    } catch (Exception e) {
        log.warn("Failed to push metrics to PushGateway with jobName {}.", jobName, e);
    }
}

https://github.com/apache/flink/blob/master/flink-metrics/flink-metrics-prometheus/src/main/java/org/apache/flink/metrics/prometheus/PrometheusPushGatewayReporter.java

However, the official Prometheus documentation states that "Prometheus scrapes metrics from instrumented jobs, either directly or via an intermediary push gateway for short-lived jobs". A Flink streaming job is clearly not a short-lived job, so why does Flink use the Pushgateway instead of Prometheus's usual pull model for general metrics collection?

https://prometheus.io/docs/introduction/overview/

Answer 1

Flink provides both the PrometheusPushGatewayReporter and the generally more appropriate pull-based PrometheusReporter. Prometheus is already very popular with Flink users, and the community was interested in supporting both types of connection.
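
To make the difference between the two models concrete, here is a minimal sketch of what "pull-based" means: the process exposes an HTTP endpoint and waits for Prometheus to scrape it, instead of calling pushGateway.push(...) on a timer. This is not Flink's PrometheusReporter code. It uses only the JDK's built-in HTTP server, and the class name PullModelSketch, the METRICS map, the metric name, and port 9249 are all assumptions chosen for illustration.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;

public class PullModelSketch {
    // Hypothetical in-memory metric store (illustration only, not a Flink or Prometheus type).
    static final Map<String, Double> METRICS = new ConcurrentHashMap<>();

    // Render metrics as "name value" lines, the simplest form of the Prometheus text format.
    // Names are sorted so the output is deterministic.
    static String render(Map<String, Double> metrics) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Double> e : new TreeMap<>(metrics).entrySet()) {
            sb.append(e.getKey()).append(' ').append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    // Start an HTTP endpoint that a Prometheus server could scrape at /metrics.
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/metrics", exchange -> {
            byte[] body = render(METRICS).getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().add("Content-Type", "text/plain; version=0.0.4");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        // Assumed example metric; the process never pushes it anywhere.
        METRICS.put("demo_records_in_total", 42.0);
        // Port chosen for illustration; the scraper decides when to read the values.
        start(9249);
    }
}
```

In the pull model the long-running process only has to keep its endpoint available, which is why it is usually the better fit for streaming jobs; the push path shown in the question exists for cases where the scraper cannot reach the process.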

apache-flink

flink-streaming

prometheus-pushgateway