Unable to collect metrics from customized fluentd on GKE

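After customizing fluentd in another namespace, I can no longer get metrics on my GKE cluster. I needed some changes to the fluentd configmap, and because the GKE default fluentd and its configmap in the kube-system namespace cannot be changed (changes are always reverted), I deployed fluentd and the event exporter in another namespace.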

But after I made this change, the metrics went missing. All logs are fine and still show up in the Logs Viewer.

What needs to be done for GKE to collect metrics again? Or, if I am wrong, is there a way to modify the default fluentd configmap in kube-system?

I could not find any useful information on this topic, so I opened a GCP support ticket. Google provided a solution:

With Cloud Operations for GKE, you can collect just system logs [1]; that way, monitoring remains enabled in your cluster. Please note that this option can be enabled only via the console, not via the gcloud command line. There is a tracking bug for this: https://issuetracker.google.com/163356799

Further, you can deploy your own configurable Fluentd daemonset to customize the applications logs [2]

You will be running 2 daemonsets for fluentd with this config, however to reduce the amount of log duplication it would be recommended that you decrease the logging from CloudOps to capture system logs only[2], while your customized fluentd daemonset will be able to capture your application workload logs.

The disadvantages of this approach are: you must ensure your custom deployment doesn't overlap with something CloudOps is already watching (i.e. files, logs), there will be an increased number of API calls, and you will be responsible for updating, maintaining, and managing your custom fluentd deployment.

[1] https://cloud.google.com/stackdriver/docs/solutions/gke/installing#controlling_the_collection_of_application_logs

[2] https://cloud.google.com/solutions/customizing-stackdriver-logs-fluentd
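
For the overlap concern in Google's reply, here is a rough sketch of how I would sanity-check that the custom daemonset is not tailing files the system-only collector already watches. It is plain Python and does not talk to the cluster. The function name, the glob patterns, and the sample file paths are all hypothetical placeholders I made up for illustration; substitute the path patterns from your own fluentd configs and a real file listing from a node.

```python
from fnmatch import fnmatchcase


def overlapping_files(system_globs, custom_globs, sample_paths):
    """Return the sample paths matched by both sets of glob patterns."""

    def matches(path, globs):
        # fnmatchcase does shell-style matching; note that '*' also matches '/'
        return any(fnmatchcase(path, g) for g in globs)

    return [
        p for p in sample_paths
        if matches(p, system_globs) and matches(p, custom_globs)
    ]


# Hypothetical patterns: what the system-only collector is assumed to watch
system_globs = [
    "/var/log/kubelet.log",
    "/var/log/containers/*_kube-system_*.log",
]
# Hypothetical pattern from the custom fluentd config (too broad on purpose)
custom_globs = ["/var/log/containers/*.log"]

# Hypothetical file listing taken from one node
sample_paths = [
    "/var/log/containers/app-1_default_web-abc.log",
    "/var/log/containers/fluentd-x_kube-system_fluentd-abc.log",
    "/var/log/kubelet.log",
]

for path in overlapping_files(system_globs, custom_globs, sample_paths):
    print("watched twice:", path)
```

Any path it prints would be shipped by both daemonsets, so the custom config should exclude it (or use a narrower pattern) to avoid duplicated log entries.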