OCI APM domain with Istio Zipkin not pushing tracing details

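I am following this document to set up distributed tracing: https://docs.oracle.com/en-us/iaas/Content/ContEng/Tasks/contengistio-intro-topic.htm#exploring_istio_observability

My cluster runs on GKE (GCP) for testing purposes. I installed Istio on top of it, followed the documentation, and set up the services.

The services are up and running: Prometheus, Grafana, Jaeger, and Zipkin.

The step that fails: Performing Distributed Tracing with OCI Application Performance Monitoring.

I am trying to update the ConfigMap for the sidecar injector so that I can push tracing details to the Zipkin domain.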

I configured the Zipkin domain and am now using public-span in the ConfigMap:

apiVersion: v1
kind: ConfigMap
metadata:
  name: istio-custom-bootstrap-config
  namespace: default
data:
  custom_bootstrap.json: |
    {
        "tracing": {
            "http": {
                "name": "envoy.tracers.zipkin",
                "typed_config": {
                    "@type": "type.googleapis.com/envoy.config.trace.v3.ZipkinConfig",
                    "collector_cluster": "aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com", // [Replace this with data upload endpoint of your apm domain]
                    "collector_endpoint": "/20200101/observations/private-span?dataFormat=zipkin&dataFormatVersion=2&dataKey=2C6YOLQSUZ5Q7IGN", // [Replace with the private datakey of your apm domain. You can also use public datakey but change the observation type to public-span]
                    "collectorEndpointVersion": "HTTP_JSON",
                    "trace_id_128bit": true,
                    "shared_span_context": false
                }
            }
        },
        "static_resources": {
            "clusters": [{
                "name": "aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com", // [Replace this with data upload endpoint of your apm domain:443]
                "type": "STRICT_DNS",
                "lb_policy": "ROUND_ROBIN",
                "load_assignment": {
                    "cluster_name": "aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com", // [Replace this with data upload endpoint of your apm domain]
                    "endpoints": [{
                        "lb_endpoints": [{
                            "endpoint": {
                                "address": {
                                    "socket_address": {
                                        "address": "aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com", // [Replace this with data upload endpoint of your apm domain]
                                        "port_value": 443
                                    }
                                }
                            }
                        }]
                    }]
                },
                "transport_socket": {
                    "name": "envoy.transport_sockets.tls",
                    "typed_config": {
                        "@type": "type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext",
                        "sni": "aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com" // [Replace this with data upload endpoint of your apm domain]
                    }
                }
            }]
        }
    }

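The ConfigMap above does not work as expected. The sidecar was crashing because of a missing connect_timeout key. After I added it, the sidecar no longer shows that error, but tracing details are still not pushed.

I found no errors in the Zipkin or istiod containers, and I am not sure how to debug this further.

Error log: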

2022-03-30T05:59:33.146580Z     info    FLAG: --concurrency="2"
2022-03-30T05:59:33.146632Z     info    FLAG: --domain="default.svc.cluster.local"
2022-03-30T05:59:33.146642Z     info    FLAG: --help="false"
2022-03-30T05:59:33.146648Z     info    FLAG: --log_as_json="false"
2022-03-30T05:59:33.146672Z     info    FLAG: --log_caller=""
2022-03-30T05:59:33.146678Z     info    FLAG: --log_output_level="default:info"
2022-03-30T05:59:33.146682Z     info    FLAG: --log_rotate=""
2022-03-30T05:59:33.146687Z     info    FLAG: --log_rotate_max_age="30"
2022-03-30T05:59:33.146693Z     info    FLAG: --log_rotate_max_backups="1000"
2022-03-30T05:59:33.146699Z     info    FLAG: --log_rotate_max_size="104857600"
2022-03-30T05:59:33.146704Z     info    FLAG: --log_stacktrace_level="default:none"
2022-03-30T05:59:33.146715Z     info    FLAG: --log_target="[stdout]"
2022-03-30T05:59:33.146725Z     info    FLAG: --meshConfig="./etc/istio/config/mesh"
2022-03-30T05:59:33.146730Z     info    FLAG: --outlierLogPath=""
2022-03-30T05:59:33.146736Z     info    FLAG: --proxyComponentLogLevel="misc:error"
2022-03-30T05:59:33.146741Z     info    FLAG: --proxyLogLevel="warning"
2022-03-30T05:59:33.146747Z     info    FLAG: --serviceCluster="reviews.default"
2022-03-30T05:59:33.146753Z     info    FLAG: --stsPort="0"
2022-03-30T05:59:33.146760Z     info    FLAG: --templateFile=""
2022-03-30T05:59:33.146767Z     info    FLAG: --tokenManagerPlugin="GoogleTokenExchange"
2022-03-30T05:59:33.146784Z     info    Version 1.8.0-c87a4c874df27e37a3e6c25fa3d1ef6279685d23-Clean
2022-03-30T05:59:33.146991Z     info    Obtained private IP [10.4.1.6]
2022-03-30T05:59:33.147107Z     info    Apply proxy config from env {"tracing":{"zipkin":{"address":"caadc76wvdp7edddddddccclii.apm-agt.ap-mumbai-1.oci.oraclecloud.com:443"}},"proxyMetadata":{"DNS_AGENT":""}}

2022-03-30T05:59:33.148650Z     info    Effective config: binaryPath: /usr/local/bin/envoy
concurrency: 2
configPath: ./etc/istio/proxy
controlPlaneAuthPolicy: MUTUAL_TLS
discoveryAddress: istiod.istio-system.svc:15012
drainDuration: 45s
envoyAccessLogService: {}
envoyMetricsService: {}
parentShutdownDuration: 60s
proxyAdminPort: 15000
proxyMetadata:
  DNS_AGENT: ""
serviceCluster: reviews.default
statNameLength: 189
statusPort: 15020
terminationDrainDuration: 5s
tracing:
  zipkin:
    address: caadc76wvdp7edddddddccclii.apm-agt.ap-mumbai-1.oci.oraclecloud.com:443

2022-03-30T05:59:33.148721Z     info    Proxy role: &model.Proxy{RWMutex:sync.RWMutex{w:sync.Mutex{state:0, sema:0x0}, writerSem:0x0, readerSem:0x0, readerCount:0, readerWait:0}, Type:"sidecar", IPAddresses:[]string{"10.4.1.6"}, ID:"reviews-v1-5d6559df86-qbg6b.default", Locality:(*envoy_config_core_v3.Locality)(nil), DNSDomain:"default.svc.cluster.local", ConfigNamespace:"", Metadata:(*model.NodeMetadata)(nil), SidecarScope:(*model.SidecarScope)(nil), PrevSidecarScope:(*model.SidecarScope)(nil), MergedGateway:(*model.MergedGateway)(nil), ServiceInstances:[]*model.ServiceInstance(nil), IstioVersion:(*model.IstioVersion)(nil), VerifiedIdentity:(*spiffe.Identity)(nil), ipv6Support:false, ipv4Support:false, GlobalUnicastIP:"", XdsResourceGenerator:model.XdsResourceGenerator(nil), WatchedResources:map[string]*model.WatchedResource(nil)}
2022-03-30T05:59:33.148732Z     info    JWT policy is third-party-jwt
2022-03-30T05:59:33.148777Z     info    PilotSAN []string{"istiod.istio-system.svc"}
2022-03-30T05:59:33.148827Z     info    sa.serverOptions.CAEndpoint == istiod.istio-system.svc:15012 Citadel
2022-03-30T05:59:33.148916Z     info    Using CA istiod.istio-system.svc:15012 cert with certs: var/run/secrets/istio/root-cert.pem
2022-03-30T05:59:33.149082Z     info    citadelclient   Citadel client using custom root: istiod.istio-system.svc:15012 -----BEGIN CERTIFICATE-----
MIIC/DCCAeSgAwIBAgIQOzOVPb98v+UHCpf80MI1pTANBgkqhkiG9w0BAQsFADAY
MRYwFAYDVQQKEw1jbHVzdGVyLmxvY2FsMB4XDTIyMDMyOTE3MzIyOFoXDTMyMDMy
NjE3MzIyOFowGDEWMBQGA1UEChMNY2x1c3Rlci5sb2NhbDCCASIwDQYJKoZIhvcN
AQEBBQADggEPADCCAQoCggEBAO4j6Sa5VoFCUctY/ehMsFfXjejVHE05PzgaTt0x
zGK6WDLd4bQHVxiEERs2bQcPYP55T+AqBo4cyU5BFi7gEvrVdfHDMGdl4f3rhojB
RNdPLw9axyBNulOYBGIOIthpYY45fPLqvADQmU6GIUqcpg83zuwiyufbaCuElVuJ
h3eMebBQL6zsm+4BFZOTECvjMMpH/HSjOKdW/XsUU71FSVPo9q6devzLgCquZemO
kWHGjTtibwPcyRTZiL9FgBMnFF5gXe5K8FauIQlgkTDTWPj99n2FPGrfgEEC+z3q
O12NYi41zdY9RTk7f6kFHTzLRcGQ8ItG9MRebfZSfDqudCsCAwEAAaNCMEAwDgYD
VR0PAQH/BAQDAgIEMA8GA1UdEwEB/wQFMAMBAf8wHQYDVR0OBBYEFKAN5Ltn7oIN
l+9yoTfvOIvhBdTCMA0GCSqGSIb3DQEBCwUAA4IBAQCOKu1XEvJKXwRR/VNaL19L
iTIsC5csW4Dg1Z8aFQk+1UwroBbsdjCkPiwK0FJKHMoobIOtSbjn9k+OaUfv4pZo
D8dsDznqGJpkkiZ7zviwmpS3+B2YHoKFRs0ZXHu4hC081AUFjfFvFcwjtfPYKSGU
KqtxKPuvXCVGqaPdmkg5J4gG5q+Yutxno4m3VxGVocuHzXI9/Kox2Lz0C3royfF7
XoTxNy08TzkjDPuPCLqYy85zFOM7PzuuuK7ZIkdXpKbStIWLbjkciqLPzwi18JaH
eyS1/hORUC7AKMj8a3fKWrFsRiMu4Mdv+knnQ1ntLqb5Vy85VTvNFAvAB7mwD/NN
-----END CERTIFICATE-----

2022-03-30T05:59:33.219251Z     info    sds     SDS gRPC server for workload UDS starts, listening on "./etc/istio/proxy/SDS"

2022-03-30T05:59:33.219548Z     info    xdsproxy        Initializing with upstream address istiod.istio-system.svc:15012 and cluster Kubernetes
2022-03-30T05:59:33.219346Z     info    sds     Start SDS grpc server
2022-03-30T05:59:33.220303Z     info    xdsproxy        adding watcher for certificate var/run/secrets/istio/root-cert.pem
2022-03-30T05:59:33.220894Z     info    Starting proxy agent
2022-03-30T05:59:33.222017Z     info    Opening status port 15020

2022-03-30T05:59:33.222278Z     info    Received new config, creating new Envoy epoch 0
2022-03-30T05:59:33.222328Z     info    Epoch 0 starting
2022-03-30T05:59:33.239683Z     info    Envoy command: [-c etc/istio/proxy/envoy-rev0.json --restart-epoch 0 --drain-time-s 45 --parent-shutdown-time-s 60 --service-cluster reviews.default --service-node sidecar~10.4.1.6~reviews-v1-5d6559df86-qbg6b.default~default.svc.cluster.local --local-address-ip-version v4 --bootstrap-version 3 --log-format-prefix-with-location 0 --log-format %Y-%m-%dT%T.%fZ     %l      envoy %n        %v -l warning --component-log-level misc:error --config-yaml {
    "tracing": {
        "http": {
            "name": "envoy.tracers.zipkin",
            "typed_config": {
                "@type": "type.googleapis.com/envoy.config.trace.v3.ZipkinConfig",
                "collector_cluster": "aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com",
                "collector_endpoint": "/20200101/observations/public-span?dataFormat=zipkin&dataFormatVersion=2&dataKey=MAYH36IJELZRXTEETKL7QEA7NPA5UNEI",
                "collectorEndpointVersion": "HTTP_JSON",
                "trace_id_128bit": true,
                "shared_span_context": false
            }
        }
    },
    "static_resources": {
        "clusters": [{
            "name": "aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com:443",
            "connect_timeout": "5s",
            "type": "STRICT_DNS",
            "lb_policy": "ROUND_ROBIN",
            "load_assignment": {
                "cluster_name": "aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com",
                "endpoints": [{
                    "lb_endpoints": [{
                        "endpoint": {
                            "address": {
                                "socket_address": {
                                    "address": "aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com",
                                    "port_value": 443
                                }
                            }
                        }
                    }]
                }]
            },
            "transport_socket": {
                "name": "envoy.transport_sockets.tls",
                "typed_config": {
                    "@type": "type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext",
                    "sni": "aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com"
                }
            }
        }]
    }
}
 --concurrency 2]
2022-03-30T05:59:33.315619Z     warning envoy runtime   Unable to use runtime singleton for feature envoy.http.headermap.lazy_map_min_size
2022-03-30T05:59:33.315693Z     warning envoy runtime   Unable to use runtime singleton for feature envoy.http.headermap.lazy_map_min_size
2022-03-30T05:59:33.316469Z     warning envoy runtime   Unable to use runtime singleton for feature envoy.http.headermap.lazy_map_min_size
2022-03-30T05:59:33.316542Z     warning envoy runtime   Unable to use runtime singleton for feature envoy.http.headermap.lazy_map_min_size
2022-03-30T05:59:33.390651Z     info    xdsproxy        Envoy ADS stream established
2022-03-30T05:59:33.391110Z     info    xdsproxy        connecting to upstream XDS server: istiod.istio-system.svc:15012
2022-03-30T05:59:33.396461Z     warning envoy main      there is no configured limit to the number of allowed active connections. Set a limit via the runtime key overload.global_downstream_max_connections
2022-03-30T05:59:33.478768Z     info    sds     resource:ROOTCA new connection
2022-03-30T05:59:33.479543Z     info    sds     Skipping waiting for gateway secret
2022-03-30T05:59:33.479419Z     info    sds     resource:default new connection
2022-03-30T05:59:33.479917Z     info    sds     Skipping waiting for gateway secret
2022-03-30T05:59:33.682346Z     info    cache   Root cert has changed, start rotating root cert for SDS clients
2022-03-30T05:59:33.682714Z     info    cache   GenerateSecret default
2022-03-30T05:59:33.683386Z     info    sds     resource:default pushed key/cert pair to proxy
2022-03-30T05:59:34.079948Z     info    cache   Loaded root cert from certificate ROOTCA
2022-03-30T05:59:34.080300Z     info    sds     resource:ROOTCA pushed root cert to proxy
2022-03-30T05:59:34.154971Z     warning envoy config    gRPC config for type.googleapis.com/envoy.config.listener.v3.Listener rejected: Error adding/updating listener(s) 10.8.14.87_14250: envoy.tracers.zipkin: unknown cluster 'aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com'
0.0.0.0_20001: envoy.tracers.zipkin: unknown cluster 'aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com'
10.8.1.76_3000: envoy.tracers.zipkin: unknown cluster 'aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com'
0.0.0.0_9411: envoy.tracers.zipkin: unknown cluster 'aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com'
10.8.1.191_15021: envoy.tracers.zipkin: unknown cluster 'aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com'
0.0.0.0_9080: envoy.tracers.zipkin: unknown cluster 'aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com'
10.8.5.43_443: envoy.tracers.zipkin: unknown cluster 'aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com'
0.0.0.0_15010: envoy.tracers.zipkin: unknown cluster 'aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com'
0.0.0.0_15014: envoy.tracers.zipkin: unknown cluster 'aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com'
10.8.14.87_14268: envoy.tracers.zipkin: unknown cluster 'aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com'
0.0.0.0_80: envoy.tracers.zipkin: unknown cluster 'aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com'
0.0.0.0_9090: envoy.tracers.zipkin: unknown cluster 'aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com'
virtualInbound: envoy.tracers.zipkin: unknown cluster 'aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com'

2022-03-30T05:59:35.844720Z     warn    Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 1 successful, 0 rejected; lds updates: 0 successful, 1 reje
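The long rejection message above repeats the same complaint for every listener, which makes it hard to see which cluster name Envoy could not find. A minimal Python sketch for summarizing such a log follows; the function name `unknown_clusters` and the regular expression are illustrative assumptions, not part of Istio or Envoy tooling.

```python
import re

# Matches entries such as:
#   0.0.0.0_9411: envoy.tracers.zipkin: unknown cluster 'host'
# The listener name is the token right before ": envoy.tracers.zipkin".
UNKNOWN_CLUSTER = re.compile(
    r"(?P<listener>\S+): envoy\.tracers\.zipkin: "
    r"unknown cluster '(?P<cluster>[^']+)'"
)


def unknown_clusters(log_text):
    """Map each unknown cluster name to the listeners rejected because of it."""
    result = {}
    for match in UNKNOWN_CLUSTER.finditer(log_text):
        result.setdefault(match.group("cluster"), []).append(match.group("listener"))
    return result


# A shortened excerpt of the log above, used only as sample input.
sample = (
    "2022-03-30T05:59:34.154971Z     warning envoy config    gRPC config for "
    "type.googleapis.com/envoy.config.listener.v3.Listener rejected: "
    "Error adding/updating listener(s) 10.8.14.87_14250: envoy.tracers.zipkin: "
    "unknown cluster 'aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com'\n"
    "0.0.0.0_9411: envoy.tracers.zipkin: "
    "unknown cluster 'aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com'\n"
    "virtualInbound: envoy.tracers.zipkin: "
    "unknown cluster 'aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com'\n"
)

for cluster, listeners in unknown_clusters(sample).items():
    print(cluster, len(listeners), listeners)
```

In this log the only name reported is the one set in collector_cluster, while the static cluster in the Envoy command is named with a :443 suffix, so the two names are worth comparing.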

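After 2-3 days of debugging, I was able to resolve the distributed tracing issue with Istio, Zipkin, and OCI APM.

Note: it did not work for the root user, so I created a compartment in OCI, created an IAM policy and a group, and granted the group full access to the compartment.

I then added the root user to the group and, strangely, it started working, whereas with the root user directly and the default policy it did not.

Policy reference document: https://docs-uat.us.oracle.com/en/cloud/paas/application-performance-monitoring/apmgn/perform-oracle-cloud-infrastructure-prerequisite-tasks.html

Working sidecar ConfigMap

The connect_timeout key is required; without it the sidecar fails and the pods never reach the Ready state. The :443 port suffix mentioned in the official documentation is not needed.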

apiVersion: v1
kind: ConfigMap
metadata:
  name: istio-custom-bootstrap-config
data:
  custom_bootstrap.json: |
    {
        "tracing": {
            "http": {
                "name": "envoy.tracers.zipkin",
                "typed_config": {
                    "@type": "type.googleapis.com/envoy.config.trace.v3.ZipkinConfig",
                    "collector_cluster": "aacytncaaaaaaaal2a.apm-agt.ap-mumbai-1.oci.oraclecloud.com", 
                    "collector_endpoint": "/20200101/observations/public-span?dataFormat=zipkin&dataFormatVersion=2&dataKey=M7SOSHXXXXXXXXXXXXXXXXXXXUZEHOGRSA",
                    "collector_endpoint_version": "HTTP_JSON",
                    "trace_id_128bit": true,
                    "shared_span_context": false
                }
            }
        },
        "static_resources": {
            "clusters": [{
                "name": "aacytncaaaaaaaal2a.apm-agt.ap-mumbai-1.oci.oraclecloud.com",
                "type": "STRICT_DNS",
                "connect_timeout": "5s",
                "lb_policy": "ROUND_ROBIN",
                "load_assignment": {
                    "cluster_name": "aacytncaaaaaaaal2a.apm-agt.ap-mumbai-1.oci.oraclecloud.com", 
                    "endpoints": [{
                        "lb_endpoints": [{
                            "endpoint": {
                                "address": {
                                    "socket_address": {
                                        "address": "aacytncaaaaaaaal2a.apm-agt.ap-mumbai-1.oci.oraclecloud.com", 
                                        "port_value": 443
                                    }
                                }
                            }
                        }]
                    }]
                },
                "transport_socket": {
                    "name": "envoy.transport_sockets.tls",
                    "typed_config": {
                        "@type": "type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext",
                        "sni": "aacytncaaaaaaaal2a.apm-agt.ap-mumbai-1.oci.oraclecloud.com" 
                    }
                }
            }]
        }
    }
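The two fixes described above (adding connect_timeout, and using a cluster name that matches collector_cluster without the :443 suffix) can be checked before the ConfigMap is applied. The following is a minimal Python sketch under stated assumptions: `check_bootstrap` and `make_bootstrap` are hypothetical helpers written for this post, and the host is the placeholder from the question. To run the check on a real custom_bootstrap.json, load it with `json.loads` first; Python's `json` module rejects the `//` comments from the documentation sample, so those would have to be removed.

```python
import json

# Placeholder data upload endpoint taken from the question, not a real domain.
HOST = "aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com"


def check_bootstrap(bootstrap):
    """Return a list of problems found in a custom Envoy bootstrap dict."""
    problems = []
    clusters = bootstrap.get("static_resources", {}).get("clusters", [])
    names = [c.get("name") for c in clusters]

    # Each static cluster needs connect_timeout, as noted in the answer.
    for c in clusters:
        if "connect_timeout" not in c:
            problems.append(f"cluster {c.get('name')!r} has no connect_timeout")

    # The tracer must reference a cluster by its exact name.
    tracer = bootstrap.get("tracing", {}).get("http", {}).get("typed_config", {})
    collector = tracer.get("collector_cluster")
    if collector not in names:
        problems.append(
            f"collector_cluster {collector!r} matches no cluster name in {names}"
        )
    return problems


def make_bootstrap(cluster_name, with_timeout):
    """Build a reduced bootstrap with only the fields this check looks at."""
    cluster = {"name": cluster_name, "type": "STRICT_DNS"}
    if with_timeout:
        cluster["connect_timeout"] = "5s"
    return {
        "tracing": {"http": {"typed_config": {"collector_cluster": HOST}}},
        "static_resources": {"clusters": [cluster]},
    }


# Round-trip through JSON to mimic reading custom_bootstrap.json.
failing = json.loads(json.dumps(make_bootstrap(HOST + ":443", True)))
working = json.loads(json.dumps(make_bootstrap(HOST, True)))

print(check_bootstrap(failing))
print(check_bootstrap(working))
```

The check is deliberately narrow: it only covers the two issues seen in this post and does not validate the rest of the Envoy schema.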

Istio configuration

Setting sampling: 100 (a percentage) pushes most traces to the Zipkin endpoint of the OCI APM domain. I also set enableTracing: true.

Read more: https://istio.io/latest/docs/tasks/observability/distributed-tracing/mesh-and-proxy-config/

data:
  mesh: |-
    accessLogFile: /dev/stdout
    enableTracing: true
    defaultConfig:
      discoveryAddress: istiod.istio-system.svc:15012
      proxyMetadata: {}
      tracing:
        sampling: 100
        zipkin:
          address: aacytncaaaaaaaal2a.apm-agt.ap-mumbai-1.oci.oraclecloud.com:443
    enablePrometheusMerge: true
    rootNamespace: istio-system
    outboundTrafficPolicy:
      mode: ALLOW_ANY
    trustDomain: cluster.local
  meshNetworks: 'networks: {}'
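The same APM host appears in three places across the two configurations: collector_cluster, the collector_endpoint path, and the mesh zipkin address. A small Python sketch that derives all three from one host and data key may help keep them consistent. The function name `apm_zipkin_settings` is hypothetical, the host and key in the usage lines are made-up placeholders, and the path format is simply the one used in the ConfigMap in this post.

```python
from urllib.parse import urlencode


def apm_zipkin_settings(upload_host, data_key, span_type="public-span"):
    """Derive the values used in the sidecar ConfigMap and the mesh config."""
    if span_type not in ("public-span", "private-span"):
        raise ValueError("span_type must be 'public-span' or 'private-span'")
    # Query parameters in the same order as in the ConfigMap above.
    query = urlencode(
        {"dataFormat": "zipkin", "dataFormatVersion": "2", "dataKey": data_key}
    )
    return {
        # Cluster name and SNI: the bare host, without a port suffix.
        "collector_cluster": upload_host,
        "collector_endpoint": f"/20200101/observations/{span_type}?{query}",
        # Value for defaultConfig.tracing.zipkin.address in the mesh config.
        "zipkin_address": f"{upload_host}:443",
    }


# Placeholder values, not a real APM domain or data key.
settings = apm_zipkin_settings(
    "example.apm-agt.ap-mumbai-1.oci.oraclecloud.com", "EXAMPLEKEY"
)
for name, value in settings.items():
    print(f"{name}: {value}")
```

A public data key goes with public-span and a private data key with private-span, as the comment in the documentation sample notes.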

OCI console