AWS XRay SDK fails to read environment variables within a docker container

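AWS XRay is a tracing service that lets you trace requests across a distributed system and even analyze your services. Without going too deep into how XRay works, it essentially monitors your services and sends data about each service request over UDP to a daemon, which collects this data and forwards it to AWS.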
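For context, the UDP side of this is simple: each segment is sent as a single datagram containing a JSON header line followed by the segment document. The sketch below assumes the header {"format": "json", "version": 1} described in the XRay daemon documentation; the class UdpEmitterSketch and the segment values are hypothetical, and this is not the SDK's own emitter.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of what an XRay emitter puts on the wire:
// one UDP datagram per segment = JSON header line + "\n" + segment JSON.
class UdpEmitterSketch {
    static final String HEADER = "{\"format\": \"json\", \"version\": 1}";

    // Builds the datagram payload: header, newline, then the segment document.
    static byte[] payload(String segmentJson) {
        return (HEADER + "\n" + segmentJson).getBytes(StandardCharsets.UTF_8);
    }

    // Sends one segment to the daemon. UDP is fire-and-forget, so no reply is read.
    static void emit(InetSocketAddress daemon, String segmentJson) throws IOException {
        byte[] bytes = payload(segmentJson);
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.send(new DatagramPacket(bytes, bytes.length, daemon));
        }
    }

    public static void main(String[] args) throws IOException {
        // Illustrative segment; a real one is produced by the recorder.
        String segment = "{\"name\":\"my-service\",\"id\":\"70de5b6f19ff9a0a\","
            + "\"trace_id\":\"1-581cf771-a006649127e371903a2de979\","
            + "\"start_time\":1478293361.271,\"end_time\":1478293361.449}";
        // Default daemon location when running locally or on EC2.
        emit(new InetSocketAddress("127.0.0.1", 2000), segment);
    }
}
```

Because UDP sends do not wait for an acknowledgement, a wrong daemon address usually goes unnoticed on this path; the TCP calls the SDK also makes to the daemon (such as fetching sampling rules) are where a connection error surfaces.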

When running locally or on EC2, this daemon runs on the same machine as the service and listens on port 2000. This is the default configuration for the daemon host.

When running in Kubernetes, a daemon needs to be set up to run on each node. According to the documentation for setting up XRay with Kubernetes, you can override the default value by setting an environment variable AWS_XRAY_DAEMON_ADDRESS with the required host, or you can set a JVM system variable com.amazonaws.xray.emitters.daemonAddress. There is also a reference to this in the SDK documentation.

Because of my use case, and the way we share configuration across my organization, I wanted to use the environment variable approach.

Following the documentation, we set it on the deployment through our Helm chart:

env:
  - name: AWS_XRAY_DAEMON_ADDRESS
    value: aws-xray-daemon.default

By exec-ing into the pod where the service is running and running printenv, we can see that the value was set successfully at deploy time.


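Problem:

When XRay tries to analyze requests and send data to the daemon, an SdkClientException is thrown (the trace below comes from the sampling-rule poller):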

com.amazonaws.SdkClientException: Unable to execute HTTP request: Connect to 127.0.0.1:2000 [/127.0.0.1] failed: Connection refused (Connection refused)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleRetryableException(AmazonHttpClient.java:1201) ~[aws-java-sdk-core-1.11.739.jar!/:na]
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1147) ~[aws-java-sdk-core-1.11.739.jar!/:na]
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:796) ~[aws-java-sdk-core-1.11.739.jar!/:na]
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:764) ~[aws-java-sdk-core-1.11.739.jar!/:na]
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:738) ~[aws-java-sdk-core-1.11.739.jar!/:na]
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access0(AmazonHttpClient.java:698) ~[aws-java-sdk-core-1.11.739.jar!/:na]
        at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:680) ~[aws-java-sdk-core-1.11.739.jar!/:na]
        at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:544) ~[aws-java-sdk-core-1.11.739.jar!/:na]
        at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:524) ~[aws-java-sdk-core-1.11.739.jar!/:na]
        at com.amazonaws.services.xray.AWSXRayClient.doInvoke(AWSXRayClient.java:1607) ~[aws-java-sdk-xray-1.11.739.jar!/:na]
        at com.amazonaws.services.xray.AWSXRayClient.invoke(AWSXRayClient.java:1574) ~[aws-java-sdk-xray-1.11.739.jar!/:na]
        at com.amazonaws.services.xray.AWSXRayClient.invoke(AWSXRayClient.java:1563) ~[aws-java-sdk-xray-1.11.739.jar!/:na]
        at com.amazonaws.services.xray.AWSXRayClient.executeGetSamplingRules(AWSXRayClient.java:800) ~[aws-java-sdk-xray-1.11.739.jar!/:na]
        at com.amazonaws.services.xray.AWSXRayClient.getSamplingRules(AWSXRayClient.java:771) ~[aws-java-sdk-xray-1.11.739.jar!/:na]
        at com.amazonaws.xray.strategy.sampling.pollers.RulePoller.pollRule(RulePoller.java:65) ~[aws-xray-recorder-sdk-core-2.4.0.jar!/:na]
        at com.amazonaws.xray.strategy.sampling.pollers.RulePoller.lambda$start$0(RulePoller.java:46) ~[aws-xray-recorder-sdk-core-2.4.0.jar!/:na]
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) ~[na:na]
        at java.base/java.util.concurrent.FutureTask.runAndReset(Unknown Source) ~[na:na]
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source) ~[na:na]
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[na:na]
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[na:na]
        at java.base/java.lang.Thread.run(Unknown Source) ~[na:na]
        ...

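This means the AWS SDK is not picking up this environment variable as the documentation suggests, and is instead falling back to the default, 127.0.0.1:2000.

I then dug into the SDK code to see how it retrieves this variable, and found that it uses System.getenv("AWS_XRAY_DAEMON_ADDRESS"), as shown below: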

    /**
     * Environment variable key used to override the address to which UDP packets will be emitted. Valid values are of the form `ip_address:port`. Takes precedence over any system property,
     * constructor value, or setter value used.
     */
    public static final String DAEMON_ADDRESS_ENVIRONMENT_VARIABLE_KEY = "AWS_XRAY_DAEMON_ADDRESS";

    /**
     * System property key used to override the address to which UDP packets will be emitted. Valid values are of the form `ip_address:port`. Takes precedence over any constructor or setter value
     * used.
     */
    public static final String DAEMON_ADDRESS_SYSTEM_PROPERTY_KEY = "com.amazonaws.xray.emitters.daemonAddress";

    public DaemonConfiguration() {
        String environmentAddress = System.getenv(DAEMON_ADDRESS_ENVIRONMENT_VARIABLE_KEY);
        String systemAddress = System.getProperty(DAEMON_ADDRESS_SYSTEM_PROPERTY_KEY);

        if (setUDPAndTCPAddress(environmentAddress)) {
            logger.info(String.format("Environment variable %s is set. Emitting to daemon on address %s.", DAEMON_ADDRESS_ENVIRONMENT_VARIABLE_KEY, getUDPAddress()));
        } else if (setUDPAndTCPAddress(systemAddress)) {
            logger.info(String.format("System property %s is set. Emitting to daemon on address %s.", DAEMON_ADDRESS_SYSTEM_PROPERTY_KEY, getUDPAddress()));
        }
    }
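To make the fallback concrete, here is a simplified, hypothetical re-implementation of the lookup above (DaemonAddressResolver and its methods are illustrative names, not the SDK's code). It assumes, as the Javadoc states, that a valid value has the form host:port, and that a value failing this check is ignored, so the next source and finally the default 127.0.0.1:2000 are used. That assumption fits the if/else-if structure of the constructor: when setUDPAndTCPAddress returns false for both sources, nothing is logged and the default stays in effect.

```java
import java.net.InetSocketAddress;
import java.util.Optional;

// Hypothetical, simplified model of the SDK's daemon-address lookup.
// NOT the SDK's actual parsing code; IPv6 literals are not handled.
class DaemonAddressResolver {
    static final InetSocketAddress DEFAULT = InetSocketAddress.createUnresolved("127.0.0.1", 2000);

    // Returns the parsed address, or empty if the value is missing or has no valid port.
    static Optional<InetSocketAddress> parse(String value) {
        if (value == null || value.isEmpty()) return Optional.empty();
        int colon = value.lastIndexOf(':');
        if (colon <= 0 || colon == value.length() - 1) return Optional.empty(); // no host or no port
        try {
            int port = Integer.parseInt(value.substring(colon + 1));
            return Optional.of(InetSocketAddress.createUnresolved(value.substring(0, colon), port));
        } catch (IllegalArgumentException e) { // non-numeric or out-of-range port
            return Optional.empty();
        }
    }

    // Precedence: environment variable, then system property, then the default.
    static InetSocketAddress resolve(String envValue, String propValue) {
        return parse(envValue).or(() -> parse(propValue)).orElse(DEFAULT);
    }

    public static void main(String[] args) {
        InetSocketAddress addr = resolve(
            System.getenv("AWS_XRAY_DAEMON_ADDRESS"),
            System.getProperty("com.amazonaws.xray.emitters.daemonAddress"));
        System.out.println("Daemon address: " + addr.getHostString() + ":" + addr.getPort());
    }
}
```

Under these assumptions, resolve("aws-xray-daemon.default", null) yields 127.0.0.1:2000, the same address as in the exception above, while resolve("xray-service.default:2000", null) yields the configured host. Optional.or requires Java 9 or later, which the java.base frames in the stack trace suggest is available.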

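So I wondered whether I was setting the environment variable incorrectly. I added a log statement at service startup that reads the environment variable, and found that the JVM can in fact see the value:

Code:

System.out.println("System.getenv(\"AWS_XRAY_DAEMON_ADDRESS\")" + " = " + System.getenv("AWS_XRAY_DAEMON_ADDRESS"));

Output: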

System.getenv("AWS_XRAY_DAEMON_ADDRESS") = aws-xray-daemon.default

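As far as I can tell, this matches exactly what the AWS SDK code should be running, yet that code either never seems to run or, if it does, it doesn't produce the same result as my logging test.

Running locally, I can't reproduce the issue: the SDK picks up the host I provide through a local environment variable. I also confirmed, by running locally with a breakpoint, that the AWS SDK code pasted above is reached.

Any ideas?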


Gradle snippet:

ext {
    ...
    springCloudVersion = "Greenwich.RELEASE"
    awsCoreVersion = '1.11.739'
    awsXrayVersion = '2.4.0' 
    ...
}

dependencyManagement {
    imports {
        mavenBom "org.springframework.cloud:spring-cloud-dependencies:${springCloudVersion}"
        mavenBom "com.amazonaws:aws-java-sdk-bom:${awsCoreVersion}"
        mavenBom "com.amazonaws:aws-xray-recorder-sdk-bom:${awsXrayVersion}"
    }
}

dependencies {
    ...

    implementation "com.amazonaws:aws-java-sdk-core"
    implementation "com.amazonaws:aws-xray-recorder-sdk-core" 
    implementation "com.amazonaws:aws-xray-recorder-sdk-aws-sdk" 
    implementation "com.amazonaws:aws-xray-recorder-sdk-spring" 
    implementation "com.amazonaws:aws-xray-recorder-sdk-apache-http" 
    implementation "com.amazonaws:aws-xray-recorder-sdk-sql-postgres" 

    implementation 'org.springframework.boot:spring-boot-starter-web'
    implementation 'org.springframework.boot:spring-boot-starter'
    implementation 'org.springframework.boot:spring-boot-starter-data-jpa'
    implementation 'org.springframework.boot:spring-boot-starter-security'

    ...

}

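Additional information:

Other attempts:
- I tried setting the environment variable through the Dockerfile instead. This had the same result.

It turns out the blog post I linked was not a good blog post. In its example, the host is specified without a port: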

env:
- name: AWS_XRAY_DAEMON_ADDRESS 
  value: xray-service.default

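Changing the environment variable to include the port, matching the ip_address:port format required by the SDK Javadoc quoted above, fixed the issue: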

env:
- name: AWS_XRAY_DAEMON_ADDRESS 
  value: xray-service.default:2000
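Since a value without a port apparently led to a silent fallback to the default rather than a startup failure, it may be worth validating the variable yourself when the service starts. The following is a hypothetical fail-fast check (XrayDaemonAddressCheck is an illustrative name, not part of the SDK). It deliberately accepts only the single host:port form from the Javadoc quoted earlier and throws if the variable is set to anything else.

```java
// Hypothetical startup check; it only inspects the environment variable
// and does not call any XRay SDK API. Run it before the recorder is created.
class XrayDaemonAddressCheck {
    static final String KEY = "AWS_XRAY_DAEMON_ADDRESS";

    // Returns an error message if the value is set but not host:port, else null.
    static String validate(String value) {
        if (value == null || value.isEmpty()) {
            return null; // unset: the SDK default (127.0.0.1:2000) is intended
        }
        // Strict on purpose: exactly one colon, a non-empty host, 1-5 port digits.
        if (!value.matches("[^\\s:]+:\\d{1,5}")) {
            return KEY + "=\"" + value + "\" is not of the form host:port"
                + " (for example xray-service.default:2000)";
        }
        int port = Integer.parseInt(value.substring(value.lastIndexOf(':') + 1));
        if (port > 65535) {
            return KEY + "=\"" + value + "\" has a port outside 0-65535";
        }
        return null;
    }

    public static void main(String[] args) {
        String error = validate(System.getenv(KEY));
        if (error != null) {
            throw new IllegalStateException(error); // fail fast instead of silently using the default
        }
        System.out.println(KEY + " is unset or well-formed");
    }
}
```

With this check, the original value aws-xray-daemon.default would stop the service at startup with a clear message instead of surfacing later as a connection refused error on 127.0.0.1:2000.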