AWS XRay SDK fails to read environment variables within a docker container
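AWS XRay is a tracing service that lets you trace requests across a distributed system and even profile your services. Without going too deep into how XRay works, it basically monitors your services and sends data about each service request over UDP to a daemon, which collects this data and forwards it to AWS.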
When running locally or on EC2, this daemon is local to the machine the service runs on and is available on port 2000. This is the default daemon host configuration.
When running in Kubernetes, you need to set up a daemon to run on each node. According to the documentation for setting up XRay with Kubernetes, you can override the default value by setting an environment variable AWS_XRAY_DAEMON_ADDRESS with the required host, or you can set a JVM system property com.amazonaws.xray.emitters.daemonAddress. There is also a reference to this in the SDK documentation.
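Because of my use case, and how we share configuration in my organization, I want to use the environment variable approach.
Following the documentation, we set it on the deployment via a Helm chart: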
env:
  - name: AWS_XRAY_DAEMON_ADDRESS
    value: aws-xray-daemon.default
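By exec'ing into the pod where the service is running and running printenv, we can see that the value was set successfully at deploy time.
Problem:
When XRay tries to profile and send data to the daemon, an SdkClientException is thrown: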
com.amazonaws.SdkClientException: Unable to execute HTTP request: Connect to 127.0.0.1:2000 [/127.0.0.1] failed: Connection refused (Connection refused)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleRetryableException(AmazonHttpClient.java:1201) ~[aws-java-sdk-core-1.11.739.jar!/:na]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1147) ~[aws-java-sdk-core-1.11.739.jar!/:na]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:796) ~[aws-java-sdk-core-1.11.739.jar!/:na]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:764) ~[aws-java-sdk-core-1.11.739.jar!/:na]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:738) ~[aws-java-sdk-core-1.11.739.jar!/:na]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access0(AmazonHttpClient.java:698) ~[aws-java-sdk-core-1.11.739.jar!/:na]
at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:680) ~[aws-java-sdk-core-1.11.739.jar!/:na]
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:544) ~[aws-java-sdk-core-1.11.739.jar!/:na]
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:524) ~[aws-java-sdk-core-1.11.739.jar!/:na]
at com.amazonaws.services.xray.AWSXRayClient.doInvoke(AWSXRayClient.java:1607) ~[aws-java-sdk-xray-1.11.739.jar!/:na]
at com.amazonaws.services.xray.AWSXRayClient.invoke(AWSXRayClient.java:1574) ~[aws-java-sdk-xray-1.11.739.jar!/:na]
at com.amazonaws.services.xray.AWSXRayClient.invoke(AWSXRayClient.java:1563) ~[aws-java-sdk-xray-1.11.739.jar!/:na]
at com.amazonaws.services.xray.AWSXRayClient.executeGetSamplingRules(AWSXRayClient.java:800) ~[aws-java-sdk-xray-1.11.739.jar!/:na]
at com.amazonaws.services.xray.AWSXRayClient.getSamplingRules(AWSXRayClient.java:771) ~[aws-java-sdk-xray-1.11.739.jar!/:na]
at com.amazonaws.xray.strategy.sampling.pollers.RulePoller.pollRule(RulePoller.java:65) ~[aws-xray-recorder-sdk-core-2.4.0.jar!/:na]
at com.amazonaws.xray.strategy.sampling.pollers.RulePoller.lambda$start$0(RulePoller.java:46) ~[aws-xray-recorder-sdk-core-2.4.0.jar!/:na]
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) ~[na:na]
at java.base/java.util.concurrent.FutureTask.runAndReset(Unknown Source) ~[na:na]
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source) ~[na:na]
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[na:na]
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[na:na]
at java.base/java.lang.Thread.run(Unknown Source) ~[na:na]
...
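This means the AWS SDK is not picking up this environment variable as the documentation suggests, and is instead just using the default value 127.0.0.1:2000.
I then dug into the SDK code to see how it retrieves this variable, and found that it uses System.getenv("AWS_XRAY_DAEMON_ADDRESS"), in code that looks like this: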
/**
 * Environment variable key used to override the address to which UDP packets will be emitted. Valid values are of the form `ip_address:port`. Takes precedence over any system property,
 * constructor value, or setter value used.
 */
public static final String DAEMON_ADDRESS_ENVIRONMENT_VARIABLE_KEY = "AWS_XRAY_DAEMON_ADDRESS";
/**
 * System property key used to override the address to which UDP packets will be emitted. Valid values are of the form `ip_address:port`. Takes precedence over any constructor or setter value
 * used.
 */
public static final String DAEMON_ADDRESS_SYSTEM_PROPERTY_KEY = "com.amazonaws.xray.emitters.daemonAddress";
public DaemonConfiguration() {
    String environmentAddress = System.getenv(DAEMON_ADDRESS_ENVIRONMENT_VARIABLE_KEY);
    String systemAddress = System.getProperty(DAEMON_ADDRESS_SYSTEM_PROPERTY_KEY);
    if (setUDPAndTCPAddress(environmentAddress)) {
        logger.info(String.format("Environment variable %s is set. Emitting to daemon on address %s.", DAEMON_ADDRESS_ENVIRONMENT_VARIABLE_KEY, getUDPAddress()));
    } else if (setUDPAndTCPAddress(systemAddress)) {
        logger.info(String.format("System property %s is set. Emitting to daemon on address %s.", DAEMON_ADDRESS_SYSTEM_PROPERTY_KEY, getUDPAddress()));
    }
}
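So I thought, maybe I'm not setting the environment variable correctly? I added a log line at service startup that reads the environment variable, and found that the JVM can indeed find the value:
Code:
System.out.println("System.getenv(\"AWS_XRAY_DAEMON_ADDRESS\")" + " = " + System.getenv("AWS_XRAY_DAEMON_ADDRESS"));
Output: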
System.getenv("AWS_XRAY_DAEMON_ADDRESS") = aws-xray-daemon.default
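Printing the raw value only confirms that the variable is present. A slightly more useful startup check also verifies that the value has the ip_address:port form the SDK Javadoc above describes. The sketch below is a hypothetical helper (XRayAddressCheck is not part of the SDK) that uses only java.net.URI from the JDK; it assumes a single host:port value and does not try to handle any other address syntax the SDK may accept.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical startup diagnostic; not part of the AWS XRay SDK.
class XRayAddressCheck {

    /** Returns a description of the problem, or null if the value looks like host:port. */
    static String problemWith(String value) {
        if (value == null || value.isBlank()) {
            return "AWS_XRAY_DAEMON_ADDRESS is not set";
        }
        try {
            // Prefixing "//" makes URI parse the whole value as an authority: host[:port].
            URI uri = new URI("//" + value.trim());
            if (uri.getHost() == null) {
                return "cannot parse a host from '" + value + "'";
            }
            if (uri.getPort() == -1) {
                return "no port in '" + value + "', expected host:port such as '" + value.trim() + ":2000'";
            }
            return null;
        } catch (URISyntaxException e) {
            return "malformed address '" + value + "': " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        String value = System.getenv("AWS_XRAY_DAEMON_ADDRESS");
        String problem = problemWith(value);
        System.out.println(problem == null ? "OK: " + value : "WARNING: " + problem);
    }
}
```

With the Helm value above (aws-xray-daemon.default), this would print a warning about the missing port instead of just echoing the value.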
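As far as I can tell, this code exactly matches what the AWS SDK should be running, yet it seems never to be executed, or if it is, it does not give the same result as my logging test.
Running locally, I cannot reproduce this issue: the SDK picks up the host I provide through my local environment variables. I also confirmed with breakpoints, while running locally, that the AWS SDK code pasted above is reached.
Any ideas?
Gradle snippet: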
ext {
    ...
    springCloudVersion = "Greenwich.RELEASE"
    awsCoreVersion = '1.11.739'
    awsXrayVersion = '2.4.0'
    ...
}
dependencyManagement {
    imports {
        mavenBom "org.springframework.cloud:spring-cloud-dependencies:${springCloudVersion}"
        mavenBom "com.amazonaws:aws-java-sdk-bom:${awsCoreVersion}"
        mavenBom "com.amazonaws:aws-xray-recorder-sdk-bom:${awsXrayVersion}"
    }
}
dependencies {
    ...
    implementation "com.amazonaws:aws-java-sdk-core"
    implementation "com.amazonaws:aws-xray-recorder-sdk-core"
    implementation "com.amazonaws:aws-xray-recorder-sdk-aws-sdk"
    implementation "com.amazonaws:aws-xray-recorder-sdk-spring"
    implementation "com.amazonaws:aws-xray-recorder-sdk-apache-http"
    implementation "com.amazonaws:aws-xray-recorder-sdk-sql-postgres"
    implementation 'org.springframework.boot:spring-boot-starter-web'
    implementation 'org.springframework.boot:spring-boot-starter'
    implementation 'org.springframework.boot:spring-boot-starter-data-jpa'
    implementation 'org.springframework.boot:spring-boot-starter-security'
    ...
}
Additional info:
- Running on Spring Boot v2.2.1
- OpenJDK v11.0.4
- Gradle v6.0.1
Other attempts:
- I tried setting the environment variable via the Dockerfile. This had the same result.
It turns out the blog post I linked was not a good one. In its example, the port is not specified for the host:
env:
  - name: AWS_XRAY_DAEMON_ADDRESS
    value: xray-service.default
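Changing the environment variable to include the port fixed the issue (the Javadoc quoted above does say that valid values have the form ip_address:port):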
env:
  - name: AWS_XRAY_DAEMON_ADDRESS
    value: xray-service.default:2000
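To illustrate why a value without a port ends in the 127.0.0.1:2000 connection seen in the stack trace, here is a simplified sketch of the precedence in the DaemonConfiguration constructor quoted in the question: environment variable first, then system property, then the default. DaemonAddressResolver is a hypothetical class, and its parsing rule (reject anything that is not host:port with a numeric port) is an assumption modelled on the Javadoc, not the SDK's actual implementation.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical, simplified model of the SDK's precedence rules; not the SDK's real code.
class DaemonAddressResolver {
    // Fallback seen in the stack trace when no valid override is found.
    static final String DEFAULT_ADDRESS = "127.0.0.1:2000";
    static final String ENV_KEY = "AWS_XRAY_DAEMON_ADDRESS";
    static final String PROP_KEY = "com.amazonaws.xray.emitters.daemonAddress";

    /** Accepts only "host:port" with a numeric port in 1..65535 (assumed rule). */
    static Optional<String> parse(String value) {
        if (value == null || value.isBlank()) {
            return Optional.empty();
        }
        int colon = value.lastIndexOf(':');
        if (colon <= 0 || colon == value.length() - 1) {
            // No port at all, e.g. "xray-service.default": rejected.
            return Optional.empty();
        }
        try {
            int port = Integer.parseInt(value.substring(colon + 1));
            return (port >= 1 && port <= 65535) ? Optional.of(value) : Optional.empty();
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }

    /** Environment variable first, then system property, then the default. */
    static String resolve(Map<String, String> env, Map<String, String> props) {
        return parse(env.get(ENV_KEY))
                .or(() -> parse(props.get(PROP_KEY)))
                .orElse(DEFAULT_ADDRESS);
    }

    public static void main(String[] args) {
        // Value from the blog post: no port, so the default wins.
        System.out.println(resolve(Map.of(ENV_KEY, "xray-service.default"), Map.of()));
        // prints 127.0.0.1:2000
        System.out.println(resolve(Map.of(ENV_KEY, "xray-service.default:2000"), Map.of()));
        // prints xray-service.default:2000
    }
}
```

In this model an invalid environment variable is not an error on its own; it is simply skipped, which matches the symptom of printenv showing the value while the SDK still connects to the default address.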