What is AWSRequestMetricsFullSupport and how do I turn it off?

I'm trying to save some data from a Spark DataFrame to an S3 bucket. This is straightforward:

dataframe.saveAsParquetFile("s3://kirk/my_file.parquet")

The data is saved successfully, but the logging keeps the console busy for a very long time. I get thousands of lines like these:

2015-09-04 20:48:19,591 INFO  [main] amazonaws.latency (AWSRequestMetricsFullSupport.java:log(203)) - StatusCode=[200], ServiceName=[Amazon S3], AWSRequestID=[5C3211750F4FF5AB], ServiceEndpoint=[https://kirk.s3.amazonaws.com], HttpClientPoolLeasedCount=0, RequestCount=1, HttpClientPoolPendingCount=0, HttpClientPoolAvailableCount=1, ClientExecuteTime=[63.827], HttpRequestTime=[62.919], HttpClientReceiveResponseTime=[61.678], RequestSigningTime=[0.05], ResponseProcessingTime=[0.812], HttpClientSendRequestTime=[0.038],
2015-09-04 20:48:19,610 INFO  [main] amazonaws.latency (AWSRequestMetricsFullSupport.java:log(203)) - StatusCode=[204], ServiceName=[Amazon S3], AWSRequestID=[709DA41540539FE0], ServiceEndpoint=[https://kirk.s3.amazonaws.com], HttpClientPoolLeasedCount=0, RequestCount=1, HttpClientPoolPendingCount=0, HttpClientPoolAvailableCount=1, ClientExecuteTime=[18.064], HttpRequestTime=[17.959], HttpClientReceiveResponseTime=[16.703], RequestSigningTime=[0.06], ResponseProcessingTime=[0.003], HttpClientSendRequestTime=[0.046],
2015-09-04 20:48:19,664 INFO  [main] amazonaws.latency (AWSRequestMetricsFullSupport.java:log(203)) - StatusCode=[204], ServiceName=[Amazon S3], AWSRequestID=[1B1EB812E7982C7A], ServiceEndpoint=[https://kirk.s3.amazonaws.com], HttpClientPoolLeasedCount=0, RequestCount=1, HttpClientPoolPendingCount=0, HttpClientPoolAvailableCount=1, ClientExecuteTime=[54.36], HttpRequestTime=[54.26], HttpClientReceiveResponseTime=[53.006], RequestSigningTime=[0.057], ResponseProcessingTime=[0.002], HttpClientSendRequestTime=[0.034],
2015-09-04 20:48:19,675 INFO  [main] amazonaws.latency (AWSRequestMetricsFullSupport.java:log(203)) - StatusCode=[404], Exception=[com.amazonaws.services.s3.model.AmazonS3Exception: Not Found (Service: Amazon S3; Status Code: 404; Error Code: 404 Not Found; Request ID: AF6F960F3B2BF3AB), S3 Extended Request ID: CLs9xY8HAxbEAKEJC4LS1SgpqDcnHeaGocAbdsmYKwGttS64oVjFXJOe314vmb9q], ServiceName=[Amazon S3], AWSErrorCode=[404 Not Found], AWSRequestID=[AF6F960F3B2BF3AB], ServiceEndpoint=[https://kirk.s3.amazonaws.com], Exception=1, HttpClientPoolLeasedCount=0, RequestCount=1, HttpClientPoolPendingCount=0, HttpClientPoolAvailableCount=1, ClientExecuteTime=[10.111], HttpRequestTime=[10.009], HttpClientReceiveResponseTime=[8.758], RequestSigningTime=[0.043], HttpClientSendRequestTime=[0.044],
2015-09-04 20:48:19,685 INFO  [main] amazonaws.latency (AWSRequestMetricsFullSupport.java:log(203)) - StatusCode=[404], Exception=[com.amazonaws.services.s3.model.AmazonS3Exception: Not Found (Service: Amazon S3; Status Code: 404; Error Code: 404 Not Found; Request ID: F2198ACEB4B2CE72), S3 Extended Request ID: J9oWD8ncn6WgfUhHA1yqrBfzFC+N533oD/DK90eiSvQrpGH4OJUc3riG2R4oS1NU], ServiceName=[Amazon S3], AWSErrorCode=[404 Not Found], AWSRequestID=[F2198ACEB4B2CE72], ServiceEndpoint=[https://kirk.s3.amazonaws.com], Exception=1, HttpClientPoolLeasedCount=0, RequestCount=1, HttpClientPoolPendingCount=0, HttpClientPoolAvailableCount=1, ClientExecuteTime=[9.879], HttpRequestTime=[9.776], HttpClientReceiveResponseTime=[8.537], RequestSigningTime=[0.05], HttpClientSendRequestTime=[0.033],

I can understand that some users might want to log the latency of their S3 operations, but is there any way to disable all of the monitoring and logging coming from AWSRequestMetricsFullSupport?

When I check the Spark UI, it tells me the job completed relatively quickly, yet the console is flooded with these messages for far longer.

The corresponding AWS SDK for Java source comment reads:

/**
 * Start an event which will be timed. [...]
 * 
 * This feature is enabled if the system property
 * "com.amazonaws.sdk.enableRuntimeProfiling" is set, or if a
 * {@link RequestMetricCollector} is in use either at the request, web service
 * client, or AWS SDK level.
 * 
 * @param eventName
 *            - The name of the event to start
 * 
 * @see AwsSdkMetrics
 */

As further outlined in the referenced AwsSdkMetrics Javadocs, this metric collection is disabled by default and only enabled via a system property:

The default metric collection of the Java AWS SDK is disabled by default. To enable it, simply specify the system property "com.amazonaws.sdk.enableDefaultMetrics" when starting up the JVM. When the system property is specified, a default metric collector will be started at the AWS SDK level. The default implementation uploads the request/response metrics captured to Amazon CloudWatch using AWS credentials obtained via the DefaultAWSCredentialsProviderChain.
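
Note the direction of that switch: the system property turns the CloudWatch collector on, so for quiet logs it should simply be left unset. If you did want the default collection for a Spark job, the property could be passed like any other driver JVM option, for example (a sketch reusing the same mechanism as the log4j workaround further below):

--driver-java-options "-Dcom.amazonaws.sdk.enableDefaultMetrics"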

This can apparently be overridden by a RequestMetricCollector wired in at the request, web service client, or AWS SDK level, which would presumably require corresponding configuration adjustments to the clients/frameworks in use (such as Spark here):

Clients who needs to fully customize the metric collection can implement the SPI MetricCollector, and then replace the default AWS SDK implementation of the collector via setMetricCollector(MetricCollector).
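
For a client you construct yourself, the SDK also ships a no-op collector that can be wired in at the client level. A minimal sketch (AWS SDK for Java 1.x; the region and class name are placeholders, and whether this reaches the S3 client that Spark/Hadoop builds internally is a separate question, so treat it as illustration only):

import com.amazonaws.metrics.RequestMetricCollector;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

public class QuietS3Client {
    public static void main(String[] args) {
        // Wire the SDK's built-in no-op collector into this client so no
        // per-request metrics are gathered for it.
        AmazonS3 s3 = AmazonS3ClientBuilder.standard()
                .withRegion("us-east-1")                            // placeholder region
                .withMetricsCollector(RequestMetricCollector.NONE)  // no-op collector
                .build();
        // ... use s3 as usual
    }
}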

So far, documentation of these features seems a bit sparse; I'm only aware of two related blog posts that cover them.

The best solution I found was to configure the Java logging (i.e. turn it off) by passing a log4j configuration file to the Spark context:

--driver-java-options "-Dlog4j.configuration=/home/user/log4j.properties"

where log4j.properties is a log4j configuration file that disables INFO-level messages.
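
A minimal sketch of such a log4j.properties (the logger name com.amazonaws.latency is taken from the output above; the console appender setup is the usual Spark pattern and an assumption on my part):

# Keep a normal console appender for everything else.
log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# Silence the per-request latency lines specifically ...
log4j.logger.com.amazonaws.latency=ERROR
# ... or, more broadly, everything from the AWS SDK below WARN.
log4j.logger.com.amazonaws=WARN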

It turned out that silencing these logs on EMR release labels was quite a challenge. Release emr-4.7.2 fixed "an issue with Spark Log4j-based logging in YARN containers". A working solution is to add this JSON as a cluster configuration:

[
{
  "Classification": "hadoop-log4j",
  "Properties": {
    "log4j.logger.com.amazon.ws.emr.hadoop.fs": "ERROR",
    "log4j.logger.com.amazonaws.latency": "ERROR"
  },
  "Configurations": []
}
]
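
Such classifications are applied when the cluster is created, for example via the AWS CLI (the file name and the omitted cluster options are placeholders):

aws emr create-cluster --release-label emr-4.7.2 \
    --applications Name=Spark \
    --configurations file://./quiet-aws-logging.json \
    ...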

And before emr-4.7.2, this JSON, which drops the 'buggy' log4j setting from Spark's default driver options, was also needed:

[
{
  "Classification": "spark-defaults",
  "Properties": {
    "spark.driver.extraJavaOptions": "-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70 -XX:+CMSClassUnloadingEnabled -XX:MaxPermSize=512M -XX:OnOutOfMemoryError='kill -9 %p'"
  },
  "Configurations": []
}
]
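
So on releases before emr-4.7.2 the two classifications go together; as a single configurations file that is simply the two snippets above combined into one array:

[
{
  "Classification": "hadoop-log4j",
  "Properties": {
    "log4j.logger.com.amazon.ws.emr.hadoop.fs": "ERROR",
    "log4j.logger.com.amazonaws.latency": "ERROR"
  },
  "Configurations": []
},
{
  "Classification": "spark-defaults",
  "Properties": {
    "spark.driver.extraJavaOptions": "-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70 -XX:+CMSClassUnloadingEnabled -XX:MaxPermSize=512M -XX:OnOutOfMemoryError='kill -9 %p'"
  },
  "Configurations": []
}
]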