Configure logging on AWS EMR for s3distcp
I want to change s3distcp and other hadoop commands to log only WARN messages or worse; currently they log INFO and worse.
How do I configure this on the head node of an AWS EMR cluster?
Here is an example of the output I'm trying to hide:
$ hadoop jar ~hadoop/lib/emr-s3distcp-1.0.jar --src /user/myusername/test --dest s3://some-bucket/myusername/data/test
16/06/01 17:18:03 INFO s3distcp.S3DistCp: Running with args: --src /user/myusername/test --dest s3://some-bucket/myusername/data/test
16/06/01 17:18:03 INFO s3distcp.S3DistCp: S3DistCp args: --src /user/myusername/test --dest s3://some-bucket/myusername/data/test
16/06/01 17:18:06 INFO s3distcp.S3DistCp: Using output path 'hdfs:/tmp/97139b69-ea86-400e-9ce4-f0718ff2b669/output'
16/06/01 17:18:06 INFO s3distcp.S3DistCp: GET http://x.x.x.x/latest/meta-data/placement/availability-zone result: us-east-1b
16/06/01 17:18:06 INFO s3distcp.FileInfoListing: Opening new file: hdfs:/tmp/97139b69-ea86-400e-9ce4-f0718ff2b669/files/1
16/06/01 17:18:06 INFO s3distcp.S3DistCp: Created 1 files to copy 88 files
16/06/01 17:18:06 INFO s3distcp.S3DistCp: Reducer number: 15
16/06/01 17:18:06 INFO client.RMProxy: Connecting to ResourceManager at /x.x.x.x:9022
16/06/01 17:18:07 INFO input.FileInputFormat: Total input paths to process : 1
16/06/01 17:18:07 INFO mapreduce.JobSubmitter: number of splits:1
16/06/01 17:18:07 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1464201102672_0019
16/06/01 17:18:07 INFO impl.YarnClientImpl: Submitted application application_1464201102672_0019
16/06/01 17:18:07 INFO mapreduce.Job: The url to track the job: http://x.x.x.x:9046/proxy/application_1464201102672_0019/
16/06/01 17:18:07 INFO mapreduce.Job: Running job: job_1464201102672_0019
16/06/01 17:18:13 INFO mapreduce.Job: Job job_1464201102672_0019 running in uber mode : false
16/06/01 17:18:13 INFO mapreduce.Job: map 0% reduce 0%
16/06/01 17:18:19 INFO mapreduce.Job: map 100% reduce 0%
16/06/01 17:18:30 INFO mapreduce.Job: map 100% reduce 5%
16/06/01 17:18:31 INFO mapreduce.Job: map 100% reduce 10%
16/06/01 17:18:32 INFO mapreduce.Job: map 100% reduce 22%
16/06/01 17:18:33 INFO mapreduce.Job: map 100% reduce 23%
16/06/01 17:18:34 INFO mapreduce.Job: map 100% reduce 33%
16/06/01 17:18:35 INFO mapreduce.Job: map 100% reduce 40%
16/06/01 17:18:36 INFO mapreduce.Job: map 100% reduce 50%
16/06/01 17:18:37 INFO mapreduce.Job: map 100% reduce 57%
16/06/01 17:18:38 INFO mapreduce.Job: map 100% reduce 77%
16/06/01 17:18:39 INFO mapreduce.Job: map 100% reduce 85%
16/06/01 17:18:40 INFO mapreduce.Job: map 100% reduce 90%
16/06/01 17:18:41 INFO mapreduce.Job: map 100% reduce 95%
16/06/01 17:18:42 INFO mapreduce.Job: map 100% reduce 98%
16/06/01 17:18:43 INFO mapreduce.Job: map 100% reduce 100%
16/06/01 17:18:43 INFO mapreduce.Job: Job job_1464201102672_0019 completed successfully
16/06/01 17:18:43 INFO mapreduce.Job: Counters: 54
    File System Counters
        FILE: Number of bytes read=5447
        FILE: Number of bytes written=1640535
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=113570708
        HDFS: Number of bytes written=56776676
        HDFS: Number of read operations=401
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=206
        S3: Number of bytes read=0
        S3: Number of bytes written=0
        S3: Number of read operations=0
        S3: Number of large read operations=0
        S3: Number of write operations=0
    Job Counters
        Launched map tasks=1
        Launched reduce tasks=15
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=166005
        Total time spent by all reduces in occupied slots (ms)=18351000
        Total time spent by all map tasks (ms)=3689
        Total time spent by all reduce tasks (ms)=203900
        Total vcore-seconds taken by all map tasks=3689
        Total vcore-seconds taken by all reduce tasks=203900
        Total megabyte-seconds taken by all map tasks=5312160
        Total megabyte-seconds taken by all reduce tasks=587232000
    Map-Reduce Framework
        Map input records=88
        Map output records=88
        Map output bytes=20500
        Map output materialized bytes=5387
        Input split bytes=138
        Combine input records=0
        Combine output records=0
        Reduce input groups=88
        Reduce shuffle bytes=5387
        Reduce input records=88
        Reduce output records=0
        Spilled Records=176
        Shuffled Maps =15
        Failed Shuffles=0
        Merged Map outputs=15
        GC time elapsed (ms)=2658
        CPU time spent (ms)=98620
        Physical memory (bytes) snapshot=5777489920
        Virtual memory (bytes) snapshot=50741022720
        Total committed heap usage (bytes)=9051308032
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=17218
    File Output Format Counters
        Bytes Written=0
16/06/01 17:18:43 INFO s3distcp.S3DistCp: Try to recursively delete hdfs:/tmp/97139b69-ea86-400e-9ce4-f0718ff2b669/tempspace
It seems the best way to do this is to change the HADOOP_ROOT_LOGGER environment variable. You can set it at the Linux command line for the current session, or add it to the hadoop-env.sh script if it should always be in effect:
export HADOOP_ROOT_LOGGER="WARN,console"
WARN specifies that only messages at level WARN or worse should be logged, and console specifies that those messages should also be printed to the command line.
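If you only want to quiet a single invocation rather than the whole session, the standard shell trick of prefixing the command with the variable assignment also works. A minimal sketch, reusing the placeholder paths from the question above:

$ HADOOP_ROOT_LOGGER="WARN,console" hadoop jar ~hadoop/lib/emr-s3distcp-1.0.jar --src /user/myusername/test --dest s3://some-bucket/myusername/data/test

This sets the variable only in that command's environment, so other hadoop commands in the session keep their default logging.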
Note: if you want to modify the hadoop-env.sh file, you can find it at /etc/hadoop/conf/hadoop-env.sh, or at /home/hadoop/conf/hadoop-env.sh on older EMR clusters.
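To make the setting permanent, a minimal sketch of appending it to hadoop-env.sh (this assumes the newer EMR path and that the file is root-owned, hence sudo; adjust the path for older clusters):

$ # append the export so every new hadoop invocation picks it up
$ echo 'export HADOOP_ROOT_LOGGER="WARN,console"' | sudo tee -a /etc/hadoop/conf/hadoop-env.sh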