Map-only job is not running. Stuck at Running job
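I have streamed data through Apache Flume, and the data has been stored in a temporary file in my HDFS folder: user/*****/tweets/FlumeData.1643626732852.tmp
Now I am trying to run a mapper-only job that preprocesses the data by removing URLs, # tags, @ mentions, stop words, and so on.
However, the mapper-only job stalls at "Running job".
Command used to run the mapper job: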
hadoop jar mr-job-jars/SentimentAnalysisPreprocessingJob.jar com.hadoop.poc.sentimentAnalysis.phase1.SentimentAnalysisPreprocessingDriver /user/*****/tweets/ FlumeData.1643626732852.tmp /output
Execution output:
2022-01-31 06:16:18,151 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
2022-01-31 06:16:18,611 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
2022-01-31 06:16:18,666 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/aviparna/.staging/job_1643615018627_0004
2022-01-31 06:16:18,996 INFO input.FileInputFormat: Total input files to process : 1
2022-01-31 06:16:19,108 WARN hdfs.DataStreamer: Caught exception
java.lang.InterruptedException
    at java.lang.Object.wait(Native Method)
    at java.lang.Thread.join(Thread.java:1252)
    at java.lang.Thread.join(Thread.java:1326)
    at org.apache.hadoop.hdfs.DataStreamer.closeResponder(DataStreamer.java:986)
    at org.apache.hadoop.hdfs.DataStreamer.endBlock(DataStreamer.java:640)
    at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:810)
2022-01-31 06:16:19,168 INFO mapreduce.JobSubmitter: number of splits:1
2022-01-31 06:16:19,449 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1643615018627_0004
2022-01-31 06:16:19,451 INFO mapreduce.JobSubmitter: Executing with tokens: []
2022-01-31 06:16:19,794 INFO conf.Configuration: resource-types.xml not found
2022-01-31 06:16:19,794 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2022-01-31 06:16:19,935 INFO impl.YarnClientImpl: Submitted application application_1643615018627_0004
2022-01-31 06:16:20,035 INFO mapreduce.Job: The url to track the job: http://ubuntu:8088/proxy/application_1643615018627_0004/
2022-01-31 06:16:20,038 INFO mapreduce.Job: Running job: job_1643615018627_0004
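What do I need to do to resolve this? Please help.
Also, let me know if any other information is needed. I will provide it as soon as possible.
Adding a screenshot of the YARN UI: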
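Changing mapreduce.framework.name from yarn to local in mapred-site.xml solved my problem.
The problem appears to have been caused by insufficient resources on the machine.
Also, restart the Hadoop services after changing the property.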
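For reference, the change described above would look roughly like this in mapred-site.xml. This is a minimal sketch that assumes the file holds no other properties; any properties already in your file should be kept alongside it.

```xml
<!-- mapred-site.xml: sketch of the changed property only (assumes no other properties) -->
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <!-- Previously "yarn". With "local" the job runs inside the client JVM
         instead of being submitted to the YARN ResourceManager. -->
    <value>local</value>
  </property>
</configuration>
```

Note that local mode runs the whole job in a single process on the submitting machine, so it avoids waiting for YARN to allocate containers but does not scale beyond that machine.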