Hadoop Found 2 unexpected arguments

I am running Hadoop on Windows and trying to submit an MRJob job, but it fails with the error: Found 2 unexpected arguments on the command line

(cmtle) d:\>python norad_counts.py -r hadoop --hadoop-streaming-jar C:\hadoop-3.3.0\share\hadoop\tools\lib\hadoop-streaming-3.3.0.jar all_files.txt
No configs found; falling back on auto-configuration
No configs specified for hadoop runner
Looking for hadoop binary in C:\hadoop-3.3.0\bin\bin...
Looking for hadoop binary in $PATH...
Found hadoop binary: C:\hadoop-3.3.0\bin\hadoop.CMD
Using Hadoop version 3.3.0
Creating temp directory C:\Users\mille\AppData\Local\Temp\norad_counts.mille.20210318.083636.028559
uploading working dir files to hdfs:///user/mille/tmp/mrjob/norad_counts.mille.20210318.083636.028559/files/wd...
Copying other local files to hdfs:///user/mille/tmp/mrjob/norad_counts.mille.20210318.083636.028559/files/
Running step 1 of 1...
  Found 2 unexpected arguments on the command line [hdfs:///user/mille/tmp/mrjob/norad_counts.mille.20210318.083636.028559/files/wd/norad_counts.py#norad_counts.py, hdfs:///user/mille/tmp/mrjob/norad_counts.mille.20210318.083636.028559/files/wd/setup-wrapper.sh#setup-wrapper.sh]
  Try -help for more information
  Streaming Command Failed!
Attempting to fetch counters from logs...
Can't fetch history log; missing job ID
No counters found
Scanning logs for probable cause of failure...
Can't fetch history log; missing job ID
Can't fetch task logs; missing application ID
Step 1 of 1 failed: Command '['C:\hadoop-3.3.0\bin\hadoop.CMD', 'jar', 'C:\hadoop-3.3.0\share\hadoop\tools\lib\hadoop-streaming-3.3.0.jar', '-files', 'hdfs:///user/mille/tmp/mrjob/norad_counts.mille.20210318.083636.028559/files/wd/mrjob.zip#mrjob.zip,hdfs:///user/mille/tmp/mrjob/norad_counts.mille.20210318.083636.028559/files/wd/norad_counts.py#norad_counts.py,hdfs:///user/mille/tmp/mrjob/norad_counts.mille.20210318.083636.028559/files/wd/setup-wrapper.sh#setup-wrapper.sh', '-input', 'hdfs:///user/mille/tmp/mrjob/norad_counts.mille.20210318.083636.028559/files/all_files.txt', '-output', 'hdfs:///user/mille/tmp/mrjob/norad_counts.mille.20210318.083636.028559/output', '-mapper', '/bin/sh -ex setup-wrapper.sh python3 norad_counts.py --step-num=0 --mapper', '-combiner', '/bin/sh -ex setup-wrapper.sh python3 norad_counts.py --step-num=0 --combiner', '-reducer', '/bin/sh -ex setup-wrapper.sh python3 norad_counts.py --step-num=0 --reducer']' returned non-zero exit status 1.

Here is the content of norad_counts.py:

from mrjob.job import MRJob
import pandas as pd


class MRNoradCounts(MRJob):

    def mapper(self, _, file_path):
        # Each input line is the path to a gzipped CSV file.
        try:
            df = pd.read_csv(file_path, compression='gzip', low_memory=False)
            df = df[(df.MEAN_MOTION > 11.25) & (df.ECCENTRICITY < 0.25)]
        except Exception as exc:
            # Keep the original error attached so the real cause stays visible.
            raise Exception(f'Failed to open {file_path}') from exc
        # Emit one count per row that passes the filter.
        for norad in df.NORAD_CAT_ID.to_list():
            yield norad, 1

    def combiner(self, norad, counts):
        yield norad, sum(counts)

    def reducer(self, norad, counts):
        yield norad, sum(counts)


if __name__ == "__main__":
    MRNoradCounts.run()

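I solved my problem by reinstalling the Java JDK. I had originally installed it to C:\Program Files\Java, but following some other instructions I moved it to C:\Java. I assumed that updating the environment variables would be enough, but apparently it was not. So I uninstalled Java and reinstalled it, this time directly into C:\Java, and that solved my problem.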
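
For anyone hitting the same thing, a quick sanity check of the Java setup before submitting the job may save some time. The sketch below is only an illustration: check_java_home is a hypothetical helper (not part of mrjob or Hadoop), and the checks it makes are assumptions about what can go wrong. A JAVA_HOME containing spaces (such as C:\Program Files\Java) is a commonly reported pitfall with Hadoop on Windows, but the answer above does not confirm that this was the actual cause here.

```python
import os
from pathlib import Path


def check_java_home(env=None):
    """Return a list of likely problems with JAVA_HOME (empty if none found).

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    java_home = env.get("JAVA_HOME")
    if not java_home:
        return ["JAVA_HOME is not set"]

    problems = []
    # Assumption: paths with spaces are a frequent source of trouble
    # for Hadoop's Windows launcher scripts.
    if " " in java_home:
        problems.append(f"JAVA_HOME contains a space: {java_home}")

    home = Path(java_home)
    if not home.is_dir():
        # A moved (rather than reinstalled) JDK can leave a stale path behind.
        problems.append(f"JAVA_HOME is not an existing directory: {java_home}")
    elif not any((home / "bin" / name).is_file() for name in ("java.exe", "java")):
        problems.append(f"No java executable found under {home / 'bin'}")
    return problems


if __name__ == "__main__":
    for line in check_java_home() or ["No obvious JAVA_HOME problems found"]:
        print(line)
```

Run it in the same shell (and the same virtual environment) you use to submit the MRJob, so it sees the same environment variables that hadoop.CMD will see.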