Python MapReduce: find the longest word length and show the longest word(s)

Question

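I am trying to apply the MapReduce principle in Python: first map each word in a text to its character count, then reduce to show the longest word(s) and their character count.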

The input is a simple text.txt file containing 3 sentences. See my mapper below:

#!/usr/bin/python
# MAPPER: emit "<length>\t<word>" for every word read from stdin
import sys

for line in sys.stdin:
    for word in line.strip().split():
        print(str(len(word)) + '\t' + word)

I would appreciate help with the reducer. The required result is: "The longest words in this text are xxxxxxxxx and yyyyyyyy, with xx characters".

Answer 1

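This does what you asked for. It keeps only the top 5 words, which makes the sorting easier. The print inside the loop is only there for debugging and can be removed.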

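import sys

top5 = []  # (length, word) pairs, longest first
for line in sys.stdin:
    for word in line.strip().split():
        top5.append((len(word), word))
        top5.sort(reverse=True)  # longest first; ties fall in reverse alphabetical order
        top5 = top5[:5]          # keep only the five longest seen so far
    print(top5)  # debug output after each input line; safe to remove
print("Top 5 words by length are:")
print(top5)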
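
The code above reads the raw text rather than the mapper's output, and it reports a top 5 rather than only the words tied for the maximum length. If the goal is a reducer that consumes the mapper's "length<TAB>word" lines and prints the sentence from the question, one possible sketch is shown here. The function names longest_words and format_result and the sample lines are assumptions made for illustration; they do not come from the question or the answer. Because the code only tracks the running maximum, it should not need the input to be sorted.

```python
import sys


def longest_words(lines):
    """Return (max_length, words) from lines shaped 'length<TAB>word'."""
    max_len = 0
    words = []
    for line in lines:
        line = line.rstrip("\n")
        length_str, _, word = line.partition("\t")
        try:
            length = int(length_str)
        except ValueError:
            continue  # skip empty or malformed lines
        if length > max_len:
            # A strictly longer word replaces everything collected so far.
            max_len = length
            words = [word]
        elif length == max_len and word not in words:
            # Same length as the current maximum: keep it as a tie.
            words.append(word)
    return max_len, words


def format_result(max_len, words):
    """Build the sentence requested in the question."""
    if not words:
        return "No words found."
    if len(words) == 1:
        return "The longest word in this text is %s, with %d characters" % (
            words[0], max_len)
    return "The longest words in this text are %s, with %d characters" % (
        " and ".join(words), max_len)


# Hypothetical mapper output, used here instead of real input.
# In an actual reducer script, pass sys.stdin in place of `sample`.
sample = ["3\tthe", "5\tquick", "5\tbrown", "3\tfox"]
print(format_result(*longest_words(sample)))
```

Passing sys.stdin instead of sample lets the same two functions sit at the end of a pipe that starts with the mapper. With a single reducer this gives the global maximum; with several reducers, each one would only report the longest words among the keys it received.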

python

hadoop

mapreduce