Finding the maximum word length using MapReduce

Question

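I need to find all of the longest word(s) in a txt file using MapReduce. I have written the following mapper and reducer code, but it shows the whole dictionary, with len(word) as the key and the words as the value. I need help writing the code so that it shows only the maximum length and the corresponding words. Here is my code: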

"""mapper.py"""
import sys

# Emit one "length<TAB>word" record per word read from standard input.
for line in sys.stdin:
    for word in line.strip().split():
        print('%s\t%s' % (len(word), word))



"""reducer.py"""

import sys

# Group the words by their length key (the key is still a string here).
results = {}
for line in sys.stdin:
    index, value = line.strip().split('\t')
    if index not in results:
        results[index] = value
    else:
        results[index] += ' '
        results[index] += value

I am stuck at this point: I don't know how to continue the code to get max(key) together with its corresponding words.

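Input file: How does peace start? Peace starts with saying sorry, Peace starts with not hurting others, Peace starts with honesty, trust and dedications, Peace starts with cooperation and respect. World Peace starts with me!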

Expected output: The longest word has 11 characters. The words are: dedications cooperation

Answer 1

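I'm not sure what you are using stdin for or why you import sys. Also, the sample input file does not appear to be in csv format; it is just a plain text file. As I understand your question, you want to read an input file, measure the length of each word, report the maximum word length, and list the words that have that length. With that in mind, here is how I would proceed: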

inputFile = r'sampleMapperText.txt'
with open(inputFile, 'r') as f:
    reslt = dict()  #keys = word lengths, values = words of key length
    text = f.read().split('\n')
    for line in text:
        words = line.split()
        for w in words:
            wdlist = reslt.pop(len(w), [])
            wdlist.append(w)
            reslt[len(w)] = wdlist
    maxLen = max(list(reslt.keys()))
    print(f"Max Word Length = {maxLen}, Longest words = {', '.join(reslt[maxLen])}")

Running this code produces:

Max Word Length = 12, Longest words = dedications,
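Note that the output shown above reports 12 rather than the expected 11, because `split()` leaves the trailing comma attached to "dedications,". One possible fix, as a rough sketch, is to trim punctuation from each token before measuring it. The function name `longest_words` and the `sample` string are my own illustrations and not part of the question; the sketch also assumes that only leading and trailing punctuation should be ignored.

```python
import string


def longest_words(text: str) -> tuple[int, list[str]]:
    """Return the maximum word length and the distinct words of that length."""
    by_length: dict[int, list[str]] = {}
    for raw in text.split():
        # Trim leading/trailing punctuation such as "," "?" or "!" from the token.
        word = raw.strip(string.punctuation)
        if word and word not in by_length.get(len(word), []):
            by_length.setdefault(len(word), []).append(word)
    if not by_length:
        return 0, []
    max_len = max(by_length)
    return max_len, by_length[max_len]


# Hypothetical sample text, loosely based on the question's input file.
sample = ("Peace starts with honesty, trust and dedications, "
          "Peace starts with showing cooperation and respect.")
max_len, words = longest_words(sample)
print(f"Max Word Length = {max_len}, Longest words = {', '.join(words)}")
```

With this change the comma no longer counts toward the length, so "dedications" and "cooperation" tie at 11 characters.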

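If you insist on splitting the process into two separate files, then, assuming both files are in the same directory, I would proceed as follows: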

The contents of the reducer.py file are:

# reducer.py 
def getData(filepath: str) -> list[str]:
    with open(filepath, 'r') as f:
        text = f.read().split('\n')
    return text

The contents of the mapper.py file are:

# mapper.py
from reducer import getData

def mapData(text: list[str]):
    reslt = dict()  #keys = word lengths, values = words of key length
    for line in text:
        words = line.split()
        for w in words:
            wdlist = reslt.pop(len(w), [])
            wdlist.append(w)
            reslt[len(w)] = wdlist
    maxLen = max(list(reslt.keys()))
    print(f"Max Word Length = {maxLen}, Longest words = {', '.join(reslt[maxLen])}")     

inputFile = r'sampleMapperText.txt'
mapData(getData(inputFile))
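If the goal is to keep the map/reduce split from the question (a mapper that emits `length<TAB>word` records and a reducer that consumes them), a minimal sketch could look like the following. It is only an illustration under some assumptions: the names `mapper` and `reducer` are mine, the functions take any iterable of lines so they can be tried without stdin, and punctuation is assumed not to count toward word length. The reducer keeps a running maximum instead of a dictionary of every length, and it converts the key to `int`, since comparing the keys as strings would rank "9" above "11".

```python
import string
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit one 'length<TAB>word' record per word, like the question's mapper.py."""
    for line in lines:
        for raw in line.strip().split():
            # Assumption: surrounding punctuation is not part of the word.
            word = raw.strip(string.punctuation)
            if word:
                yield f"{len(word)}\t{word}"


def reducer(records: Iterable[str]) -> tuple[int, list[str]]:
    """Track only the longest words seen so far."""
    max_len, longest = 0, []
    for record in records:
        key, word = record.rstrip("\n").split("\t", 1)
        length = int(key)  # keys arrive as text, so compare them as integers
        if length > max_len:
            max_len, longest = length, [word]  # new maximum: start a fresh list
        elif length == max_len and word not in longest:
            longest.append(word)  # tie with the current maximum
    return max_len, longest


# Hypothetical in-memory input standing in for the lines of the text file.
lines = ["Peace starts with honesty, trust and dedications,",
         "Peace starts with showing cooperation and respect."]
max_len, longest = reducer(mapper(lines))
print(f"The longest word has {max_len} characters. The words are: {' '.join(longest)}")
```

In a streaming setup, each function would presumably live in its own script, be fed `sys.stdin`, and print its results. Because the reducer only tracks a running maximum, it does not depend on the records arriving sorted.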
