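Write a job that counts the frequencies of word first letters in a file. So if there are three words starting with "c", the answer would be "c 3"
I have the following code, which gets the word counts, but I don't understand how to get the first-letter frequencies for all the words. If there are three words in the file starting with C, I want the result to be "C 3". I don't need case sensitivity, so 'a' and 'A' should be counted as the same letter.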
from mrjob.job import MRJob

class Job(MRJob):
    def mapper(self, Key, value):
        for char in value.strip().split():
            yield char, 1

    def reducer(self, Key, values):
        yield Key, sum(values)

if __name__ == '__main__':
    Job.run()
You can adapt the default example from https://pypi.org/project/mrjob/:
"""The classic MapReduce job: count the frequency of words.
"""
from mrjob.job import MRJob
import re

WORD_RE = re.compile(r"[\w']+")

class MRWordFreqCount(MRJob):
    def mapper(self, _, line):
        for word in WORD_RE.findall(line):
            yield (word.lower(), 1)

    def combiner(self, word, counts):
        yield (word, sum(counts))

    def reducer(self, word, counts):
        yield (word, sum(counts))
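That example counts complete (lowercased) words. To count first letters instead, yield only the first character of each word: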
"""The changed MapReduce job: count the frequency of words
starting with the same (case insensitive) letter."""
from mrjob.job import MRJob
import re

WORD_RE = re.compile(r"[\w']+")

class MyWordCount(MRJob):
    def mapper(self, _, line):
        for word in WORD_RE.findall(line):
            yield (word[0].lower(), 1)  # use the 1st letter, lowercased

    def combiner(self, word, counts):
        yield (word, sum(counts))

    def reducer(self, word, counts):
        yield (word, sum(counts))

if __name__ == '__main__':
    MyWordCount.run()
Save it as my_word_count.py and launch it like this:

    python my_word_count.py README.rst > counts.txt

Then you will find the results in counts.txt.
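If you want to sanity-check the numbers in counts.txt without running a job, the same logic can be sketched in plain Python. This is only an illustration that uses the standard library rather than mrjob; the function name first_letter_counts and the sample lines are made up for the example. Note that, like the job above, it counts whatever character a match starts with, so a word beginning with a digit or an apostrophe is counted under that character.

```python
import re
from collections import Counter

# Same pattern as in the job above
WORD_RE = re.compile(r"[\w']+")


def first_letter_counts(lines):
    """Count lowercased first characters of words (hypothetical helper).

    Mirrors what the mapper and reducer do together, but in one process.
    """
    counts = Counter()
    for line in lines:
        for word in WORD_RE.findall(line):
            counts[word[0].lower()] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample input: three words start with c/C, two with a/A
    sample = ["Cats chase Crickets", "An apple"]
    for letter, n in sorted(first_letter_counts(sample).items()):
        print(letter, n)
```

With the sample above this prints "a 2" and "c 3", which matches the "c 3" result asked for in the question.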