Declare mrjob mapper without ignoring key
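I want to declare a mapper function with mrjob. My mapper needs to refer to some constants for its calculations, so I decided to pass those constants in through the mapper's key (is there another way?). I read the mrjob tutorial on this site, but all the examples ignore the key. For example: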
class MRWordFrequencyCount(MRJob):

    def mapper(self, _, line):
        yield "chars", len(line)
        yield "words", len(line.split())
        yield "lines", 1

    def reducer(self, key, values):
        yield key, sum(values)
Basically, I want something like this:

    def mapper(self, (constant1, constant2, constant3, constant4, constant5), line):
        # my calculation goes here

Please suggest how to do this. Thanks.
You can set the constants in __init__:
from mrjob.job import MRJob


class MRWordFrequencyCount(MRJob):

    def mapper(self, key, line):
        yield "chars", len(line)
        yield "words", len(line.split())
        yield "lines", 1
        # The constant is available on the job instance in every mapper call.
        yield "Constant", self.constant

    def reducer(self, key, values):
        yield key, sum(values)

    def __init__(self, *args, **kwargs):
        super(MRWordFrequencyCount, self).__init__(*args, **kwargs)
        self.constant = 10


if __name__ == '__main__':
    MRWordFrequencyCount.run()
Output:
"Constant" 10
"chars" 12
"lines" 1
"words" 2
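Alternatively, you can use RawProtocol, which passes the text before the first tab on each input line to the mapper as the key: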
from mrjob.job import MRJob
from mrjob.protocol import RawProtocol


class MRWordFrequencyCount(MRJob):

    # Read each input line as "key<TAB>value" instead of ignoring the key.
    INPUT_PROTOCOL = RawProtocol

    def mapper(self, key, line):
        yield "constant", key
        yield "chars", len(line)
        yield "words", len(line.split())
        yield "lines", 1

    def reducer(self, key, values):
        if str(key) != "constant":
            yield key, sum(values)
        else:
            # The constants are strings, so collect them instead of summing.
            yield "constant", list(values)


if __name__ == '__main__':
    MRWordFrequencyCount.run()
If the input is (the key and the line are separated by a tab character):
constant1,constant2,constant3	The quick brown fox jumps over the lazy dog
Output:
"chars" 43
"constant" ["constant1,constant2,constant3"]
"lines" 1
"words" 9
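With this approach the key reaches the mapper as a single string such as "constant1,constant2,constant3", so the mapper still has to split it into individual constants before calculating anything. Here is a minimal sketch in plain Python (no mrjob required) of that step. It assumes the constants are comma-separated numbers and that a tab separates the key from the line; `split_raw_line`, `parse_constants`, and the `scale`/`offset` calculation are hypothetical names chosen for illustration.

```python
def split_raw_line(raw_line):
    """Mimic the tab split: text before the first tab is the key."""
    key, _, value = raw_line.rstrip("\n").partition("\t")
    return key, value


def parse_constants(key):
    """Turn 'c1,c2,...' into a tuple of floats (assumes numeric constants)."""
    return tuple(float(part) for part in key.split(","))


def mapper(key, line):
    # Unpack the constants inside the body; Python 3 does not allow
    # tuple unpacking in the parameter list itself.
    scale, offset = parse_constants(key)
    yield "scaled_words", scale * len(line.split()) + offset


raw = "2,0.5\tThe quick brown fox jumps over the lazy dog"
key, line = split_raw_line(raw)
result = list(mapper(key, line))
print(result)
```

Inside a real job, only the body of `mapper` would be needed, since the input protocol has already done the tab split by the time the mapper is called.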