My code is outputting a tuple of values and I would like individual pairs instead; I need help understanding how to modify it
    def mapper(self, _, line):
        stop_words = set(["to", "a", "an", "the", "for", "in", "on", "of", "at", "over", "with", "after", "and", "from", "new", "us", "by", "as", "man", "up", "says", "in", "out", "is", "be", "are", "not", "pm", "am", "off", "more", "less", "no", "how"])
        (date, words) = line.strip().split(",")
        word_list = words.split()
        clean_words = [word for word in word_list if word not in stop_words]
        clean_words.sort()
        yield (date[0:4], clean_words)
This is in an MRJob mapper.
The current output looks like:
"2003" ["word 1","word 2", "word 3", "word 4"]
"2004" ["word 1","word 2", "word 3", "word 4"]
What I want is:
"2003" "Word 1"
"2003" "Word 2"
"2004" "Word 3"
"2004" "Word 4"
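Once the output looks like this, I can send it to the reducer to compute the top 3 words for each year.
Use a loop to yield each word separately, so the mapper emits one (year, word) pair per word instead of one (year, list) pair per line.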
    def mapper(self, _, line):
        stop_words = set(["to", "a", "an", "the", "for", "in", "on", "of", "at", "over", "with", "after", "and", "from", "new", "us", "by", "as", "man", "up", "says", "in", "out", "is", "be", "are", "not", "pm", "am", "off", "more", "less", "no", "how"])
        (date, words) = line.strip().split(",")
        word_list = words.split()
        clean_words = [word for word in word_list if word not in stop_words]
        clean_words.sort()
        for word in clean_words:
            yield (date[0:4], word)
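To check the idea without running a full job, here is a minimal plain-Python sketch of the same per-word yield, followed by one possible way to get the top words per year. It does not use mrjob at all. The names `map_line` and `top_words_per_year`, the shortened `STOP_WORDS` set, and the sample lines in `YYYYMMDD,headline` form are all assumptions made for illustration, not part of the original code.

```python
from collections import Counter, defaultdict

# Hypothetical, shortened stop-word list; the question uses a longer one.
STOP_WORDS = {"to", "a", "an", "the", "for", "in", "on", "of"}


def map_line(line):
    """Yield one (year, word) pair per non-stop word in the line."""
    # maxsplit=1 keeps any further commas inside the text part.
    date, words = line.strip().split(",", 1)
    for word in words.split():
        if word not in STOP_WORDS:
            yield date[0:4], word


def top_words_per_year(pairs, n=3):
    """Count words per year and keep the n most common for each year."""
    counts = defaultdict(Counter)
    for year, word in pairs:
        counts[year][word] += 1
    return {year: counter.most_common(n) for year, counter in counts.items()}


# Made-up sample input, assuming each line looks like "YYYYMMDD,text".
lines = [
    "20030219,council to vote on budget",
    "20030220,budget vote delayed",
]
pairs = [pair for line in lines for pair in map_line(line)]
top = top_words_per_year(pairs, n=2)
```

In a real job the grouping by year is done by the framework between the mapper and the reducer, so the reducer would only need the counting step for the words it receives under one key.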