Python MapReduce: How do I add a conditional statement?
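I'm new to MapReduce and I'm trying to find the average rating of each movie in the MovieLens 100k dataset. I have a working program that computes the average rating for every movie, but I only want to do this for movies with more than 100 ratings. How do I add a conditional statement to do this?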
from mrjob.job import MRJob


class PopularMovieAvgReview(MRJob):
    def mapper(self, key, line):
        (userID, movieID, rating, timestamp) = line.split('\t')
        yield movieID, float(rating)

    def reducer(self, movieID, rating):
        total = 0
        numElements = 0
        for x in rating:
            total += x
            numElements += 1
        yield movieID, total / numElements


if __name__ == '__main__':
    PopularMovieAvgReview.run()
If I understand correctly, you want to limit the output based on the number of ratings each movie has received:
def reducer(self, movieID, rating):
    total = 0
    numElements = 0
    for x in rating:
        total += x
        numElements += 1
    if numElements > 100:
        yield movieID, total / numElements
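To check the condition without running a full job, the same logic can be simulated in plain Python. This is only a rough sketch, not how mrjob works internally: `run_local`, `MIN_RATINGS`, and the `min_ratings` parameter are hypothetical names introduced here, and the sample lines are made-up data in the tab-separated layout the mapper expects. It illustrates why the `if` sits after the loop: the count is only known once every rating for a movie has been consumed.

```python
from collections import defaultdict

MIN_RATINGS = 100  # threshold from the question; "more than 100" is a strict >


def mapper(line):
    """Emit (movieID, rating) for one tab-separated input line."""
    userID, movieID, rating, timestamp = line.split('\t')
    yield movieID, float(rating)


def reducer(movieID, ratings, min_ratings=MIN_RATINGS):
    """Emit the average only if the movie has more than min_ratings ratings."""
    total = 0.0
    numElements = 0
    for x in ratings:
        total += x
        numElements += 1
    # The count is complete only after the loop, so the check goes here.
    if numElements > min_ratings:
        yield movieID, total / numElements


def run_local(lines, min_ratings=MIN_RATINGS):
    """Stand-in for the shuffle step: group by key, then reduce each group."""
    grouped = defaultdict(list)
    for line in lines:
        for movieID, rating in mapper(line):
            grouped[movieID].append(rating)
    results = {}
    for movieID, ratings in grouped.items():
        for key, avg in reducer(movieID, ratings, min_ratings):
            results[key] = avg
    return results


# Made-up sample lines: userID, movieID, rating, timestamp
sample = [
    "1\t50\t5\t881250949",
    "2\t50\t3\t881250950",
    "3\t172\t4\t881250951",
]
print(run_local(sample, min_ratings=1))  # {'50': 4.0}
```

With the threshold lowered to 1 for the tiny sample, movie 50 (two ratings) is kept and movie 172 (one rating) is dropped. With the default of 100, nothing in the sample would be emitted.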
Alternatively, you could use PySpark to aggregate the ratings and then filter on the number of ratings.