Spark reduce and map issue
I'm doing a small exercise in Spark and have run into trouble.
wordCounts is: [('rat', 2), ('elephant', 1), ('cat', 2)]
# TODO: Replace <FILL IN> with appropriate code
from operator import add
totalCount = (wordCounts
.map(lambda x: (x,1)) <==== something wrong with this line maybe
.reduce(sum)) <==== something wrong with this line maybe
average = totalCount / float(wordsRDD.map(lambda x: (x,1)).reduceByKey(add).count())
print totalCount
print round(average, 2)
# TEST Mean using reduce (3b)
Test.assertEquals(round(average, 2), 1.67, 'incorrect value of average')
I'm not sure myself, but looking at your code I can see a problem. The 'map' function can't be called on a list like 'list_name.map(some stuff)'; you need to call it as 'variable = map(function, arguments)', and if you are on Python 3 you need 'variable = list(map(function, arguments))'.
Hope that helps :)
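To illustrate the point above, here is a plain-Python sketch of the built-in map() call being described (not Spark's RDD.map; the variable names are just for this example):

```python
# Plain-Python illustration of the built-in map() usage described above.
# These names are examples only, not from the original lab.
pairs = [('rat', 2), ('elephant', 1), ('cat', 2)]

# Python 2: map() returns a list directly:
#   counts = map(lambda kv: kv[1], pairs)
# Python 3: map() returns a lazy iterator, so wrap it in list():
counts = list(map(lambda kv: kv[1], pairs))
print(counts)  # [2, 1, 2]
```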
I found my solution:
from operator import add
totalCount = (wordCounts
.map(lambda x: x[1])
.reduce(add))
average = totalCount / float(wordsRDD.map(lambda x: (x,1)).reduceByKey(add).count())
print totalCount
print round(average, 2)
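For readers without a Spark cluster handy, here is a rough plain-Python sketch of what this pipeline computes; the list stands in for the RDD, and the names are illustrative only:

```python
from functools import reduce
from operator import add

# Hypothetical stand-in for the RDD contents in the question
word_counts = [('rat', 2), ('elephant', 1), ('cat', 2)]

# RDD.map(lambda x: x[1]) -> keep only the counts
counts = [x[1] for x in word_counts]

# RDD.reduce(add) -> sum the counts
total_count = reduce(add, counts)

# len() here stands in for the reduceByKey(add).count() step,
# i.e. the number of unique words
average = total_count / float(len(word_counts))
print(total_count)        # 5
print(round(average, 2))  # 1.67
```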
A similar alternative:
You can also treat the list as key-value pairs and use distinct()
from operator import add
totalCount = (wordCounts
.map(lambda (k, v): v)  # Python 2 only; on Python 3 use: lambda kv: kv[1]
.reduce(add))
average = totalCount / float(wordCounts.distinct().count())
print totalCount
print round(average, 2)
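A minimal plain-Python sketch of why distinct().count() works here, assuming each word appears exactly once in wordCounts (as it does after a reduceByKey); the names are illustrative only:

```python
# Stand-in for the RDD of (word, count) pairs from the question
word_counts = [('rat', 2), ('elephant', 1), ('cat', 2)]

# distinct() on the RDD keeps unique (word, count) pairs;
# set() plays the same role for a plain list
distinct_count = len(set(word_counts))

# Sum of the counts, as in the reduce(add) step
total = sum(v for _, v in word_counts)
print(distinct_count)                           # 3
print(round(total / float(distinct_count), 2))  # 1.67
```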