Spark AttributeError: 'SparkContext' object has no attribute 'map'

Question

I'm using Spark on Hortonworks Data Platform 2.2 and I get the error in the title when I run the script below. Any ideas?

#!/usr/bin/env python

import sys
import pyspark
from pyspark import SparkContext

if 'sc' not in globals():
    #sc = SparkContext('local[4]', 'pyspark','map')
    sc = SparkContext(appName="PythonKMeans")

nums = sc.map([23,232,1,232,43,43])
squared = nums.map(lambda x: x*x).collect()
for num in squared:
    print num

Answer 1

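Instead of

nums = sc.map([23,232,1,232,43,43])

try

nums = sc.parallelize([23,232,1,232,43,43])

SparkContext has no map method; map is a transformation defined on RDDs, which is why the call fails with the AttributeError. parallelize tells Spark to distribute the sequence across the cluster and create an RDD from it. You can then call map on that RDD with a lambda function, just as you do on the next line.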
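
For reference, here is a minimal sketch of the corrected flow wrapped in a function. The helper name square_all is hypothetical (it is not part of PySpark), and the sketch assumes sc is an already-created SparkContext such as the one built in the question. Only parallelize, map and collect are actual Spark calls.

```python
def square_all(sc, values):
    """Square every element of `values` using Spark.

    `sc` is assumed to be a live SparkContext; `square_all` itself is a
    hypothetical helper used only for illustration.
    """
    # parallelize() turns the local Python sequence into a distributed RDD.
    rdd = sc.parallelize(values)
    # map() is an RDD transformation (lazy); collect() runs the job and
    # brings the results back to the driver as a plain Python list.
    return rdd.map(lambda x: x * x).collect()
```

With the context from the question, square_all(sc, [23,232,1,232,43,43]) should give back the squares in the same order as the input, and you can loop over that list and print it as before.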


Tags: python, linux, hadoop, hortonworks-data-platform, apache-spark