Cannot import name LDA MLlib in Spark

I'm trying to implement LDA with Spark and I'm getting this error. I'm new to Spark, so any help is much appreciated.

[root@sandbox ~]# spark-submit ./lda.py
Traceback (most recent call last):
  File "/root/./lda.py", line 3, in <module>
    from pyspark.mllib.clustering import LDA, LDAModel
ImportError: cannot import name LDA

Here is the code:

from pyspark.sql import SQLContext
from pyspark import SparkContext
from pyspark.mllib.clustering import LDA, LDAModel
from pyspark.mllib.linalg import Vectors
import numpy
sc = SparkContext(appName="PythonLDA")
data = sc.textFile("/tutorial/input/askreddit20150801.txt")
parsedData = data.map(lambda line: Vectors.dense([float(x) for x in line.strip().split(' ')]))
# Index documents with unique IDs
corpus = parsedData.zipWithIndex().map(lambda x: [x[1], x[0]]).cache()

# Cluster the documents into three topics using LDA
ldaModel = LDA.train(corpus, k=3)

# Output topics. Each is a distribution over words (matching word count vectors)
print("Learned topics (as distributions over vocab of " + str(ldaModel.vocabSize()) + " words):")
topics = ldaModel.topicsMatrix()
for topic in range(3):
    print("Topic " + str(topic) + ":")
    for word in range(0, ldaModel.vocabSize()):
        print(" " + str(topics[word][topic]))

# Save and load model (the trained model is stored in ldaModel, not model)
ldaModel.save(sc, "myModelPath")
sameModel = LDAModel.load(sc, "myModelPath")
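For readers unfamiliar with the RDD steps above, here is a plain-Python illustration (no Spark needed) of the corpus shape the code builds before calling LDA.train: each input line is parsed into a list of floats (word counts), and zipWithIndex followed by the swap turns each element into [doc_id, vector]. The function name build_corpus is hypothetical, and plain lists stand in for Vectors.dense; this only mirrors the transformations and is not a Spark API.

```python
def build_corpus(lines):
    """Plain-Python mirror of the RDD pipeline: parse, then index documents."""
    # Equivalent of data.map(lambda line: [float(x) for x in line.strip().split(' ')])
    vectors = [[float(x) for x in line.strip().split(' ')] for line in lines]
    # zipWithIndex() yields (vector, index); the .map swaps it to [index, vector]
    return [[doc_id, vec] for doc_id, vec in enumerate(vectors)]


# Two toy documents over a three-word vocabulary (hypothetical data)
corpus = build_corpus(["1 0 2", "0 3 1"])
print(corpus)  # [[0, [1.0, 0.0, 2.0]], [1, [0.0, 3.0, 1.0]]]
```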

When I try to install pyspark.mllib.clustering:

[root@sandbox ~]# pip install spark.mllib.clustering
Collecting spark.mllib.clustering
/usr/lib/python2.6/site-packages/pip/_vendor/requests/packages/urllib3/util/ssl_.py:90: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning.
  InsecurePlatformWarning
  Could not find a version that satisfies the requirement spark.mllib.clustering (from versions: )
No matching distribution found for spark.mllib.clustering

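The PySpark wrapper for LDA was introduced in Spark 1.5.0. Assuming your installation isn't broken, you are probably using Spark <= 1.4.x. Note that pyspark.mllib.clustering ships as part of Spark itself rather than as a separate pip package, so pip cannot install it; you need a Spark release that includes the wrapper.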
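To confirm which case you are in, you can check the running version (for example with `spark-submit --version`, or `sc.version` inside a job) and compare it against 1.5.0. The helper below is a minimal, hypothetical sketch of that comparison in plain Python; it assumes version strings of the form "major.minor.patch", possibly with a suffix such as "-SNAPSHOT", and avoids newer syntax so it also runs on the sandbox's Python 2.6.

```python
def supports_pyspark_lda(version_string):
    """Return True if the Spark version is 1.5.0 or newer (hypothetical helper)."""
    parts = []
    for piece in version_string.split(".")[:3]:
        # Keep only the leading digits, so "0-SNAPSHOT" becomes 0
        digits = ""
        for ch in piece:
            if ch.isdigit():
                digits += ch
            else:
                break
        parts.append(int(digits) if digits else 0)
    # Pad short versions such as "1.5" to three components
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts) >= (1, 5, 0)


print(supports_pyspark_lda("1.4.1"))  # False
print(supports_pyspark_lda("1.5.0"))  # True
```

Inside the job you would call it as `supports_pyspark_lda(sc.version)` before importing LDA; if it returns False, upgrading Spark is the fix rather than installing anything with pip.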