Why can't I set epsilon=1e-4 on the Spark KMeans algorithm?
I want to train a K-means model on Spark by setting epsilon=1e-4 instead of setting numIterations. In the spark shell, I typed:
val model = KMeans.train(trainRDD, numClusters=8, runs=30, initializationMode="k-means||",epsilon=1e-4)
But it fails with the following error:
scala> val model = KMeans.train(trainRDD, numClusters=8, runs=30, initializationMode="k-means||",epsilon=1e-4)
<console>:48: error: overloaded method value train with alternatives:
(data: org.apache.spark.rdd.RDD[org.apache.spark.mllib.linalg.Vector],k: Int,maxIterations: Int,runs: Int)org.apache.spark.mllib.clustering.KMeansModel <and>
(data: org.apache.spark.rdd.RDD[org.apache.spark.mllib.linalg.Vector],k: Int,maxIterations: Int)org.apache.spark.mllib.clustering.KMeansModel <and>
(data: org.apache.spark.rdd.RDD[org.apache.spark.mllib.linalg.Vector],k: Int,maxIterations: Int,runs: Int,initializationMode: String)org.apache.spark.mllib.clustering.KMeansModel <and>
(data: org.apache.spark.rdd.RDD[org.apache.spark.mllib.linalg.Vector],k: Int,maxIterations: Int,runs: Int,initializationMode: String,seed: Long)org.apache.spark.mllib.clustering.KMeansModel
cannot be applied to (org.apache.spark.rdd.RDD[org.apache.spark.mllib.linalg.Vector], numClusters: Int, runs: Int, initializationMode: String, epsilon: Double)
val model = KMeans.train(trainRDD, numClusters=8, runs=30, initializationMode="k-means||",epsilon=1e-4)
^
What should I do?
There is no `train` overload defined with an `epsilon` parameter, which is why the compiler rejects the call.
Instead, use the `KMeans` constructor directly and set the parameters you need.
See the documentation: http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.mllib.clustering.KMeans
Then use `setEpsilon` to set the early-termination threshold.
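A minimal sketch of the constructor-based approach, assuming `trainRDD` is the `RDD[Vector]` from the question (this snippet requires a running Spark context, e.g. inside spark-shell):

```scala
import org.apache.spark.mllib.clustering.KMeans

// Build a KMeans instance and configure it via setters instead of
// the static train(...) overloads, which do not expose epsilon.
val kmeans = new KMeans()
  .setK(8)                                          // numClusters from the question
  .setRuns(30)
  .setInitializationMode(KMeans.K_MEANS_PARALLEL)   // the "k-means||" mode
  .setEpsilon(1e-4)                                 // convergence threshold
  .setMaxIterations(Int.MaxValue)                   // let epsilon, not iterations, stop training

// run() replaces the static train() call.
val model = kmeans.run(trainRDD)
```

Note that `maxIterations` still applies even when you set `epsilon`; training stops at whichever limit is reached first, so raise `maxIterations` if you truly want epsilon to be the deciding criterion.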