Spark Scala running
Hi, I'm new to Spark and Scala. I'm running the Scala code below at the spark-shell prompt. The program seems fine; the shell shows "defined module MLlib", but it doesn't print anything to the screen. What am I doing wrong? Is there some other way to run this program in the Scala shell and get the output?
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
import org.apache.spark.mllib.feature.HashingTF
import org.apache.spark.mllib.regression.LabeledPoint

object MLlib {

  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("Book example: Scala")
    val sc = new SparkContext(conf)

    // Load 2 types of emails from text files: spam and ham (non-spam).
    // Each line has text from one email.
    val spam = sc.textFile("/home/training/Spam.txt")
    val ham = sc.textFile("/home/training/Ham.txt")

    // Create a HashingTF instance to map email text to vectors of 100 features.
    val tf = new HashingTF(numFeatures = 100)

    // Each email is split into words, and each word is mapped to one feature.
    val spamFeatures = spam.map(email => tf.transform(email.split(" ")))
    val hamFeatures = ham.map(email => tf.transform(email.split(" ")))

    // Create LabeledPoint datasets for positive (spam) and negative (ham) examples.
    val positiveExamples = spamFeatures.map(features => LabeledPoint(1, features))
    val negativeExamples = hamFeatures.map(features => LabeledPoint(0, features))
    val trainingData = positiveExamples ++ negativeExamples
    trainingData.cache() // Cache data since Logistic Regression is an iterative algorithm.

    // Create a Logistic Regression learner which uses the SGD optimizer.
    val lrLearner = new LogisticRegressionWithSGD()
    // Run the actual learning algorithm on the training data.
    val model = lrLearner.run(trainingData)

    // Test on a positive example (spam) and a negative one (ham).
    // First apply the same HashingTF feature transformation used on the training data.
    val posTestExample = tf.transform("O M G GET cheap stuff by sending money to ...".split(" "))
    val negTestExample = tf.transform("Hi Dad, I started studying Spark the other ...".split(" "))

    // Now use the learned model to predict spam/ham for new emails.
    println(s"Prediction for positive test example: ${model.predict(posTestExample)}")
    println(s"Prediction for negative test example: ${model.predict(negTestExample)}")

    sc.stop()
  }
}
A couple of things:

You defined the object in the Spark shell, so main won't be called automatically. You have to call it explicitly after defining the object:
MLlib.main(Array())
In fact, if you keep working in the shell/REPL you can do away with the object altogether; you can define the function directly. For example:
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
import org.apache.spark.mllib.feature.HashingTF
import org.apache.spark.mllib.regression.LabeledPoint
def MLlib(): Unit = {
  // the rest of your code
}
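Then you can call it at the prompt like any other function:

MLlib()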
However, you shouldn't initialize a SparkContext inside the shell. From the documentation:
In the Spark shell, a special interpreter-aware SparkContext is
already created for you, in the variable called sc. Making your own
SparkContext will not work.
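In the shell, then, drop the SparkConf/SparkContext lines and reuse the sc the shell provides. A minimal sketch of that version (the method name trainAndTest is just an example; the file paths are the ones from your question):

import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
import org.apache.spark.mllib.feature.HashingTF
import org.apache.spark.mllib.regression.LabeledPoint

def trainAndTest(): Unit = {
  // `sc` is the SparkContext that spark-shell already provides.
  val spam = sc.textFile("/home/training/Spam.txt")
  val ham = sc.textFile("/home/training/Ham.txt")

  // Same featurization as before: hash each word into a 100-feature vector.
  val tf = new HashingTF(numFeatures = 100)
  val spamFeatures = spam.map(email => tf.transform(email.split(" ")))
  val hamFeatures = ham.map(email => tf.transform(email.split(" ")))

  // Label spam as 1 and ham as 0, then combine into one training set.
  val trainingData =
    spamFeatures.map(f => LabeledPoint(1, f)) ++ hamFeatures.map(f => LabeledPoint(0, f))
  trainingData.cache()

  val model = new LogisticRegressionWithSGD().run(trainingData)

  val posTest = tf.transform("O M G GET cheap stuff by sending money to ...".split(" "))
  val negTest = tf.transform("Hi Dad, I started studying Spark the other ...".split(" "))
  println(s"Prediction for positive test example: ${model.predict(posTest)}")
  println(s"Prediction for negative test example: ${model.predict(negTest)}")
  // Note: no sc.stop() here -- the shell owns the context.
}

trainAndTest()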
Alternatively, keep the object exactly as it is, compile it into a jar, and run it with spark-submit.
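For example, assuming you packaged the object into a jar named spam-example.jar (the name here is hypothetical):

spark-submit --class MLlib --master local[*] spam-example.jar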