Installing Spark on a single machine

I need to install Spark on a single machine running Ubuntu 14.04. I mainly need it for educational purposes, so I am not really interested in high performance.

I don't have enough background to follow the tutorial at http://spark.apache.org/docs/1.2.0/spark-standalone.html, and I don't know which version of Spark I should install.

Can someone explain to me, step by step, how to set up a working Spark system on my machine?

EDIT: Based on the comments and the current answer, I can run the Spark shell and use it.

donbeo@donbeo-HP-EliteBook-Folio-9470m:~/Applications/spark/spark-1.1.0$ ./bin/spark-shell
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/02/04 10:20:20 INFO SecurityManager: Changing view acls to: donbeo,
15/02/04 10:20:20 INFO SecurityManager: Changing modify acls to: donbeo,
15/02/04 10:20:20 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(donbeo, ); users with modify permissions: Set(donbeo, )
15/02/04 10:20:20 INFO HttpServer: Starting HTTP Server
15/02/04 10:20:20 INFO Utils: Successfully started service 'HTTP class server' on port 48135.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.1.0
      /_/

Using Scala version 2.10.4 (OpenJDK 64-Bit Server VM, Java 1.7.0_75)
Type in expressions to have them evaluated.
Type :help for more information.
15/02/04 10:20:23 WARN Utils: Your hostname, donbeo-HP-EliteBook-Folio-9470m resolves to a loopback address: 127.0.1.1; using 192.168.1.45 instead (on interface wlan0)
15/02/04 10:20:23 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
15/02/04 10:20:23 INFO SecurityManager: Changing view acls to: donbeo,
15/02/04 10:20:23 INFO SecurityManager: Changing modify acls to: donbeo,
15/02/04 10:20:23 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(donbeo, ); users with modify permissions: Set(donbeo, )
15/02/04 10:20:23 INFO Slf4jLogger: Slf4jLogger started
15/02/04 10:20:23 INFO Remoting: Starting remoting
15/02/04 10:20:23 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@192.168.1.45:34171]
15/02/04 10:20:23 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriver@192.168.1.45:34171]
15/02/04 10:20:23 INFO Utils: Successfully started service 'sparkDriver' on port 34171.
15/02/04 10:20:23 INFO SparkEnv: Registering MapOutputTracker
15/02/04 10:20:23 INFO SparkEnv: Registering BlockManagerMaster
15/02/04 10:20:24 INFO DiskBlockManager: Created local directory at /tmp/spark-local-20150204102024-1e7b
15/02/04 10:20:24 INFO Utils: Successfully started service 'Connection manager for block manager' on port 44926.
15/02/04 10:20:24 INFO ConnectionManager: Bound socket to port 44926 with id = ConnectionManagerId(192.168.1.45,44926)
15/02/04 10:20:24 INFO MemoryStore: MemoryStore started with capacity 265.4 MB
15/02/04 10:20:24 INFO BlockManagerMaster: Trying to register BlockManager
15/02/04 10:20:24 INFO BlockManagerMasterActor: Registering block manager 192.168.1.45:44926 with 265.4 MB RAM
15/02/04 10:20:24 INFO BlockManagerMaster: Registered BlockManager
15/02/04 10:20:24 INFO HttpFileServer: HTTP File server directory is /tmp/spark-58772693-4106-4ff0-a333-6512bcfff504
15/02/04 10:20:24 INFO HttpServer: Starting HTTP Server
15/02/04 10:20:24 INFO Utils: Successfully started service 'HTTP file server' on port 51677.
15/02/04 10:20:24 INFO Utils: Successfully started service 'SparkUI' on port 4040.
15/02/04 10:20:24 INFO SparkUI: Started SparkUI at http://192.168.1.45:4040
15/02/04 10:20:24 INFO Executor: Using REPL class URI: http://192.168.1.45:48135
15/02/04 10:20:24 INFO AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://sparkDriver@192.168.1.45:34171/user/HeartbeatReceiver
15/02/04 10:20:24 INFO SparkILoop: Created spark context..
Spark context available as sc.

scala> val x = 3
x: Int = 3

scala> 
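
For example, something that actually exercises the sc context works as well (a minimal sketch typed at the scala> prompt, not copied from the session above):

// Build a small RDD and run an action on it.
val nums = sc.parallelize(1 to 1000)           // distribute a local collection across local threads
val evens = nums.filter(_ % 2 == 0).count()    // an action; should return 500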

Now suppose I want to use Spark from a Scala file, for example:

/* SimpleApp.scala */
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object SimpleApp {
  def main(args: Array[String]) {
    val logFile = "YOUR_SPARK_HOME/README.md" // Should be some file on your system
    val conf = new SparkConf().setAppName("Simple Application")
    val sc = new SparkContext(conf)
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
  }
}

How do I do that?

If you just intend to run it on one machine, for learning and so on, then you can use the local (1 core) or local[*] (all cores) values for the "master". It then runs like an ordinary JVM process, even inside an IDE, a debugger, etc. I wrote a do-it-yourself workshop that works this way, https://github.com/deanwampler/spark-workshop, if you want an example.
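
As a concrete illustration, here is a minimal sketch of the SimpleApp example from the question with the master hard-coded to local mode (the file path stays a placeholder, as in the question; passing --master to spark-submit instead would also work):

/* SimpleAppLocal.scala -- the example above, forced into local mode */
import org.apache.spark.{SparkConf, SparkContext}

object SimpleAppLocal {
  def main(args: Array[String]) {
    // "local[*]" uses every core on the machine; plain "local" uses a single core.
    val conf = new SparkConf()
      .setAppName("Simple Application")
      .setMaster("local[*]")
    val sc = new SparkContext(conf)

    val logFile = "YOUR_SPARK_HOME/README.md" // placeholder, as in the question
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))

    sc.stop()
  }
}

Hard-coding setMaster is handy while experimenting; for anything beyond that, passing --master on the command line keeps the code portable.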

If local is enough, downloading one of the pre-built binary packages is all you need.
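
To compile and run the SimpleApp example from the question against such a download, one possible setup (a sketch, assuming sbt and the Spark 1.1.0 / Scala 2.10 versions shown in the shell banner above) is a simple.sbt next to the source file:

name := "Simple Project"

version := "1.0"

scalaVersion := "2.10.4"

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.1.0"

After sbt package, the resulting jar can be launched with the bundled launcher, for example ./bin/spark-submit --class "SimpleApp" --master "local[*]" target/scala-2.10/simple-project_2.10-1.0.jar (the jar path follows from the project name and version above).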