SnappyData: Connect Standalone Spark Job to Embedded Cluster
What I'm trying to achieve is similar to smart connector mode, but the docs aren't helping me much here, because the smart connector examples are all based on the Spark shell, whereas I'm trying to run a standalone Scala application. As a result I can't use the shell's --conf arguments.
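(As far as I can tell, any --conf key=value option passed to spark-shell corresponds to a .config(key, value) call on the SparkSession builder in a standalone app, roughly like the sketch below with placeholder values; what wasn't clear to me is which keys and which master URL the connector actually needs.)

import org.apache.spark.sql.SparkSession

// Sketch only: "some.key" / "some.value" are placeholders, not real
// SnappyData properties. Shell form: spark-shell --conf some.key=some.value
object ConfMappingExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .appName("ConfMappingExample")
      .master("local[*]") // placeholder master, for illustration only
      .config("some.key", "some.value") // builder equivalent of --conf
      .getOrCreate()
    println(spark.conf.get("some.key")) // prints "some.value"
    spark.stop()
  }
}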
To find my Spark master, I looked at the SnappyData web UI. I found the following:
host-data="false"
locators="xxx.xxx.xxx.xxx:10334"
log-file="snappyleader.log"
mcast-port="0"
member-timeout="30000"
persist-dd="false"
route-query="false"
server-groups="IMPLICIT_LEADER_SERVERGROUP"
snappydata.embedded="true"
spark.app.name="SnappyData"
spark.closure.serializer="org.apache.spark.serializer.PooledKryoSerializer"
spark.driver.host="xxx.xxx.xxx.xxx"
spark.driver.port="37838"
spark.executor.id="driver"
spark.local.dir="/var/opt/snappydata/lead1/scratch"
spark.master="snappydata://xxx.xxx.xxx.xxx:10334"
spark.memory.manager="org.apache.spark.memory.SnappyUnifiedMemoryManager"
spark.memory.storageFraction="0.5"
spark.scheduler.mode="FAIR"
spark.serializer="org.apache.spark.serializer.PooledKryoSerializer"
spark.ui.port="5050"
statistic-archive-file="snappyleader.gfs"
--- end --
(The IP addresses are all on a single host for now.)
I have a simple example Spark job, just to test that my cluster is working:
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.SnappySession
import org.apache.spark.sql.Dataset

object snappytest {
  case class Person(name: String, age: Long)

  def main(args: Array[String]): Unit = {
    val spark: SparkSession = SparkSession
      .builder()
      .appName("SnappyTest")
      .master("snappydata://xxx.xxx.xxx.xxx:10334") // taken from spark.master in the web UI above
      .getOrCreate()
    val snappy = new SnappySession(spark.sparkContext)

    import spark.implicits._
    val caseClassDS = Seq(Person("Andy", 35)).toDS()
    println(Person)
    println(snappy)
    println(spark)
  }
}
I got this error:
17/10/25 14:44:57 INFO ServerConnector: Started Spark@ffaaaf0{HTTP/1.1}{0.0.0.0:4040}
17/10/25 14:44:57 INFO Server: Started @2743ms
17/10/25 14:44:57 INFO Utils: Successfully started service 'SparkUI' on port 4040.
17/10/25 14:44:57 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://xxx.xxx.xxx.xxx:4040
17/10/25 14:44:57 INFO SnappyEmbeddedModeClusterManager: setting from url snappydata.store.locators with xxx.xxx.xxx.xxx:10334
17/10/25 14:44:58 INFO LeadImpl: cluster configuration after overriding certain properties
jobserver.enabled=false
snappydata.embedded=true
snappydata.store.host-data=false
snappydata.store.locators=xxx.xxx.xxx.xxx:10334
snappydata.store.persist-dd=false
snappydata.store.server-groups=IMPLICIT_LEADER_SERVERGROUP
spark.app.name=SnappyTest
spark.driver.host=xxx.xxx.xxx.xxx
spark.driver.port=35602
spark.executor.id=driver
spark.master=snappydata://xxx.xxx.xxx.xxx:10334
17/10/25 14:44:58 INFO LeadImpl: passing store properties as {spark.driver.host=xxx.xxx.xxx.xxx, snappydata.embedded=true, spark.executor.id=driver, persist-dd=false, spark.app.name=SnappyTest, spark.driver.port=35602, spark.master=snappydata://xxx.xxx.xxx.xxx:10334, member-timeout=30000, host-data=false, default-startup-recovery-delay=120000, server-groups=IMPLICIT_LEADER_SERVERGROUP, locators=xxx.xxx.xxx.xxx:10334}
NanoTimer::Problem loading library from URL path: /home/jpride/.ivy2/cache/io.snappydata/gemfire-core/jars/libgemfirexd64.so: java.lang.UnsatisfiedLinkError: no gemfirexd64 in java.library.path
NanoTimer::Problem loading library from URL path: /home/jpride/.ivy2/cache/io.snappydata/gemfire-core/jars/libgemfirexd64.so: java.lang.UnsatisfiedLinkError: no gemfirexd64 in java.library.path
Exception in thread "main" org.apache.spark.SparkException: Primary Lead node (Spark Driver) is already running in the system. You may use smart connector mode to connect to SnappyData cluster.
So how do I (should I?) use smart connector mode in this case?
You need to specify the following in your example Spark job:
.master("local[*]")
.config("snappydata.connection", "xxx.xxx.xxx.xxx:1527")