Spark-1.5.0 - Loading com.databricks:spark-csv_2.11:1.2.0 in RStudio
I have Spark-1.5.0 installed on my Mac and I am trying to initialise a Spark context in RStudio with the com.databricks:spark-csv_2.11:1.2.0 package, like this:
Sys.setenv('SPARKR_SUBMIT_ARGS'='"--packages" "com.databricks:spark-csv_2.11:1.2.0" "sparkr-shell"')
library(SparkR, lib.loc = "spark-1.5.0-bin-hadoop2.6/R/lib/")
sc <- sparkR.init(sparkHome = "spark-1.5.0-bin-hadoop2.6/")
But I get the following error message:
[unresolved dependency: com.springml#spark-salesforce_2.10;1.0.1: not found]
Why is that?
P.S.: when I use com.databricks:spark-csv_2.10:1.0.3, the initialisation works fine.
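For reference, the only part that changes in that working case is the package coordinate passed in SPARKR_SUBMIT_ARGS (a minimal sketch, assuming the same Spark layout as in the snippet above):

Sys.setenv('SPARKR_SUBMIT_ARGS'='"--packages" "com.databricks:spark-csv_2.10:1.0.3" "sparkr-shell"')
library(SparkR, lib.loc = "spark-1.5.0-bin-hadoop2.6/R/lib/")
sc <- sparkR.init(sparkHome = "spark-1.5.0-bin-hadoop2.6/")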
UPDATE
I tried the com.databricks:spark-csv_2.10:1.2.0 version and everything works.
Now I am using this code in RStudio to load a csv file:
sqlContext <- sparkRSQL.init(sc)
flights <- read.df(sqlContext, "R/nycflights13.csv", "com.databricks.spark.csv", header="true")
I get the following error message:
Error in writeJobj(con, object) : invalid jobj 1
And when I execute sqlContext I get this error:
Error in callJMethod(x, "getClass") :
Invalid jobj 1. If SparkR was restarted, Spark operations need to be re-executed.
Session info:
R version 3.2.0 (2015-04-16)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10.2 (Yosemite)
locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] SparkR_1.5.0 rJava_0.9-7
loaded via a namespace (and not attached):
[1] tools_3.2.0
Note that I do not get this error when I run the same commands in the Spark shell.
Problem solved.
After restarting the session, everything works with the following code:
Sys.setenv('SPARKR_SUBMIT_ARGS'='"--packages" "com.databricks:spark-csv_2.10:1.2.0" "sparkr-shell"')
library(rJava)
library(SparkR, lib.loc = "spark-1.5.0-bin-hadoop2.6/R/lib/")
sc <- sparkR.init(master = "local", sparkHome = "spark-1.5.0-bin-hadoop2.6")
sqlContext <- sparkRSQL.init(sc)
flights <- read.df(sqlContext, "R/nycflights13.csv", "com.databricks.spark.csv", header="true")
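As a quick sanity check that the CSV was actually read into a SparkR DataFrame, something like the following should work in the same session (a minimal sketch; the output depends on whatever is in nycflights13.csv):

printSchema(flights)  # show the schema of the loaded DataFrame
head(flights)         # first few rows as a local data.frame
count(flights)        # total number of rows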