SparkR Java error
When I try to load data in R:
df <- read.df(sqlContext, "https://s3-us-west-2.amazonaws.com/sparkr-data/nycflights13.csv", "com.databricks.spark.csv",header=T)
I get a Java error:
Error in invokeJava(isStatic = TRUE, className, methodName, ...) :
java.lang.ClassCastException: java.lang.Boolean cannot be cast to java.lang.String
at com.databricks.spark.csv.DefaultSource.createRelation(DefaultSource.scala:74)
at com.databricks.spark.csv.DefaultSource.createRelation(DefaultSource.scala:39)
at com.databricks.spark.csv.DefaultSource.createRelation(DefaultSource.scala:27)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:125)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:114)
at org.apache.spark.sql.api.r.SQLUtils$.loadDF(SQLUtils.scala:156)
at org.apache.spark.sql.api.r.SQLUtils.loadDF(SQLUtils.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.apache.spark.api.r.RBackendHandler.handleMethodCall(RBackendHandler.scala:132)
at or
I finally found the solution to the problem above. You need to make sure of the following:
You have the Java Development Kit installed; you can download it from the website.
Download this and save it to C:/hadoop, so that the bin folder ends up at C:/hadoop/bin.
Set JAVA_HOME as an environment variable (do not include the bin folder here).
Set HADOOP_HOME as an environment variable (again, do not include the bin folder).
Now run the following:
rm(list=ls())
# Set the system environment variables
Sys.setenv(SPARK_HOME = "C:/spark")
Sys.setenv(HADOOP_HOME = "C:/Hadoop")
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
#load the Sparkr library
library(rJava)
library(SparkR)
Sys.setenv('SPARKR_SUBMIT_ARGS'='"--packages" "com.databricks:spark-csv_2.11:1.2.0" "sparkr-shell"')
Sys.setenv(SPARK_MEM="1g")
# Create a spark context and a SQL context
sc <- sparkR.init(master = "local")
sqlContext <- sparkRSQL.init(sc)
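As a quick sanity check (a sketch, not part of the original answer), you can confirm from the same R session that the environment variables are visible and that the bin folder exists before initializing Spark:

```r
# Verify the environment variables set above are visible to R.
Sys.getenv("JAVA_HOME")    # should not be an empty string
Sys.getenv("HADOOP_HOME")  # e.g. "C:/Hadoop"

# Check that the bin folder (e.g. C:/Hadoop/bin) actually exists.
dir.exists(file.path(Sys.getenv("HADOOP_HOME"), "bin"))
```

If either variable comes back empty, R was likely started before the variables were set; restart the session and try again.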
You should now be able to read the CSV file.
After many attempts, I found what the problem is in read.df(): the header attribute causes it. header must be passed as the string header="true" or header="false", not as an R logical value.
> people = read.df(sqlContext, "C:\Users\Vivek\Desktop\AirPassengers.csv", source = "com.databricks.spark.csv",header=TRUE)
Error in invokeJava(isStatic = TRUE, className, methodName, ...) :
java.lang.ClassCastException: java.lang.Boolean cannot be cast to java.lang.String
at com.databricks.spark.csv.DefaultSource.createRelation(DefaultSource.scala:81)
at com.databricks.spark.csv.DefaultSource.createRelation(DefaultSource.scala:40)
at com.databricks.spark.csv.DefaultSource.createRelation(DefaultSource.scala:28)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:125)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:114)
at org.apache.spark.sql.api.r.SQLUtils$.loadDF(SQLUtils.scala:156)
at org.apache.spark.sql.api.r.SQLUtils.loadDF(SQLUtils.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.apache.spark.api.r.RBackendHandler.handleMethodCall(RBackendHandler.scala:132)
at or
> people = read.df(sqlContext, "C:\Users\Vivek\Desktop\AirPassengers.csv", source = "com.databricks.spark.csv",header="true")
> head(people)
Sl_No time AirPassengers
1 1 1949 112
2 2 1949.083333 118
3 3 1949.166667 132
4 4 1949.25 129
5 5 1949.333333 121
6 6 1949.416667 135
>
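The fix above can be wrapped in a small helper (a sketch; as_csv_option is a hypothetical name, not part of SparkR) that coerces R logicals into the lowercase strings the spark-csv data source expects, which is what the ClassCastException is complaining about:

```r
# spark-csv receives its options as Java Strings, so passing an R
# logical (TRUE/FALSE) arrives as a java.lang.Boolean and fails to
# cast. Coerce logicals to the lowercase strings "true"/"false".
as_csv_option <- function(x) {
  if (is.logical(x)) tolower(as.character(x)) else as.character(x)
}

as_csv_option(TRUE)   # "true"
as_csv_option(FALSE)  # "false"

# Then read.df can be called as:
# people <- read.df(sqlContext, path,
#                   source = "com.databricks.spark.csv",
#                   header = as_csv_option(TRUE))
```

This lets you keep writing the natural R logical at the call site while still satisfying the Java side.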