SparkR ERROR RBackendHandler: fitRModelFormula
I'm trying to do a linear regression with SparkR, starting from this tutorial.
I have two data frames, airlines and planes, each with a number of fields.
# read the data frames
airlines <- read.df(sqlContext, path = "/home/daniele/air.csv", source = "com.databricks.spark.csv", header = "true", inferSchema = "true")
planes <- read.df(sqlContext, "/home/daniele/plane.csv", source = "com.databricks.spark.csv", header = "true", inferSchema = "true")
# join both on the tailnum field
joined <- join(airlines, planes, airlines$tailnum == planes$tailnum)
# this shows some results, as expected
showDF(select(training, "aircraft_type", "DISTANCE", "arr_delay", "dep_delay"))
model <- glm(arr_delay ~ dep_delay + DISTANCE, family = "gaussian", data = joined)
The last command gives me this:
ERROR RBackendHandler: fitRModelFormula on [org.apache.spark.ml.api.r.SparkRWrappers failed
Errore in invokeJava(isStatic = TRUE, className, methodName, ...) :
java.lang.IllegalArgumentException: Could not parse formula: m$arr_delay ~ m$dep_delay
at org.apache.spark.ml.feature.RFormulaParser$.parse(RFormulaParser.scala:126)
at org.apache.spark.ml.feature.RFormula.hasIntercept(RFormula.scala:78)
at org.apache.spark.ml.api.r.SparkRWrappers$.fitRModelFormula(SparkRWrappers.scala:39)
at org.apache.spark.ml.api.r.SparkRWrappers.fitRModelFormula(SparkRWrappers.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.api.r.RBackendHandler.handleMethodCall(RBackendHandler.scala:132)
at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala:79)
at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala:38)
at io.netty.channel.SimpleChannelInb
I really don't know how to fix it; whenever I run into an error of this kind, it comes from this RBackendHandler.
[SOLVED] The problem came from the way I was reading the CSV. I used this link to solve it:
R read.csv "More columns than column names" error
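For anyone hitting the same thing, here is a minimal sketch of one way to check a CSV for rows that have more fields than the header, using plain R before handing the file to SparkR. It is not the exact fix from the linked answer. The file contents below are made up for illustration, and the comma separator and double-quote character are assumptions about the file.

```r
# Write a small, made-up CSV to a temporary file so the sketch is self-contained.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "tailnum,arr_delay,dep_delay",
  "N101,5,3",
  "N102,12,10,extra",   # one field too many
  "N103,-2,0"
), csv_path)

# count.fields() (utils package) returns the number of fields on each line.
n_fields <- count.fields(csv_path, sep = ",", quote = "\"")

# Lines whose field count differs from the header line (line 1).
bad_lines <- which(n_fields != n_fields[1])
print(bad_lines)
```

In this made-up file the check flags line 3. On a real file, replace csv_path with your own path; any line it reports is worth inspecting by hand before loading the file with read.df.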