SparkR ERROR RBackendHandler: fitRModelFormula

I'm trying to do a linear regression with SparkR, starting from this tutorial.

I have two data frames, airlines and planes, each with a number of fields.

# read both CSV files into Spark data frames
airlines <- read.df(sqlContext, path = "/home/daniele/air.csv", source = "com.databricks.spark.csv", header = "true", inferSchema = "true")

planes <- read.df(sqlContext, "/home/daniele/plane.csv", source = "com.databricks.spark.csv", header = "true", inferSchema = "true")

# join both on the tailnum field
joined <- join(airlines, planes, airlines$tailnum == planes$tailnum)

# this shows some results, as expected
showDF(select(joined, "aircraft_type", "DISTANCE", "arr_delay", "dep_delay"))

model <- glm(arr_delay ~ dep_delay + DISTANCE, family = "gaussian", data = joined)

On the last command, I get this:

ERROR RBackendHandler: fitRModelFormula on [org.apache.spark.ml.api.r.SparkRWrappers failed
Errore in invokeJava(isStatic = TRUE, className, methodName, ...) : 
  java.lang.IllegalArgumentException: Could not parse formula: m$arr_delay ~ m$dep_delay
    at org.apache.spark.ml.feature.RFormulaParser$.parse(RFormulaParser.scala:126)
    at org.apache.spark.ml.feature.RFormula.hasIntercept(RFormula.scala:78)
    at org.apache.spark.ml.api.r.SparkRWrappers$.fitRModelFormula(SparkRWrappers.scala:39)
    at org.apache.spark.ml.api.r.SparkRWrappers.fitRModelFormula(SparkRWrappers.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.api.r.RBackendHandler.handleMethodCall(RBackendHandler.scala:132)
    at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala:79)
    at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala:38)
    at io.netty.channel.SimpleChannelInb

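I really don't know how to fix it. Whenever I get an error of some kind, it comes from this RBackendHandler.

[SOLVED] The problem came from how I was reading the CSV. I used this link to solve it: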

R read.csv "More columns than column names" error
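
The post does not say exactly what was wrong with the file, but the title of the linked question points to rows that have more fields than the header has names. A minimal sketch of one way to check for that in plain R, before handing the file to SparkR, is shown below. It assumes a comma-separated file; the temporary file and its contents are hypothetical and only there to make the example self-contained.

```r
# Hypothetical sample file: the header has 3 names, but the third line has 4 fields.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "tailnum,dep_delay,arr_delay",
  "N101,5,7",
  "N102,3,4,extra",
  "N103,0,-2"
), csv_path)

# count.fields() (base R, package utils) returns the number of fields on each line.
fields_per_line <- count.fields(csv_path, sep = ",")

# Treat the header (line 1) as the reference; any other count is suspect.
expected <- fields_per_line[1]
bad_lines <- which(fields_per_line != expected)

print(fields_per_line)  # field count for every line of the file
print(bad_lines)        # line numbers whose field count differs from the header
```

If bad_lines is not empty, those lines are worth inspecting (stray separators, unquoted commas, trailing delimiters) before reading the file with read.df.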