Statistics.corr gives the following error in IntelliJ IDEA: Cannot resolve overloaded method 'corr'
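I am trying to follow this project: https://github.com/caroljmcdonald/spark-stock-sql/blob/master/src/main/scala/example/Stock.scala
In my IDE it gives me the error: Cannot resolve overloaded method 'corr'
The error appears in the part of the code that computes the correlation between two columns read from a parquet file: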
val df = sqlContext.read.parquet("joinstock.parquet")
df.show
df.printSchema
df.explain()
// COMMAND ----------
//var agg_df = df.groupBy("location").agg(min("id"), count("id"), avg("date_diff"))
df.select(year($"dt").alias("yr"), month($"dt").alias("mo"), $"apcclose", $"xomclose", $"spyclose").groupBy("yr", "mo").agg(avg("apcclose"), avg("xomclose"), avg("spyclose")).orderBy(desc("yr"), desc("mo")).show
// COMMAND ----------
df.select(year($"dt").alias("yr"), month($"dt").alias("mo"), $"apcclose", $"xomclose", $"spyclose").groupBy("yr", "mo").agg(avg("apcclose"), avg("xomclose"), avg("spyclose")).orderBy(desc("yr"), desc("mo")).explain
The following lines are the ones for which IntelliJ IDEA reports: Cannot resolve overloaded method 'corr'
// COMMAND ----------
var seriesX = df.select($"xomclose").map { row: Row => row.getAs[Double]("xomclose") } //.rdd
var seriesY = df.select($"spyclose").map { row: Row => row.getAs[Double]("spyclose") } //.rdd
var correlation = Statistics.corr(seriesX, seriesY, "pearson")
// COMMAND ----------
seriesX = df.select($"apcclose").map { row: Row => row.getAs[Double]("apcclose") } //.rdd
seriesY = df.select($"xomclose").map { row: Row => row.getAs[Double]("xomclose") } //.rdd
correlation = Statistics.corr(seriesX, seriesY, "pearson")
}
}
You can try the DataFrame's own correlation method instead:
var correlation = df.stat.corr("xomclose", "spyclose", "pearson")
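A likely cause, assuming the project is built against Spark 2.x or later: there a DataFrame is a Dataset[Row], so calling .map on it yields a Dataset[Double], while Statistics.corr only has overloads that take RDDs. Converting with .rdd before mapping (the call that is commented out in the question) should then let the overload resolve. The sketch below compares both approaches; the object name CorrSketch, the helper correlations and the toy values standing in for joinstock.parquet are made up for illustration.

```scala
import org.apache.spark.mllib.stat.Statistics
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

// Hypothetical helper object, not part of the linked project.
object CorrSketch {

  // Returns (correlation via mllib Statistics.corr, correlation via df.stat.corr).
  def correlations(spark: SparkSession): (Double, Double) = {
    import spark.implicits._

    // Toy data standing in for joinstock.parquet (values are invented).
    val df = Seq((1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2))
      .toDF("xomclose", "spyclose")

    // Convert to an RDD first, then map: the result is RDD[Double],
    // which is what Statistics.corr(x, y, method) expects.
    val seriesX: RDD[Double] = df.select($"xomclose").rdd.map(_.getAs[Double]("xomclose"))
    val seriesY: RDD[Double] = df.select($"spyclose").rdd.map(_.getAs[Double]("spyclose"))

    val viaMllib = Statistics.corr(seriesX, seriesY, "pearson")

    // The DataFrame route from the answer: no RDD conversion needed.
    val viaDataFrame = df.stat.corr("xomclose", "spyclose", "pearson")

    (viaMllib, viaDataFrame)
  }

  def main(args: Array[String]): Unit = {
    // Local session for experimentation only.
    val spark = SparkSession.builder().master("local[*]").appName("corr-sketch").getOrCreate()
    val (viaMllib, viaDataFrame) = correlations(spark)
    println(s"Statistics.corr: $viaMllib, df.stat.corr: $viaDataFrame")
    spark.stop()
  }
}
```

Both routes should give the same Pearson coefficient up to floating-point error. df.stat.corr is the shorter option when the data is already in a DataFrame; the .rdd route is mainly useful if you want to keep the original Statistics.corr call.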