Spark ML Library

I'm testing this piece of Scala code that I found in the MLlib main guide (Machine Learning Library (MLlib) Guide):
import org.apache.spark.ml.linalg.{Matrix, Vectors, Vector}
import org.apache.spark.ml.stat.Correlation
import org.apache.spark.sql.Row
import scala.collection.Seq

object BasicStatistics {
  def main(args: Array[String]): Unit = {
    val data: Seq[Vector] = Seq(
      Vectors.sparse(4, Seq((0, 1.0), (3, -2.0))),
      Vectors.dense(4.0, 5.0, 0.0, 3.0),
      Vectors.dense(6.0, 7.0, 0.0, 8.0),
      Vectors.sparse(4, Seq((0, 9.0), (3, 1.0))))
    val df = data.map(Tuple1.apply).toDF("features")
    val Row(coeff1: Matrix) = Correlation.corr(df, "features").head
    println(s"Pearson correlation matrix:\n $coeff1")
    val Row(coeff2: Matrix) = Correlation.corr(df, "features", "spearman").head
    println(s"Spearman correlation matrix:\n $coeff2")
  }
}
But this line reports an error:

val df = data.map(Tuple1.apply).toDF("features")

It says:

"value toDF is not a member of Seq[(org.apache.spark.ml.linalg.Vector,)]"

It seems like the val data (a Seq[Vector]) doesn't have a map method? Any ideas on how to proceed?
The following is from my pom.xml:
<dependencies>
  <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-mllib -->
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-mllib_2.11</artifactId>
    <version>2.3.0</version>
  </dependency>
  <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core -->
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.3.0</version>
  </dependency>
</dependencies>
At this point you don't have a SparkSession, or any of the startup boilerplate. I believe toDF comes from import spark.implicits._, where spark is a SparkSession. The docs sometimes don't make this clear, and/or assume you're working in the Spark shell, which creates the session automatically. (Your code does run when executed in the Spark shell.) The error occurs because the implicit conversion on scala.Seq is missing. To fix your problem, add these lines:
import org.apache.spark.sql.SparkSession

val name = "application name"
val spark = SparkSession
  .builder
  .appName(name)
  .master("local")
  .getOrCreate()
import spark.implicits._
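Putting the fix into the original program, the complete version might look like the sketch below (assuming the Spark 2.3.0 / Scala 2.11 dependencies from your pom.xml; the app name and local master are placeholders):

```scala
import org.apache.spark.ml.linalg.{Matrix, Vector, Vectors}
import org.apache.spark.ml.stat.Correlation
import org.apache.spark.sql.{Row, SparkSession}

object BasicStatistics {
  def main(args: Array[String]): Unit = {
    // Create the session explicitly; outside the Spark shell nothing does this for you.
    val spark = SparkSession
      .builder
      .appName("BasicStatistics")
      .master("local")
      .getOrCreate()

    // Brings in the implicit conversions that add toDF to local Seqs.
    import spark.implicits._

    val data: Seq[Vector] = Seq(
      Vectors.sparse(4, Seq((0, 1.0), (3, -2.0))),
      Vectors.dense(4.0, 5.0, 0.0, 3.0),
      Vectors.dense(6.0, 7.0, 0.0, 8.0),
      Vectors.sparse(4, Seq((0, 9.0), (3, 1.0))))

    // Now toDF resolves, because spark.implicits._ is in scope.
    val df = data.map(Tuple1.apply).toDF("features")

    val Row(coeff1: Matrix) = Correlation.corr(df, "features").head
    println(s"Pearson correlation matrix:\n $coeff1")

    val Row(coeff2: Matrix) = Correlation.corr(df, "features", "spearman").head
    println(s"Spearman correlation matrix:\n $coeff2")

    spark.stop()
  }
}
```

Note that import spark.implicits._ has to come after the spark val is created, since the implicits live on the session instance.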
Hope this helps!