Euclidean distance in Spark 2.1
I am trying to compute the Euclidean distance between two vectors. I have the following DataFrame:
root
|-- h: string (nullable = true)
|-- id: string (nullable = true)
|-- sid: string (nullable = true)
|-- features: vector (nullable = true)
|-- episodeFeatures: vector (nullable = true)
import org.apache.spark.mllib.util.{MLUtils}
val jP2 = jP.withColumn("dist", MLUtils.fastSquaredDistance("features", 5, "episodeFeatures", 5))
I get this error:
error: method fastSquaredDistance in object MLUtils cannot be accessed in object org.apache.spark.mllib.util.MLUtils
Is there a way to access that private method?
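MLUtils.fastSquaredDistance is internal to Spark, and even if it were accessible, it could not be applied to Columns or (guessing from the version) to ml vectors. You have to write your own udf: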
import org.apache.spark.sql.functions._
import org.apache.spark.ml.linalg.Vector
val euclidean = udf((v1: Vector, v2: Vector) => ???) // Fill with preferred logic
val jP2 = jP.withColumn("dist", euclidean($"features", $"episodeFeatures"))
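One possible way to fill in the ??? above, as a minimal sketch rather than the only option: the public Vectors.sqdist method in org.apache.spark.ml.linalg returns the squared distance between two ml vectors, so its square root gives the Euclidean distance. The helper name euclideanDistance is made up for illustration, and the sketch assumes both columns are non-null and hold vectors of the same size.

```scala
import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.sql.functions.udf

// Hypothetical helper: a plain function, so the logic can be checked
// without a DataFrame. Vectors.sqdist returns the squared distance,
// so the square root is taken to get the Euclidean distance.
def euclideanDistance(v1: Vector, v2: Vector): Double =
  math.sqrt(Vectors.sqdist(v1, v2))

// Wrap the helper as a UDF so it can be applied to vector columns.
// Assumption: neither input is null; a null vector would fail here.
val euclidean = udf((v1: Vector, v2: Vector) => euclideanDistance(v1, v2))

// Usage with the DataFrame from the question (jP is assumed to exist):
// val jP2 = jP.withColumn("dist", euclidean($"features", $"episodeFeatures"))
```

If only a ranking by distance is needed, the square root can be dropped and the squared distance used directly, since it preserves the ordering.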