How to rescale range of numbers shifting the centre in spark/scala?

Which function in Spark can transform/rescale values in a range such as -infinity to +infinity, or -2 to 130, to a maximum value to be defined?

In the example below, I want to make sure that 55 maps to 100, and that 100+ maps to 0:

| before | after |
| --- | --- |
| 45-55 | 90-100 |
| 35-44 | 80-89 |
| ... | ... |
| 100+ or < 0 | 0-5 |

Would any of the ML feature functions be useful here?
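To make the target transform concrete, here is a plain-Scala sketch of the mapping described by the table above (the split point 55 matches the example; the bounds lo = -45 and hi = 100 are assumptions chosen so that the sample rows hold, and in practice would come from the data):

def rescaleSketch(score: Double,
                  split: Double = 55.0, // value that should map to 100
                  lo: Double = -45.0,   // assumed minimum of the lower half
                  hi: Double = 100.0    // assumed maximum of the upper half
                 ): Double =
  if (score < split) (score - lo) / (split - lo) * 100.0 // 45 -> 90, 35 -> 80
  else (hi - score) / (hi - split) * 100.0               // 55 -> 100, 100 -> 0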

I have solved it, thanks to the help of @user6910411. Depending on your data you can use dense or sparse vectors, replace MinMaxScaler with MaxAbsScaler, and extract the values using linalg.Vectors / DenseVector. The idea is to split the data at the point of the required median, reverse the scale of one half, then scale both halves separately and union the DataFrames.
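For context, MinMaxScaler rescales every value e in a column as (e - E_min) / (E_max - E_min) * (max - min) + min, so min-max-scaling a negated column reverses its order. A scalar illustration (the [0, 100] target range matches the code below):

def minMax(x: Double, xMin: Double, xMax: Double): Double =
  (x - xMin) / (xMax - xMin) * 100.0 // target range [0, 100]

minMax(55.0, 55.0, 100.0)     // = 0.0   -- unnegated, 55 would map to 0
minMax(-55.0, -100.0, -55.0)  // = 100.0 -- negated, 55 maps to 100 ...
minMax(-100.0, -100.0, -55.0) // = 0.0   -- ... and 100 maps to 0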

import org.apache.spark.ml.feature.{MinMaxScaler, VectorAssembler}
import org.apache.spark.ml.linalg.Vector
import org.apache.spark.sql.functions.{col, lit, round, udf}
import spark.implicits._

// Extract a single element of a Vector column back into a plain column.
// Using Vector (rather than DenseVector) covers both dense and sparse input.
val vectorToColumn = udf { (x: Vector, index: Int) => x(index) }

// Split at the pivot (55) and negate the upper half, so that after
// scaling, 55 ends up at 100 and the maximum ends up at 0.
val gt50 = df.filter("score >= 55").select('id, ('score * -1).as("score"))
val lt50 = df.filter("score < 55")

// MinMaxScaler operates on vector columns, so wrap the score first.
val assembler = new VectorAssembler()
  .setInputCols(Array("score"))
  .setOutputCol("features")

val ass_lt50 = assembler.transform(lt50)
val ass_gt50 = assembler.transform(gt50)

// Rescale to [0, 100]; swap in MaxAbsScaler here if it suits your data
// better (note that MaxAbsScaler has no setMin/setMax).
val scaler = new MinMaxScaler()
  .setInputCol("features")
  .setOutputCol("featuresScaled")
  .setMax(100)
  .setMin(0)

// Fit and apply the scaler to each half independently.
val feat_lt50 = scaler.fit(ass_lt50).transform(ass_lt50).drop('score)
val feat_gt50 = scaler.fit(ass_gt50).transform(ass_gt50).drop('score)

// Pull the scaled value out of the vector and round it.
val scaled_lt50 = feat_lt50.select('id,
  round(vectorToColumn(col("featuresScaled"), lit(0))).as("scaled_score"))

val scaled_gt50 = feat_gt50.select('id,
  round(vectorToColumn(col("featuresScaled"), lit(0))).as("scaled_score"))

// union replaces the deprecated unionAll.
val scaled = scaled_lt50.union(scaled_gt50)
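A quick way to sanity-check is to build a toy df (hypothetical sample data) before running the snippet above. Note that MinMaxScaler is fit on each half separately, so each half's own min and max define its endpoints:

val df = Seq(
  (1, 35.0),  // min of the lower half -> scales to 0
  (2, 54.0),  // max of the lower half -> scales to 100
  (3, 55.0),  // becomes -55, the max of the negated upper half -> 100
  (4, 100.0)  // becomes -100, the min of the negated upper half -> 0
).toDF("id", "score")

// after running the snippet above:
scaled.orderBy('id).show()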