spark sql: select rows where a DecimalType column has a scale larger than a given number

Question

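I have a DataFrame with a column of data type DecimalType(38,10). Not all values actually use all 10 decimal places. I want to select the rows whose scale is greater than 4 once trailing zeros are removed.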

Is there a way to do this?

In pseudocode, something like ds.select(col1, col2).where(col3.hasScale > 4)
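To pin down what "scale after removing trailing zeros" means, here is a minimal plain-Scala sketch built on java.math.BigDecimal.stripTrailingZeros. The names ScaleCheck, strippedScale and hasScaleGreaterThan are hypothetical, chosen for illustration, and clamping negative scales to 0 (so that whole numbers count as scale 0) is an assumption rather than something the question specifies.

```scala
import java.math.{BigDecimal => JBigDecimal}

object ScaleCheck {
  // Scale of a decimal once trailing zeros are stripped, clamped at 0 so that
  // whole numbers such as 7300 (stripped form 7.3E+3, scale -2) count as scale 0.
  def strippedScale(d: JBigDecimal): Int =
    math.max(0, d.stripTrailingZeros.scale)

  // Hypothetical stand-in for the question's col3.hasScale > k predicate.
  def hasScaleGreaterThan(d: JBigDecimal, k: Int): Boolean =
    strippedScale(d) > k

  def main(args: Array[String]): Unit = {
    // Values as a DecimalType(38,10) column stores them: always 10 digits after the point.
    val stored = Seq("3.3023020000", "3.4434000000", "4.3200000000", "4.2302405050")
      .map(new JBigDecimal(_))
    stored.foreach { d =>
      println(s"$d -> stripped scale ${strippedScale(d)}, > 4: ${hasScaleGreaterThan(d, 4)}")
    }
  }
}
```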

Answer 1

Something like this does it:

// Assumes spark-shell (or a notebook), where a SparkSession named spark is already defined.
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.{col, udf}
import org.apache.spark.sql.types.{DataTypes, StructField, StructType}

val maxScale = 10

val decimalType = DataTypes.createDecimalType(38, maxScale)

val data = Seq(
  Row(BigDecimal.decimal(3.302302)),
  Row(BigDecimal.decimal(3.4434)),
  Row(BigDecimal.decimal(4.32)),
  Row(BigDecimal.decimal(4.230240505)),
  Row(BigDecimal.decimal(7.302)),
  Row(BigDecimal.decimal(4.34444))
)

val schema = List(
  StructField("number", decimalType, true)
)

val df = spark.createDataFrame(
  spark.sparkContext.parallelize(data),
  StructType(schema)
)

df.show()

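// Number of decimal places left after stripping trailing zeros, e.g. 4.3200000000 -> 2.
// Working on java.math.BigDecimal keeps the arithmetic exact; converting to Double
// introduces rounding errors that can make a modulus test fail or throw on .get.
// Negative scales (whole numbers such as 7300) are clamped to 0; nulls count as 0.
val decimalScale = udf((n: java.math.BigDecimal) =>
  if (n == null) 0 else math.max(0, n.stripTrailingZeros.scale)
)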

df.filter(decimalScale(col("number")) > 4).show()
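A UDF is not the only option. For a fixed threshold k, a value has more than k decimal places after stripping trailing zeros exactly when it is not a whole multiple of 10^-k, i.e. when its remainder modulo 10^-k is non-zero. The plain-Scala sketch below checks that identity with java.math.BigDecimal.remainder on the same sample values; the object and method names are hypothetical.

```scala
import java.math.{BigDecimal => JBigDecimal}

object RemainderCheck {
  // d has more than k decimal places (after stripping zeros) exactly when
  // d is not a whole multiple of 10^-k, i.e. d % 10^-k is non-zero.
  def scaleExceeds(d: JBigDecimal, k: Int): Boolean = {
    val step = JBigDecimal.ONE.scaleByPowerOfTen(-k) // 10^-k, e.g. 0.0001 for k = 4
    d.remainder(step).signum != 0
  }

  def main(args: Array[String]): Unit = {
    val values = Seq("3.3023020000", "3.4434000000", "4.3200000000",
      "4.2302405050", "7.3020000000", "4.3444400000")
    // Prints only the values whose stripped scale is greater than 4.
    values.map(new JBigDecimal(_)).filter(scaleExceeds(_, 4)).foreach(println)
  }
}
```

In Spark, the same test could presumably be written without a UDF as df.filter(col("number") % lit(new java.math.BigDecimal("0.0001")) =!= 0), since decimal remainder stays in exact decimal arithmetic (this also needs org.apache.spark.sql.functions.lit). Treat that as an untested sketch and check the result type and behavior on your Spark version before relying on it.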

