Spark Dataset - aggregate query sums BigDecimal amounts as zero
I have a Dataset of type ExpenseEntry. ExpenseEntry is a basic data structure that tracks the amount spent on each category, per person:
case class ExpenseEntry(
name: String,
category: String,
amount: BigDecimal
)
Sample values -
ExpenseEntry("John", "candy", 0.5)
ExpenseEntry("Tia", "game", 0.25)
ExpenseEntry("John", "candy", 0.15)
ExpenseEntry("Tia", "candy", 0.55)
The expected answer is:
category - name - amount
candy - John - 0.65
candy - Tia - 0.55
game - Tia - 0.25
What I want to do is get the total amount spent on each category by each name. So, I have the following Dataset query:
dataset.groupBy("category", "name").agg(sum("amount"))
In theory, this query looks correct to me. However, the sum is displayed as 0E-18 rather than the expected value. My guess is that the amount is being converted to an int inside the sum function. How do I cast it to BigInt? Is my understanding of the problem correct?
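One quick way to test that guess is to print the schema of the aggregated result rather than the values. A minimal sketch, reusing the query from the question (the sum(amount) column name is the one Spark generates by default, and decimal(38,18) is the usual default mapping for Scala BigDecimal, which may vary by Spark version):

import org.apache.spark.sql.functions.sum

// Inspect the result type instead of the printed values.
val agg = dataset.groupBy("category", "name").agg(sum("amount"))
agg.printSchema()
// root
//  |-- category: string (nullable = true)
//  |-- name: string (nullable = true)
//  |-- sum(amount): decimal(38,18) (nullable = true)

The sum column is still a decimal, not an int; 0E-18 is just scientific notation for a zero value at 18-digit scale.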
package spark

import org.apache.spark.sql.SparkSession

// Defined at the top level (outside the App body) so Spark can derive
// an implicit Encoder for it via a TypeTag.
case class ExpenseEntry(
  name: String,
  category: String,
  amount: BigDecimal
)

object SumBig extends App {
  val spark = SparkSession.builder()
    .master("local")
    .appName("DataFrame-example")
    .getOrCreate()

  import spark.implicits._

  val df = Seq(
    ExpenseEntry("John", "candy", 0.5),
    ExpenseEntry("Tia", "game", 0.25),
    ExpenseEntry("John", "candy", 0.15),
    ExpenseEntry("Tia", "candy", 0.55)
  ).toDF()

  df.show(false)

  // sum preserves the decimal type; Scala BigDecimal is encoded as
  // decimal(38,18), hence the 18-digit scale in the output.
  val r = df.groupBy("category", "name").sum("amount")
  r.show(false)
  // +--------+----+--------------------+
  // |category|name|sum(amount)         |
  // +--------+----+--------------------+
  // |game    |Tia |0.250000000000000000|
  // |candy   |John|0.650000000000000000|
  // |candy   |Tia |0.550000000000000000|
  // +--------+----+--------------------+
}
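To double-check that the sums are genuinely non-zero rather than truncated, one can also collect them back as Scala BigDecimal values. A minimal sketch against the r DataFrame above (the tuple encoder matches columns by position, and it assumes import spark.implicits._ is in scope as in the object):

// Decode by position: (category, name, sum(amount)); BigDecimal keeps full precision.
val sums = r.as[(String, String, BigDecimal)].collect()
sums.foreach { case (category, name, total) =>
  println(s"$category - $name - $total")
}
// game - Tia - 0.250000000000000000
// candy - John - 0.650000000000000000
// candy - Tia - 0.550000000000000000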
- You can use bround() to limit the number of decimal places
- sum does not change the column's data type from decimal to int.
df.groupBy("category", "name").agg(sum(bround(col("amount"), 2)).as("sum_amount")).show()
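Note that bround() rounds the input values before summing. If the goal is only a saner display scale, another option (a sketch, not from the original answer) is to sum at full precision and then cast the result column to a narrower decimal type:

import org.apache.spark.sql.functions.{col, sum}
import org.apache.spark.sql.types.DecimalType

// Sum at full precision first, then cast only the result to 2 decimal places.
df.groupBy("category", "name")
  .agg(sum(col("amount")).cast(DecimalType(38, 2)).as("sum_amount"))
  .show(false)
// sum_amount now prints as 0.65 / 0.55 / 0.25 instead of 18-digit values.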