How can I turn off rounding in Spark?

Question

For example, I have a DataFrame and I am doing this:

df = dataframe.withColumn("test", lit(0.4219759403))

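I want to keep only the first four digits after the decimal point, without rounding.

When I cast to DecimalType with .cast(DataTypes.createDecimalType(20,4)), or even when I use the round function, the number is rounded to 0.4220.

The only way I have found to avoid rounding is the format_number() function, but it returns a string, and when I cast that string to DecimalType(20,4) the framework rounds the number to 0.4220 again.

I need to convert this number to DecimalType(20,4) without rounding; I expect to see 0.4219.

How can I do that?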

Answer 1

Hello, and welcome to Stack Overflow.
Next time, please provide a reproducible example with the code you have tried. In any case, this works for me:

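import pyspark.sql.functions as F
from pyspark.sql.functions import lit
from pyspark.sql.types import DecimalType

# Assumes an active SparkSession named `spark`.
df = spark.createDataFrame([
    (1, "a"),
    (2, "b"),
    (3, "c"),
], ["ID", "Text"])

df = df.withColumn("test", lit(0.4219759403))
# Keep the first 6 characters of the string form: "0." plus four decimals.
df = df.withColumn("test_string", F.substring(df["test"].cast("string"), 0, 6))
# Only four decimals remain in the string, so the cast has nothing left to round.
df = df.withColumn("test_string_decimaltype", df["test_string"].cast(DecimalType(20,4)))
df.show()
df.printSchema()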

+---+----+------------+-----------+-----------------------+
| ID|Text|        test|test_string|test_string_decimaltype|
+---+----+------------+-----------+-----------------------+
|  1|   a|0.4219759403|     0.4219|                 0.4219|
|  2|   b|0.4219759403|     0.4219|                 0.4219|
|  3|   c|0.4219759403|     0.4219|                 0.4219|
+---+----+------------+-----------+-----------------------+

root
 |-- ID: long (nullable = true)
 |-- Text: string (nullable = true)
 |-- test: double (nullable = false)
 |-- test_string: string (nullable = false)
 |-- test_string_decimaltype: decimal(20,4) (nullable = true)

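Of course, if you prefer, you can overwrite the same column by using "test" as the column name every time; I chose different names so that you can see each step.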
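To see why cutting the string before the cast avoids the 0.4220 result, the difference between rounding and truncation can be reproduced outside Spark with Python's standard decimal module. This is only a sketch and not Spark code; it assumes the cast to DecimalType(20,4) rounds half up, which is consistent with the 0.4220 reported in the question.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP

value = Decimal("0.4219759403")
four_places = Decimal("0.0001")  # target scale: four digits after the point

# Assumption: this mirrors what the question observes when casting in Spark.
rounded = value.quantize(four_places, rounding=ROUND_HALF_UP)

# Truncation simply drops the extra digits, which is what the question asks for.
truncated = value.quantize(four_places, rounding=ROUND_DOWN)

print(rounded)    # 0.4220
print(truncated)  # 0.4219
```

The substring step in the answer plays the role of ROUND_DOWN here: once the extra digits are removed from the string, the later cast has nothing to round.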

Answer 2

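If the number has more than one digit before the decimal point, substring will not work. Instead, you can use a regular expression that always extracts up to the first 4 decimal digits (if they are present).
You can do this with regexp_extract: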

df = dataframe.withColumn('rounded', F.regexp_extract(F.col('test'), r'\d+\.\d{0,4}', 0))

Example

import pyspark.sql.functions as F

dataframe = spark.createDataFrame([
    (0.4219759403, ),
    (0.4, ),
    (1.0, ),
    (0.5431293, ),
    (123.769859, )
], ['test'])
df = dataframe.withColumn('rounded', F.regexp_extract(F.col('test'), r'\d+\.\d{0,4}', 0))
df.show()

+------------+--------+
|        test| rounded|
+------------+--------+
|0.4219759403|  0.4219|
|         0.4|     0.4|
|         1.0|     1.0|
|   0.5431293|  0.5431|
|  123.769859|123.7698|
+------------+--------+
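The pattern itself can be checked without a cluster. The sketch below applies the same regular expression with Python's re module to the string forms of the sample values; the helper name first_four_decimals is hypothetical and is not part of Spark. It also probes two inputs that the answer does not cover, a negative number and a value written in scientific notation (a form that doubles can take when converted to strings), to show what the pattern would keep in those cases.

```python
import re

# Same pattern as in the regexp_extract call above.
PATTERN = re.compile(r"\d+\.\d{0,4}")

def first_four_decimals(text):
    """Hypothetical helper: return the first match, or '' when nothing matches."""
    match = PATTERN.search(text)
    return match.group(0) if match else ""

samples = ["0.4219759403", "0.4", "1.0", "0.5431293", "123.769859"]
results = [first_four_decimals(s) for s in samples]
print(results)  # ['0.4219', '0.4', '1.0', '0.5431', '123.7698']

# Inputs outside the answer's example: the pattern drops the sign and the exponent.
print(first_four_decimals("-0.4219759403"))  # 0.4219
print(first_four_decimals("1.0E-5"))         # 1.0
```

If negative or very small values can occur in the column, the pattern would need to be extended (for example with an optional leading minus sign), or a numeric approach may be a better fit.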
