How can I delimit a Float type column of a dataframe to have no more than 1 decimal in Pyspark?

I am working with a dataframe that has a column 'Col' of type Float. The column's values have too many decimals (e.g. 1.00000000000111). How can I restrict the column so that it holds values with at most one decimal place (e.g. 1.0)?

Take a look at this (note that this example uses pandas rather than PySpark):

import pandas as pd

df = pd.DataFrame([4.5678, 5, 1.00000000000111], columns=['Col'])
# round the values in 'Col' to 1 decimal place
s = df['Col'].round(1)
print(s)

0       4.6
1       5.0
2       1.0

You can use round from pyspark.sql.functions. Starting from a dataframe like this:

+----------------+
|             Col|
+----------------+
|1.00000000000111|
|     1.000000011|
+----------------+
>>> from pyspark.sql import functions as F
>>> df = df.withColumn('Col',F.round('Col',1))
>>> df.show()
+---+
|Col|
+---+
|1.0|
|1.0|
+---+

You can use the round, ceil, or floor functions from pyspark.sql.functions (depending on how you want to limit the number).

For example:

import pyspark.sql.functions as F

# assuming df is your dataframe and float_column_name is the name of the
# column with FloatType values, replace that column with a version rounded
# to 1 decimal place:
df = df.withColumn('float_column_name', F.round('float_column_name', 1))
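
Since ceil and floor round to whole numbers, one way to keep a single decimal place with them is to scale first; a rough sketch, reusing the hypothetical float_column_name:

import pyspark.sql.functions as F

# ceil/floor produce integers, so multiply by 10, apply, then divide back
# to keep one decimal place (column name here is hypothetical)
df = df.withColumn('float_column_name', F.ceil(F.col('float_column_name') * 10) / 10)
# or, to always round down instead:
# df = df.withColumn('float_column_name', F.floor(F.col('float_column_name') * 10) / 10)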