Cast Issue with AWS Glue 3.0 - Pyspark
I am using Glue 3.0:
data = [("Java", "6241499.16943521594684385382059800664452")]
rdd = spark.sparkContext.parallelize(data)
df = rdd.toDF()
df.show()
df.select(f.col("_2").cast("decimal(15,2)")).show()
and I get the following result:
+----+--------------------+
| _1| _2|
+----+--------------------+
|Java|6241499.169435215...|
+----+--------------------+
+----+
| _2|
+----+
|null|
+----+
Locally, with pyspark==3.2.1, casting the string to decimal() works without any problem, but the Glue job cannot do it.
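For reference, a minimal sketch of the local check (the SparkSession setup here is illustrative; the question only states that pyspark==3.2.1 is used locally):

from pyspark.sql import SparkSession
from pyspark.sql import functions as f

spark = SparkSession.builder.master("local[1]").appName("decimal-cast-check").getOrCreate()

df = spark.createDataFrame(
    [("Java", "6241499.16943521594684385382059800664452")],
    ["_1", "_2"],
)

# Per the report above: this prints 6241499.17 on local Spark 3.2.1,
# while the same cast returns null on Glue 3.0 (which bundles Spark 3.1).
df.select(f.col("_2").cast("decimal(15,2)")).show()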
The issue is related to AWS Glue! To get around this case, I convert my string before performing the cast:
from pyspark.sql import functions as f
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

def prepareStringDecimal(str_):
    """
    Pyspark UDF: truncate the fractional part of a numeric string to 5 digits.
    :param str_: "1234.123456789"
    :return: "1234.12345"
    """
    arr = str(str_).split(".")
    if len(arr) > 1:
        return arr[0] + "." + arr[1][:5]
    else:
        return str_

# wrap the function in a UDF
convertUDF = udf(lambda z: prepareStringDecimal(z), StringType())

data = [("Java", "6241499.16943521594684385382059800664452")]
df = spark.sparkContext.parallelize(data).toDF()
df.show()
df.select(convertUDF(f.col("_2")).cast("decimal(15,2)")).show()
Output:
+----+--------------------+
| _1| _2|
+----+--------------------+
|Java|6241499.169435215...|
+----+--------------------+
+-----------------------------------+
|CAST(<lambda>(_2) AS DECIMAL(15,2))|
+-----------------------------------+
| 6241499.17|
+-----------------------------------+
Note: obviously, we can use Spark SQL Functions instead, for example:
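A minimal sketch of the same truncation with built-in functions (the regexp_extract pattern is my assumption, not part of the original answer):

from pyspark.sql import functions as f

# Truncate the fractional part to at most 5 digits with a built-in regex
# function, then cast; this avoids the Python UDF entirely.
truncated = f.regexp_extract(f.col("_2"), r"^(-?\d+(?:\.\d{0,5})?)", 1)
df.select(truncated.cast("decimal(15,2)")).show()
# Should print 6241499.17, matching the UDF result above.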