将前导零添加到 Spark 数据框中的列
Add leading zeros to Columns in a Spark Data Frame
简而言之,我正在利用 spark-xml 对 XML 文件进行一些解析。但是,使用它会删除我感兴趣的所有值中的前导零。但是,我需要最终输出(即 DataFrame)来包含前导零。我 unsure/can 想不出一种方法来为我感兴趣的列添加前导零。
val df = spark.read
.format("com.databricks.spark.xml")
.option("rowTag", "output")
.option("excludeAttribute", true)
.option("allowNumericLeadingZeros", true) //including this does not solve the problem
.load("pathToXmlFile")
我得到的示例输出
+------+---+--------------------+
|iD |val|Code |
+------+---+--------------------+
|1 |44 |9022070536692784476 |
|2 |66 |-5138930048185086175|
|3 |25 |805582856291361761 |
|4 |17 |-9107885086776983000|
|5 |18 |1993794295881733178 |
|6 |31 |-2867434050463300064|
|7 |88 |-4692317993930338046|
|8 |44 |-4039776869915039812|
|9 |20 |-5786627276152563542|
|10 |12 |7614363703260494022 |
+------+---+--------------------+
期望的输出
+--------+----+--------------------+
|iD |val |Code |
+--------+----+--------------------+
|001 |044 |9022070536692784476 |
|002 |066 |-5138930048185086175|
|003 |025 |805582856291361761 |
|004 |017 |-9107885086776983000|
|005 |018 |1993794295881733178 |
|006 |031 |-2867434050463300064|
|007 |088 |-4692317993930338046|
|008 |044 |-4039776869915039812|
|009 |020 |-5786627276152563542|
|0010 |012 |7614363703260494022 |
+--------+----+--------------------+
您可以使用 concat
内置函数
简单地做到这一点
df.withColumn("iD", concat(lit("00"), col("iD")))
.withColumn("val", concat(lit("0"), col("val")))
这帮我解决了,谢谢大家的帮助
val df2 = df
.withColumn("idLong", format_string("%03d", $"iD"))
简而言之,我正在利用 spark-xml 对 XML 文件进行一些解析。但是,使用它会删除我感兴趣的所有值中的前导零。但是,我需要最终输出(即 DataFrame)来包含前导零。我 unsure/can 想不出一种方法来为我感兴趣的列添加前导零。
val df = spark.read
.format("com.databricks.spark.xml")
.option("rowTag", "output")
.option("excludeAttribute", true)
.option("allowNumericLeadingZeros", true) //including this does not solve the problem
.load("pathToXmlFile")
我得到的示例输出
+------+---+--------------------+
|iD |val|Code |
+------+---+--------------------+
|1 |44 |9022070536692784476 |
|2 |66 |-5138930048185086175|
|3 |25 |805582856291361761 |
|4 |17 |-9107885086776983000|
|5 |18 |1993794295881733178 |
|6 |31 |-2867434050463300064|
|7 |88 |-4692317993930338046|
|8 |44 |-4039776869915039812|
|9 |20 |-5786627276152563542|
|10 |12 |7614363703260494022 |
+------+---+--------------------+
期望的输出
+--------+----+--------------------+
|iD |val |Code |
+--------+----+--------------------+
|001 |044 |9022070536692784476 |
|002 |066 |-5138930048185086175|
|003 |025 |805582856291361761 |
|004 |017 |-9107885086776983000|
|005 |018 |1993794295881733178 |
|006 |031 |-2867434050463300064|
|007 |088 |-4692317993930338046|
|008 |044 |-4039776869915039812|
|009 |020 |-5786627276152563542|
|0010 |012 |7614363703260494022 |
+--------+----+--------------------+
您可以使用 concat
内置函数
df.withColumn("iD", concat(lit("00"), col("iD")))
.withColumn("val", concat(lit("0"), col("val")))
这帮我解决了,谢谢大家的帮助
val df2 = df
.withColumn("idLong", format_string("%03d", $"iD"))