Remove leading zeros for negative values only
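I have a DataFrame and need to remove leading zeros only from negative values; all other values should stay unchanged.
For example: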
+------------+-----------+
| Input      | Output    |
+------------+-----------+
| 0000-12.45 | -12.45    |
| 000012.45  | 000012.45 |
| 000$.00    | 000$.00   |
| 0$         | 0$        |
| 0.         | 0.        |
| 51.46      | 51.46     |
| -123.67    | -123.67   |
| 00012.45   | 00012.45  |
| 012.45     | 012.45    |
+------------+-----------+
This is what I have tried so far:
spark.sql("""
    select
        regexp_replace("0000-12.45",    "^0+-(?!$)", '') as d,
        regexp_replace("000012.45",     "^0+-(?!$)", '') as d1,
        regexp_replace("0000.45",       "^0+-(?!$)", '') as d2,
        regexp_replace("0000$.00",      "^0+-(?!$)", '') as d3,
        regexp_replace("0.",            "^0+-(?!$)", '') as d4,
        regexp_replace("0$",            "^0+-(?!$)", '') as d5,
        regexp_replace("00",            "^0+-(?!$)", '') as d6,
        regexp_replace("51.46",         "^0+-(?!$)", '') as d7,
        regexp_replace("-12234.45",     "^0+-(?!$)", '') as d8,
        regexp_replace("0000-12234.45", "^0+-(?!$)", '') as d9
""").show()
+-----+---------+-------+--------+---+---+---+-----+---------+--------+
| d| d1| d2| d3| d4| d5| d6| d7| d8| d9|
+-----+---------+-------+--------+---+---+---+-----+---------+--------+
|12.45|000012.45|0000.45|0000$.00| 0.| 0$| 00|51.46|-12234.45|12234.45|
+-----+---------+-------+--------+---+---+---+-----+---------+--------+
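Your pattern matches the minus sign as well, so the sign is removed together with the zeros (see columns d and d9). Instead, you can add a condition so that the zeros are removed only when the value contains a minus sign: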
from pyspark.sql import functions as F

# Strip leading zeros only when the value contains a minus sign;
# every other value is passed through unchanged.
(df
    .withColumn('output', F
        .when(F.col('input').contains('-'), F.regexp_replace('input', '^0+', ''))
        .otherwise(F.col('input'))
    )
    .show()
)

# +----------+---------+
# |     input|   output|
# +----------+---------+
# |0000-12.45|   -12.45|
# | 000012.45|000012.45|
# |   000$.00|  000$.00|
# |        0$|       0$|
# |        0.|       0.|
# |     51.46|    51.46|
# |   -123.67|  -123.67|
# |  00012.45| 00012.45|
# |    012.45|   012.45|
# +----------+---------+
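The condition can probably also be folded into the pattern itself with a lookahead: `^0+(?=-)` matches leading zeros only when a minus sign follows them directly, and it does not consume the sign. The sketch below checks that pattern against the sample values using plain Python's `re` module rather than Spark; the helper name `strip_zeros_if_negative` is made up for illustration. Spark's `regexp_replace` uses Java regular expressions, which also support lookahead, so the same pattern should carry over, but that is an assumption worth verifying on your Spark version.

```python
import re

# Lookahead pattern: leading zeros that are immediately followed by '-'.
# The '-' is only checked, not consumed, so it survives the replacement.
LEADING_ZEROS_BEFORE_MINUS = re.compile(r"^0+(?=-)")


def strip_zeros_if_negative(value: str) -> str:
    """Hypothetical helper: drop leading zeros only when a minus sign follows them."""
    return LEADING_ZEROS_BEFORE_MINUS.sub("", value)


samples = [
    "0000-12.45", "000012.45", "000$.00", "0$", "0.",
    "51.46", "-123.67", "00012.45", "012.45",
]

for s in samples:
    print(f"{s!r} -> {strip_zeros_if_negative(s)!r}")
```

Note that this is slightly stricter than the `contains('-')` condition: it strips zeros only when the minus sign comes right after them, whereas `contains('-')` strips leading zeros from any value that has a minus sign anywhere.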