How to handle escape characters in pyspark. Trying to replace escape character with NULL

Question

I am trying to replace an escape character with NULL in a pyspark dataframe. The data in the dataframe looks like this:

Col1|Col2|Col3 
1|66|026|abcd026efg.

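Col2 is garbage data that I am trying to replace with NULL. I tried the replace and regexp_replace functions to replace '\026' with a NULL value, but because of the escape character ("\"), the data is not replaced with NULL.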

 replace(col2, "026",  'abcd') 
 replace(Col2, "6",  'abcd')

In the end, I want my data to look like this:

Col1|Col2|Col3 
1|NULL|026|abcd026efg.

Any ideas for resolving this would be much appreciated.

Thanks -EVR


Answer 1

Use regexp_replace to replace every run of digits together with the non-digit character in front of it.

 import pyspark.sql.functions as F
 # Raw string so the regex engine receives \D and \d unchanged
 df.withColumn('col2', F.regexp_replace('col2', r'\D\d+', None)).show()

+----+----+-----------+
|col1|col2|       col3|
+----+----+-----------+
|   1|null|abcd026efg.|
+----+----+-----------+
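
Whether a pattern matches depends on what the column really contains: the four-character text `\026`, or the single control character that `\026` denotes as an octal escape. The sketch below is only a quick way to check both readings with Python's built-in `re` module, used here as a stand-in. Spark evaluates `regexp_replace` with Java regular expressions, which are assumed to treat these simple ASCII patterns the same way. The helper `strip_to_none` and the sample values are hypothetical and are not part of the original answer.

```python
import re

# Assumption: two possible readings of the value shown as \026 in the question.
control_char = "\026"    # octal escape: a single character with code point 22
literal_text = r"\026"   # raw string: four characters, a backslash then "026"

ANSWER_PATTERN = r"\D\d+"    # one non-digit followed by one or more digits
LITERAL_PATTERN = r"\\026"   # escaped backslash: matches the four-character text


def strip_to_none(pattern, value):
    """Hypothetical helper: delete every match, return None if nothing is left."""
    cleaned = re.sub(pattern, "", value)
    return cleaned if cleaned else None


print(len(control_char), len(literal_text))                  # 1 4
print(strip_to_none(LITERAL_PATTERN, literal_text * 2))      # None
print(strip_to_none(ANSWER_PATTERN, literal_text * 2))       # None
print(repr(strip_to_none(ANSWER_PATTERN, control_char * 2))) # '\x16\x16'
print(strip_to_none(ANSWER_PATTERN, "abcd026efg."))          # abcefg.
```

Two things stand out under these assumptions. The pattern `\D\d+` empties the literal text but leaves real control characters untouched, and it would also alter a value such as `abcd026efg.` if applied to Col3, so it is worth restricting it to Col2. Passing `None` as the replacement may also behave differently across PySpark versions, so it may be safer to replace with an empty string and convert empty results to NULL in a separate step.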

replace

unicode-escapes

pyspark