How do I remove the special characters that show as `\uxxx` in a Python 3 string object?
The Python string object looks like this:
The site of the old observatory in Bern \u200bis the point of origin of the CH1903 coordinate system at 46°57′08.66″N 7°26′22.50″E\ufeff / \ufeff46.9524056°N 7.4395833°E\ufeff / 46.9524056; 7.4395833.
I want to remove the characters that show up as raw Unicode escapes, `\u200b` and `\ufeff`.
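Encode it to ASCII and ignore errors. Note that this drops every non-ASCII character, so the degree, prime, and double-prime signs are removed as well: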
>>> s = 'The site of the old observatory in Bern \u200bis the point of origin of the CH1903 coordinate system at 46°57′08.66″N 7°26′22.50″E\ufeff / \ufeff46.9524056°N 7.4395833°E\ufeff / 46.9524056; 7.4395833'
>>> s.encode('ascii', 'ignore')
b'The site of the old observatory in Bern is the point of origin of the CH1903 coordinate system at 465708.66N 72622.50E / 46.9524056N 7.4395833E / 46.9524056; 7.4395833'
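If only the zero-width characters should go and the coordinate symbols (°, ′, ″) should stay, one alternative is `str.translate` with a table that maps the unwanted code points to `None`. This is a minimal sketch, not part of the original answer; the names `ZERO_WIDTH` and `strip_zero_width` are made up for illustration, and the sample string is shortened.

```python
# Code points to delete: ZERO WIDTH SPACE and ZERO WIDTH NO-BREAK SPACE (BOM).
# Mapping a code point to None in a translate table removes it.
ZERO_WIDTH = {ord('\u200b'): None, ord('\ufeff'): None}


def strip_zero_width(text):
    """Remove only the code points listed in ZERO_WIDTH; keep everything else."""
    return text.translate(ZERO_WIDTH)


# Shortened version of the string from the question.
s = 'Bern \u200bis at 46°57′08.66″N\ufeff / \ufeff46.9524056°N'
cleaned = strip_zero_width(s)
print(cleaned)
```

Unlike `encode('ascii', 'ignore')`, this returns a `str` rather than `bytes` and leaves other non-ASCII characters untouched. Extend the table if other invisible characters turn up in the data.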
To make up for the removed Unicode characters with spaces so that the length stays the same, you can use:
#length of original string
>>> s = 'The site of the old observatory in Bern \u200bis the point of origin of the CH1903 coordinate system at 46°57′08.66″N 7°26′22.50″E\ufeff / \ufeff46.9524056°N 7.4395833°E\ufeff / 46.9524056; 7.4395833'
>>> len(s)
179
#to maintain the same length
>>> new_s = s.encode('ascii',errors='ignore').decode('utf-8')
>>> final_s = new_s + ' ' * (len(s) - len(new_s))
>>> final_s
'The site of the old observatory in Bern is the point of origin of the CH1903 coordinate system at 465708.66N 72622.50E / 46.9524056N 7.4395833E / 46.9524056; 7.4395833            '
>>> len(final_s)
179
This appends extra spaces at the end to keep the length the same.
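Padding at the end keeps the total length but shifts every character that follows a removed one. If character positions need to stay aligned with the original, a possible variant is to swap each non-ASCII character for a space in place. This is a hedged sketch using `re.sub`; the helper name `blank_non_ascii` is hypothetical and the sample string is shortened.

```python
import re


def blank_non_ascii(text):
    """Replace each non-ASCII code point with one space, so offsets are unchanged."""
    return re.sub(r'[^\x00-\x7f]', ' ', text)


# Shortened version of the string from the question.
s = 'Bern \u200bis at 46°57′N\ufeff'
masked = blank_non_ascii(s)
print(repr(masked))
print(len(s) == len(masked))
```

Each removed character becomes exactly one space, so `len(masked)` equals `len(s)` and the index of every remaining character is the same as in the original.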