Remove trailing whitespace, Unicode characters, and a special character
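How do I remove whitespace and special characters from a string in Python?
I am scraping some data, but the text I get back is somewhat garbled. I thought I could clean it up with join, strip, and encoding, but my output is unexpected.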
# cleaner function
def string_cleaner(rouge_text):
    return (" ".join(rouge_text.strip()).encode('ascii', 'ignore').decode("utf-8")).replace("\\", "")

print(string_cleaner("\n\t\t\t\t\t\t\t\t\t Nokia 9 PureView- 5.99\\ "))
print(string_cleaner("\n\t\t\t\t\t\t\t\t\tMi Electronic Scooter\uff08Black\uff09EU\t \t\t\t\t\t\t\t\t "))
Output:
How do I clean my string and get normal text?
I'm not sure I understand what you mean by "clean my string and get normal text", but maybe try it like this:
def string_cleaner(rouge_text):
    # "" instead of " " in the .join() call, so no space is inserted between characters
    return ("".join(rouge_text.strip()).encode('ascii', 'ignore').decode("utf-8")).replace("\\", "")

print(string_cleaner("\n\t\t\t\t\t\t\t\t\t Nokia 9 PureView- 5.99\\ "))
print(string_cleaner("\n\t\t\t\t\t\t\t\t\tMi Electronic Scooter\uff08Black\uff09EU\t \t\t\t\t\t\t\t\t "))
Output:
>>> print(string_cleaner("\n\t\t\t\t\t\t\t\t\t Nokia 9 PureView- 5.99\\ "))
Nokia 9 PureView- 5.99
>>> print(string_cleaner("\n\t\t\t\t\t\t\t\t\tMi Electronic Scooter\uff08Black\uff09EU\t \t\t\t\t\t\t\t\t "))
Mi Electronic ScooterBlackEU
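As a side note, `str.join` iterates over its argument, so passing it a string joins the individual characters; that is why `" ".join(...)` put a space between every letter. The sketch below is one possible variation, not the only way to do it. The helper name `clean_text` is hypothetical, and it assumes you would rather turn the fullwidth parentheses into ASCII ones (via NFKC normalization) than drop them, and that runs of internal whitespace should collapse to a single space.

```python
import unicodedata


def clean_text(raw_text: str) -> str:
    """Hypothetical helper: normalize, drop non-ASCII and backslashes, collapse whitespace."""
    # NFKC folds compatibility characters: fullwidth "(" (U+FF08) becomes ASCII "(".
    normalized = unicodedata.normalize("NFKC", raw_text)
    # Drop anything that still cannot be represented in ASCII.
    ascii_only = normalized.encode("ascii", "ignore").decode("ascii")
    # Remove literal backslashes.
    without_backslashes = ascii_only.replace("\\", "")
    # split() with no arguments splits on any whitespace run and discards
    # leading/trailing whitespace; joining with " " leaves single spaces.
    return " ".join(without_backslashes.split())


print(clean_text("\n\t\t\t Nokia 9 PureView- 5.99\\ "))
print(clean_text("\n\t\t\tMi Electronic Scooter\uff08Black\uff09EU\t \t\t "))
```

With these inputs the first call prints `Nokia 9 PureView- 5.99` and the second prints `Mi Electronic Scooter(Black)EU`. If you do want the parentheses removed entirely, skip the normalization step.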