Remove \u from string?

Question

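I have several words in a list that are of the form '\uword'. I want to replace the '\u' with an empty string. I've looked around, but nothing has worked for me so far. I tried converting to a raw string with "%r"%word, but that didn't work. I also tried word.encode('unicode-escape'), but didn't get any result. Any ideas?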

Edit

Added code:

word = '\u2019'
word.encode('unicode-escape')
print(word) # error

word = '\u2019'
word = "%r"%word
print(word) # error

Answer 1

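Since you are having trouble with encodings and Unicode, it would help to know which Python version you are using. I'm not sure I understand what you mean, but this should do the trick: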

string = r'\uword'
string = string.replace(r'\u', '')  # replace returns a new string: 'word'
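The replacement works because r'\uword' contains a real backslash followed by u. As a minimal sketch, assuming the words are Python 3 strings that literally contain \u (the list name words is hypothetical), the same str.replace call can be applied across a list. Since replace returns a new string, the result has to be kept:

```python
# Hypothetical list of words that literally contain a backslash followed by "u".
words = [r'\uword', r'\uhello', 'plain']

# str.replace returns a new string; the originals in `words` are not modified.
cleaned = [w.replace('\\u', '') for w in words]

print(cleaned)                        # ['word', 'hello', 'plain']
print(len(r'\uword'), len('\u2019'))  # 6 1
```

By contrast, '\u2019' written without the r prefix is a single character (U+2019, RIGHT SINGLE QUOTATION MARK), so there is no backslash in it for replace to find.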

Answer 2

If I understand you correctly, you can do this without a regular expression. Try this:

>>> string = '\u2019'  # Python 2 byte string: the six characters \u2019
>>> char = string.decode('unicode-escape')
>>> print format(ord(char), 'x')
2019
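The session above is Python 2, where str is a byte string with a .decode method. In Python 3, str has no .decode, so a rough equivalent (a sketch, not part of the original answer) first encodes the escaped text to bytes and then decodes it with the unicode-escape codec:

```python
# Python 3: the raw string holds the six characters \u2019 (backslash, 'u', '2019').
string = r'\u2019'

# str has no .decode() in Python 3, so go through bytes first.
char = string.encode('ascii').decode('unicode-escape')

print(len(string), len(char))   # 6 1
print(format(ord(char), 'x'))   # 2019
```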

Answer 3

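I had wrongly assumed that a string's .encode method modifies the string in place, the way a list's .sort() method does. But according to the documentation: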

The opposite method of bytes.decode() is str.encode(), which returns a bytes representation of the Unicode string, encoded in the requested encoding.
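A short Python 3 sketch of the point quoted above: str.encode leaves the original string untouched and returns a new bytes object, which has to be assigned to be useful.

```python
word = '\u2019'  # one character: RIGHT SINGLE QUOTATION MARK

# Calling encode without keeping the result changes nothing.
word.encode('unicode-escape')
print(word == '\u2019')         # True

# Keep the returned bytes, then decode them back to an ASCII str.
escaped = word.encode('unicode-escape')
print(escaped)                  # b'\\u2019'
print(escaped.decode('ascii'))  # \u2019
```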

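def remove_u(word):
    # encode returns a new bytes object; decoding it gives an ASCII str such as '\\u2019'
    word_u = word.encode('unicode-escape').decode('utf-8', 'strict')
    if r'\u' in word_u:
        # keep what follows the first \u, e.g. '\u2019' -> '2019'
        # (r'\u' is required: a plain '\u' is a SyntaxError in Python 3)
        return word_u.split(r'\u')[1]
    return word

vocabulary_ = [remove_u(each_word) for each_word in vocabulary_]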
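The split-based version above returns only the text after the first \u, so anything before it is lost. A possible variant, assuming the intent is to drop just the two characters \u from the escaped form while keeping the rest of the word (the name remove_u_all and the sample list are hypothetical):

```python
def remove_u_all(word):
    # Escape non-ASCII characters, e.g. 'don\u2019t' -> 'don\\u2019t' (pure ASCII).
    # Note: U+0080 to U+00FF become \xNN, characters above U+FFFF become
    # \UXXXXXXXX, and existing backslashes are doubled, so those are not
    # handled by the replace below.
    escaped = word.encode('unicode-escape').decode('ascii')
    # Drop every literal backslash-u pair, keeping the surrounding text.
    return escaped.replace('\\u', '')

vocabulary_ = ['\u2019', 'don\u2019t', 'plain']
print([remove_u_all(w) for w in vocabulary_])  # ['2019', 'don2019t', 'plain']
```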

Answer 4

Given that you are only dealing with strings, we can use string functions.

Simply convert it to a string:

>>> string = u"your string"  # Python 2 session
>>> string
u'your string'
>>> str(string)
'your string'
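The u'...' in this session is Python 2's repr of a unicode object; it is a literal prefix, not the \u escape from the question. In Python 3 every str is Unicode and the u prefix (re-allowed since Python 3.3) has no effect, as this small check illustrates:

```python
string = u"your string"

# In Python 3 the u prefix is accepted but redundant: both literals are the same str.
print(string == "your string")    # True
print(repr(string))               # 'your string'
print(type(string) is str)        # True

# str() therefore cannot remove a \u escape: '\u2019' is still one non-ASCII character.
print(str('\u2019') == '\u2019')  # True
```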

I guess this should do it!

Tags: python, regex, unicode-escapes, python-unicode