How to convert a Unicode string in Unicode escape format with Python?

Question

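I am a student learning Python and Scrapy (web crawling).

I want to convert a unicode string to str in Python. But this unicode string is not an ordinary string: its content is in Unicode escape format. Please see the code below.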

# python 2.7
...
print(type(name[0]))
print(name[0])
print(type(keyword_name_temp))
print(keyword_name_temp)
...

When I run the script above, the console shows the following.

$ <type 'unicode'>
$ 서용교 ## these are Korean characters
$ <type 'unicode'>
$ u'\uc9c0\ubc29\uc790\uce58\ub2e8\uccb4'

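I want to see "keyword_name_temp" displayed as Korean, but I don't know how to do it...

I got the name list and keyword_name_temp from HTML code via an HTTP request.

The name list is basically in string format.

keyword_name_temp is basically in unicode format.

Can anyone please help me?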

Answer 1

The simplest solution is to switch to Python 3, where strings are Unicode by default.
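
As a rough illustration of what that means, here is a minimal Python 3 sketch. It uses the name from the question, written with escapes so the code points are explicit. The variable name is only for illustration.

```python
# Python 3: str is already Unicode text, so no u'' prefix is needed.
name = "\uc11c\uc6a9\uad50"  # the three Hangul syllables of the name in the question

print(type(name).__name__)  # str
print(name)                 # shows the Korean text on a UTF-8 terminal
print(ascii(name))          # shows the escaped form, similar to Python 2's repr
```

Note that this only helps when the string really holds Korean characters. A string that holds literal backslash-u text still has to be unescaped.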

Answer 2

Your string is unicode. If you know the encoding, for example utf-8, you can try:

print name[0].decode("utf-8")
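
One caveat, as far as I know: decode() turns encoded bytes into text, so it only applies when the value is a byte string. In Python 2, calling decode() on a unicode object that contains non-ASCII characters usually raises UnicodeEncodeError, because the object is implicitly encoded as ASCII first. A minimal Python 3 sketch of the two directions, with illustrative variable names:

```python
# Python 3 sketch: bytes hold encoded data, str holds Unicode text.
text = "\uc11c\uc6a9\uad50"        # the name from the question
raw = text.encode("utf-8")         # str -> bytes
round_trip = raw.decode("utf-8")   # bytes -> str

print(type(raw).__name__, len(raw))  # each Hangul syllable takes 3 bytes in UTF-8
print(round_trip == text)
```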

Answer 3

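Your keyword_name_temp, printed as u'\uc9c0\ubc29\uc790\uce58\ub2e8\uccb4', contains real backslashes (the backslash is the escape character in Python string literals, and the Python interpreter displays a backslash inside a string as \\), each followed by u and a hex sequence, rather than the literal Unicode characters U+C9C0 etc. that are usually written with the \u escape sequence. (Could that string happen to come from some JSON object?)

You can construct a JSON string from it, then use json.loads() to convert it to a unicode string:

Example in Python 2.7: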

>>> s1 = u'서용교'
>>> type(s1)
<type 'unicode'>
>>> s1
u'\uc11c\uc6a9\uad50'
>>> print(s1)
서용교
>>> 
>>> 
>>> # s2 holds literal backslashes (written \\ in source), not Korean characters
>>> s2 = u'\\uc9c0\\ubc29\\uc790\\uce58\\ub2e8\\uccb4'
>>> type(s2)
<type 'unicode'>
>>>
>>> # put that unicode string between double-quotes
>>> # so that json module can interpret it
>>> ts2 = u'"%s"' % s2
>>> ts2
u'"\\uc9c0\\ubc29\\uc790\\uce58\\ub2e8\\uccb4"'
>>>
>>> import json
>>> json.loads(ts2)
u'\uc9c0\ubc29\uc790\uce58\ub2e8\uccb4'
>>> print(json.loads(ts2))
지방자치단체
>>>

Another option is to turn it into a string literal:

>>> import ast
>>>
>>> # construct a string literal, with the 'u' prefix
>>> s2_literal = u'u"%s"' % s2
>>> s2_literal
u'u"\\uc9c0\\ubc29\\uc790\\uce58\\ub2e8\\uccb4"'
>>> print(ast.literal_eval(s2_literal))
지방자치단체
>>> 
>>> # also works with single-quotes string literals
>>> s2_literal2 = u"u'%s'" % s2
>>> s2_literal2
u"u'\\uc9c0\\ubc29\\uc790\\uce58\\ub2e8\\uccb4'"
>>> 
>>> print(ast.literal_eval(s2_literal2))
지방자치단체
>>>
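
For readers on Python 3, the same two approaches can be sketched roughly as follows. The helper names unescape_with_json and unescape_with_ast are made up for illustration. The JSON variant assumes the text contains no unescaped double quotes or control characters, and the ast variant assumes the text is a complete Python string literal such as u'...'.

```python
import ast
import json


def unescape_with_json(text):
    """Decode literal \\uXXXX sequences by wrapping the text as a JSON string.

    Assumption: text has no unescaped double quotes or control characters.
    """
    return json.loads('"%s"' % text)


def unescape_with_ast(literal_text):
    """Evaluate text that looks like a Python string literal, e.g. u'\\uc9c0'."""
    return ast.literal_eval(literal_text)


# Doubled backslashes: the string holds the characters \, u, c, 9, c, 0, ...
escaped = "\\uc9c0\\ubc29\\uc790\\uce58\\ub2e8\\uccb4"

from_json = unescape_with_json(escaped)
from_ast = unescape_with_ast("u'%s'" % escaped)

print(len(escaped), len(from_json))  # escaped text is much longer than the decoded text
print(from_json)
print(from_json == from_ast)
```

If the scraped value already includes the u'...' wrapper, as the printed output in the question suggests, the ast variant can be applied to it directly.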
