URL component % and \x

Question

I have a question.

import urllib2

st = "b%C3%BCrokommunikation"
urllib2.unquote(st)

Output: 'b\xc3\xbcrokommunikation'. But if I print it:

print urllib2.unquote(st)

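Output: bürokommunikation

Why are they different? I need to write bürokommunikation, not 'b\xc3\xbcrokommunikation', to a file.

My problem is this: I have a lot of data whose values were extracted from URLs, and I need to store them in a text file in readable form, e.g. as bürokommunikation.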

Answer 1

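You are looking at the same result both times. When you evaluate the expression without the print command, the interpreter shows only the __repr__() result, which escapes non-ASCII bytes with \x. When you use print, the bytes are written out as they are, so the Unicode character is displayed instead of the \x escape.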
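
The difference between the escaped representation and the actual text can be sketched as follows. This is only an illustration and assumes Python 3, where urllib2.unquote no longer exists and urllib.parse.unquote_to_bytes returns the raw bytes; the variable names are made up for the example.

```python
from urllib.parse import unquote_to_bytes

# Percent-decoding yields bytes: %C3%BC becomes the two bytes 0xC3 0xBC.
raw = unquote_to_bytes("b%C3%BCrokommunikation")

# repr() escapes the non-ASCII bytes, which is what the interactive prompt echoes.
escaped = repr(raw)
print(escaped)  # b'b\xc3\xbcrokommunikation'

# Decoding the same bytes as UTF-8 gives the text that print displayed.
text = raw.decode("utf-8")
print(text == "b\u00fcrokommunikation")  # True

# Same data, two views: 18 bytes, 17 characters.
print(len(raw), len(text))  # 18 17
```

The data is identical in both cases; only the way it is shown changes.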

Answer 2

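When you print the string, your terminal emulator recognizes the bytes \xc3\xbc as the UTF-8 encoding of ü and displays the character correctly.

However, as @MarkDickinson said in the comments, ü does not exist in ASCII, so you need to tell Python that the string you want to write to the file is Unicode, and which encoding you want to use, such as UTF-8.

This is very easy with the codecs library: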

import codecs
import urllib2

# First decode the UTF-8 bytes into a Python unicode string
st = "b%C3%BCrokommunikation"
decoded_string = urllib2.unquote(st).decode('utf-8')

# Write it to the file, encoding it as UTF-8 again
with codecs.open('my_file.txt', 'w', 'utf-8') as f:
    f.write(decoded_string)
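
For readers on Python 3, a rough equivalent might look like the sketch below. It assumes urllib.parse.unquote (which decodes percent-escapes as UTF-8 by default) and the built-in open with an explicit encoding; the helper name write_unquoted and the temporary output path are hypothetical and only serve the example.

```python
import os
import tempfile
from urllib.parse import unquote


def write_unquoted(values, path):
    """Percent-decode each value as UTF-8 and write one value per line."""
    texts = [unquote(v, encoding="utf-8") for v in values]
    # newline="\n" keeps line endings identical on every platform.
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        for text in texts:
            f.write(text + "\n")
    return texts


# Usage: write to a temporary file, then read the raw bytes back.
out_path = os.path.join(tempfile.mkdtemp(), "my_file.txt")
decoded = write_unquoted(["b%C3%BCrokommunikation"], out_path)
with open(out_path, "rb") as f:
    file_bytes = f.read()

# The file holds the UTF-8 bytes for the readable text, not a \x escape.
print(file_bytes == b"b\xc3\xbcrokommunikation\n")  # True
```

Passing a list of values fits the case of many strings extracted from URLs; each one ends up on its own line.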

python

urllib

urllib2