How to convert ISO-8859-1 to UTF-8 using Python 3.7.4
How can I convert text in ISO-8859-1/latin1 to UTF-8 using Python 3.7.4 (32-bit)?
This is what I tried:
>>> inputText = "\xC4pple"
>>> inputText.decode('iso-8859-1').encode('utf8')
It returned this error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'str' object has no attribute 'decode'
What am I doing wrong?
decode is a method of the bytes type, not of str:
>>> help(bytes.decode)
Help on method_descriptor:
decode(self, /, encoding='utf-8', errors='strict')
Decode the bytes using the codec registered for encoding.
encoding
The encoding with which to decode the bytes.
errors
The error handling scheme to use for the handling of decoding errors.
The default is 'strict' meaning that decoding errors raise a
UnicodeDecodeError. Other possible values are 'ignore' and 'replace'
as well as any other name registered with codecs.register_error that
can handle UnicodeDecodeErrors.
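To make the errors parameter described in the help text concrete, here is a minimal sketch. It decodes the same byte string as UTF-8 instead of ISO-8859-1, purely to trigger a decoding error (ISO-8859-1 itself maps every byte to a character, so it never fails); the variable names are illustrative only.

```python
raw = b"\xC4pple"  # 0xC4 followed by ASCII "pple" is not valid UTF-8

# errors='strict' (the default) raises UnicodeDecodeError.
strict_error = None
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    strict_error = exc

# errors='replace' substitutes U+FFFD for the undecodable byte.
replaced = raw.decode("utf-8", errors="replace")

# errors='ignore' silently drops the undecodable byte.
ignored = raw.decode("utf-8", errors="ignore")

print(type(strict_error).__name__)  # UnicodeDecodeError
print(repr(replaced))               # '\ufffdpple'
print(repr(ignored))                # 'pple'
```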
So inputText needs to be of type bytes, not str:
>>> inputText = b"\xC4pple"
>>> inputText.decode('iso-8859-1')
'Äpple'
>>> inputText.decode('iso-8859-1').encode('utf8')
b'\xc3\x84pple'
Note that the result of decode is of type str, and the result of encode is of type bytes.
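As a compact summary of the two cases, the following sketch may help. The helper name latin1_bytes_to_utf8 is hypothetical, not part of any library. It also shows that the str literal from the question already holds the character U+00C4, so if the data is already a str it can be encoded to UTF-8 directly, with no decode step.

```python
def latin1_bytes_to_utf8(data: bytes) -> bytes:
    """Re-encode ISO-8859-1 bytes as UTF-8 bytes (hypothetical helper)."""
    return data.decode("iso-8859-1").encode("utf-8")


# Case 1: the input is raw bytes in ISO-8859-1.
raw = b"\xC4pple"
utf8_from_bytes = latin1_bytes_to_utf8(raw)

# Case 2: the input is already a str, as in the question.
# "\xC4" in a str literal is the character U+00C4, so encode it directly.
text = "\xC4pple"
utf8_from_str = text.encode("utf-8")

print(utf8_from_bytes)  # b'\xc3\x84pple'
print(utf8_from_str)    # b'\xc3\x84pple'
```

Both paths yield the same UTF-8 bytes here because ISO-8859-1 maps each byte value to the Unicode code point with the same number.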