Read a string containing hex-escaped bytes from a file and decode it?

I have a file example.log that contains:

<POOR_IN200901UV xmlns="urn:hl7-org:v3" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" ITSVersion="XML_1.0"
xsi:schemaLocation="urn:hl7-org:v3
../../Schemas/POOR_IN200901UV20.xsd">\n\t<!-- \xe6\xb6\x88\xe6\x81\xafID -->\n\t<id extension="BS002"/>

I want to read the file, turn the escaped string into UTF-8 text, and write it to a new file. My current code is:

import re

with open("example_decoded.log", 'w') as f:
    for line in open("example.log", 'r', encoding='utf-8'):
        m = re.search("<POOR_IN200901UV", line)
        if m:
            line = line[m.start():-2]
            # raw_unicode_escape writes characters below U+0100 as single bytes and
            # leaves backslashes alone, so the literal "\xe6" text is not decoded
            line_bytes = bytes(line, encoding='raw_unicode_escape')
            line_decoded = line_bytes.decode('utf-8')
            print(line_decoded)
            f.write(line_decoded)
        else:
            pass

But example_decoded.log ends up containing:

<POOR_IN200901UV xmlns="urn:hl7-org:v3"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" ITSVersion="XML_1.0" 
xsi:schemaLocation="urn:hl7-org:v3 
../../Schemas/POOR_IN200901UV20.xsd">\n\t<!-- \xe6\xb6\x88\xe6\x81\xafID -->\n\t<id extension="BS002"

The \xe6\xb6\x88\xe6\x81\xaf part was not decoded. How do I decode a string like this, which mixes plain text with escaped bytes?
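A minimal sketch of why the first attempt leaves the text unchanged, using a shortened sample line taken from example.log (the variable names are illustrative). The file holds the literal characters backslash, `x`, `e`, `6` rather than the byte 0xE6, and `raw_unicode_escape` copies those ASCII characters straight into bytes, so decoding them as UTF-8 just returns the same text:

```python
# The log line holds a literal backslash, 'x', 'e', '6', ... not the byte 0xE6.
line = r"<!-- \xe6\xb6\x88\xe6\x81\xafID -->"

# raw_unicode_escape writes every character below U+0100 as one Latin-1 byte
# and does not touch backslashes, so the escapes become plain ASCII bytes.
line_bytes = line.encode("raw_unicode_escape")

# Decoding those ASCII bytes as UTF-8 simply gives back the original text.
print(line_bytes.decode("utf-8") == line)  # True
```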

import struct

decodedVal = struct.unpack(">f", bytes.fromhex(encoded_val))[0]

Refer to the link below and replace ">f" with the byte order and type that match your data:

https://docs.python.org/3/library/struct.html
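The `struct` approach above is meant for packed binary numbers, not for UTF-8 text, so on its own it does not decode the characters in example.log. As an illustration of what it does, assuming the hex string is a big-endian 32-bit IEEE 754 float (`3f800000` is a made-up sample value):

```python
import struct

# Hypothetical input: the hex text of a big-endian 32-bit float.
encoded_val = "3f800000"

# bytes.fromhex turns the hex text into 4 raw bytes; ">f" reads them as a
# big-endian float. unpack returns a tuple, so [0] takes the single value.
decoded_val = struct.unpack(">f", bytes.fromhex(encoded_val))[0]
print(decoded_val)  # 1.0
```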

import codecs

decode_hex = codecs.getdecoder("hex_codec")

string = decode_hex(string)[0]

https://docs.python.org/3/library/codecs.html
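The `hex_codec` decoder expects bare hex digits, so it only fits this log once the `\x` prefixes are stripped from an isolated run of escapes. A minimal sketch, assuming the escaped run has already been cut out of the line (finding it inside the surrounding plain text is not shown here):

```python
import codecs

decode_hex = codecs.getdecoder("hex_codec")

# Sample: the escaped run from example.log, as literal text.
escaped = r"\xe6\xb6\x88\xe6\x81\xaf"

# hex_codec only understands hex digits, so drop the "\x" prefixes first.
hex_digits = escaped.replace("\\x", "").encode("ascii")

# The decoder returns (decoded_bytes, length_consumed); [0] is the bytes.
raw = decode_hex(hex_digits)[0]
print(raw)                  # b'\xe6\xb6\x88\xe6\x81\xaf'
print(raw.decode("utf-8"))  # 消息
```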

See also: Read hex characters and convert them to utf-8 using python 3

The solution that worked:

import re

with open("example_decoded.log", 'w', encoding='utf-8') as f:
    for line in open("example.log", 'r', encoding='utf-8'):
        m = re.search("<POOR_IN200901UV", line)
        if m:
            line = line[m.start():-2]
            # bytes -> interpret the \xNN escapes -> Latin-1 bytes -> UTF-8 text
            line_decoded = bytes(line, 'utf-8').decode('unicode_escape').encode('latin-1').decode('utf8')
            print(line_decoded)
            f.write(line_decoded)
        else:
            pass
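The one-liner above can be wrapped in a small helper and checked on a sample. The name `decode_escaped_utf8` is hypothetical, and the sketch assumes every backslash sequence in the line is meant to be interpreted: `unicode_escape` also turns the literal `\n\t` into a real newline and tab, and a stray backslash elsewhere (for example in a Windows path) could be reinterpreted or raise an error.

```python
def decode_escaped_utf8(text: str) -> str:
    # Hypothetical helper wrapping the one-liner from the solution above.
    return (
        text.encode("utf-8")        # str -> bytes; the ASCII escapes stay as-is
        .decode("unicode_escape")   # each literal \xNN -> one char U+00NN; \n, \t too
        .encode("latin-1")          # chars U+0000..U+00FF -> the same byte values
        .decode("utf-8")            # those bytes are real UTF-8 -> readable text
    )


sample = r'<!-- \xe6\xb6\x88\xe6\x81\xafID -->\n\t<id extension="BS002"/>'
print(decode_escaped_utf8(sample))
```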

It works, but I don't understand why encode('latin-1') has to come first. Can someone explain?
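Tracing the intermediate values for the escaped run alone shows why Latin-1 is used. A minimal sketch (the sample string is the escaped run from example.log): `unicode_escape` produces one character per escaped byte, and Latin-1 maps code points 0 to 255 back onto the identical byte values, whereas UTF-8 would expand each of those characters into two bytes.

```python
escaped = r"\xe6\xb6\x88\xe6\x81\xaf"

# Step 1: unicode_escape turns each "\xNN" into ONE character whose code
# point is NN, giving six characters in the U+0080..U+00FF range.
step1 = escaped.encode("utf-8").decode("unicode_escape")
print([hex(ord(c)) for c in step1])
# ['0xe6', '0xb6', '0x88', '0xe6', '0x81', '0xaf']

# Step 2: Latin-1 maps code points 0..255 one-to-one onto bytes 0..255,
# so it recovers exactly the original UTF-8 bytes.
step2 = step1.encode("latin-1")
print(step2)  # b'\xe6\xb6\x88\xe6\x81\xaf'

# Encoding with UTF-8 instead writes each of those characters as two bytes
# (e.g. U+00E6 -> c3 a6), which is not the data we started with.
print(step1.encode("utf-8").hex())  # c3a6c2b6c288c3a6c281c2af

# Step 3: the recovered bytes are valid UTF-8 for the original text.
print(step2.decode("utf-8"))  # 消息
```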