Encoding problems at the end of the execution

Question

My script has an encoding problem. Here is the script:

import json
from urllib.request import urlopen

from bs4 import BeautifulSoup


def parse_airfields():
    html = urlopen('https://www.sia.aviation-civile.gouv.fr/aip/enligne/FRANCE/AIRAC-2015-09-17/html/eAIP/FR-AD-1.3-fr-FR.html').read()
    html = html.decode('utf-8')
    soup = BeautifulSoup(html, 'lxml')
    
    # A lot of work [....]

    return airfields


if __name__ == '__main__':
    airfields = parse_airfields()

    for airfield in airfields:
        for value in airfield.values():
            if isinstance(value, str):
                value.encode('utf-8')

    with open('airfields.json', 'w') as airfields_file:
        json.dump(airfields, airfields_file, indent=4, sort_keys=True)

I also tried without encode() and decode(), but I get the same result: encoding problems in my JSON file.

Why? Thanks for your help!

Answer 1

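str.encode and bytes.decode do not modify the value in place; you never assign the return value of value.encode('utf-8'), so you aren't actually changing anything. That said, I don't think you really want to; the json module works with text (str), not binary data (bytes).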
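
To illustrate the point about encode(), here is a minimal sketch. The sample string and the "position" key are made up for the example and do not come from the question's data. It shows that encode() returns a new bytes object and leaves the str untouched, and that the json module refuses bytes values.

```python
import json

value = "48°N"  # hypothetical sample value containing a non-ASCII character

# encode() returns a new bytes object; the original str is not modified.
encoded = value.encode('utf-8')
unchanged = (value == "48°N")          # the str still holds the same text
is_bytes = isinstance(encoded, bytes)  # the result is a separate bytes object

# The json module serializes str, not bytes: a bytes value raises TypeError.
try:
    json.dumps({"position": encoded})
    bytes_rejected = False
except TypeError:
    bytes_rejected = True
```

So even if the loop in the question had stored the encoded values back into the dictionaries, json.dump would most likely have failed rather than produced the desired output.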

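The problem is that strict JSON typically doesn't contain non-ASCII characters in its strings; it uses escapes instead, such as \u00b0, and json.dump produces those escapes by default. Python will output the characters directly if you tell it to: just add ensure_ascii=False to the arguments of your json.dump(...) call.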
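
A small sketch of the difference, assuming a made-up airfield record (the "name" and "latitude" keys are hypothetical, since the real structure is elided in the question). It uses json.dumps so the result can be inspected as a string; json.dump accepts the same ensure_ascii argument.

```python
import json

# Hypothetical sample data; the real airfields structure is not shown above.
airfields = [{"name": "Sample", "latitude": "48°43'N"}]

# Default behaviour (ensure_ascii=True): non-ASCII characters become escapes.
escaped = json.dumps(airfields)

# With ensure_ascii=False the degree sign is written as-is.
readable = json.dumps(airfields, ensure_ascii=False)

# Both forms are valid JSON and decode back to the same Python objects.
same_data = json.loads(escaped) == json.loads(readable)
```

When writing the readable form to a file, it is probably worth also passing encoding='utf-8' to open(), as in open('airfields.json', 'w', encoding='utf-8'), so the result does not depend on the platform's default encoding.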

Tags: python, encoding, json, beautifulsoup, python-3.x