How to keep the original value of Unicode characters even after writing them to a JSON file?

Question

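I am working with a file that contains Unicode emoji. Left as it is, the file looks fine and I can see the emoji. But when I read it with the json module and write it back out, the emoji are converted to something like "\ud83d\ude00". So my emoji "😀" becomes "\ud83d\ude00" once it has been written. I am using the following code: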

import json

with open("emoji-by-category.json", encoding='utf-8', errors='ignore') as json_data:
    data = json.load(json_data, strict=False)

with open("emoji-by-category2.json", 'w', encoding='utf-8', errors='ignore') as json_file:
    json.dump(data, json_file, indent=4)

Here is the sample JSON file:

[
    {
        "category": "Smileys & Emotion",
        "section": "face-smiling",
        "n": "1",
        "code": "U+1F600",
        "text": "\ud83d\ude00",
        "recentlyAdded": false,
        "name": "grinning face",
        "vendors": {
            "Appl": true,
            "Goog": true,
            "FB": true,
            "Wind": true,
            "Twtr": true,
            "Joy": true,
            "Sams": true,
            "GMail": true,
            "SB": false,
            "DCM": false,
            "KDDI": false,
            "Tlgr": true
        },
        "tags": [
            "face",
            "grin",
            "grinning face"
        ],
        "keywords": [
            "face",
            "grin",
            "grinning",
            "subdivision",
            "flag",
            ":D",
            "grinning face"
        ]
    },
    {
        "category": "Smileys & Emotion",
        "section": "face-smiling",
        "n": "2",
        "code": "U+1F603",
        "text": "\ud83d\ude03",
        "recentlyAdded": false,
        "name": "grinning face with big eyes",
        "vendors": {
            "Appl": true,
            "Goog": true,
            "FB": true,
            "Wind": true,
            "Twtr": true,
            "Joy": true,
            "Sams": true,
            "GMail": true,
            "SB": true,
            "DCM": true,
            "KDDI": true,
            "Tlgr": true
        },
        "tags": [
            "face",
            "grinning face with big eyes",
            "mouth",
            "open",
            "smile"
        ],
        "keywords": [
            "face",
            "grinning",
            "big",
            "eyes",
            "mouth",
            "open",
            "smile",
            "subdivision",
            "flag",
            "grin",
            "eye",
            ":D",
            ":)",
            "grinning face with big eyes"
        ]
    },
    {
        "category": "Smileys & Emotion",
        "section": "face-smiling",
        "n": "3",
        "code": "U+1F604",
        "text": "\ud83d\ude04",
        "recentlyAdded": false,
        "name": "grinning face with smiling eyes",
        "vendors": {
            "Appl": true,
            "Goog": true,
            "FB": true,
            "Wind": true,
            "Twtr": true,
            "Joy": true,
            "Sams": true,
            "GMail": true,
            "SB": true,
            "DCM": false,
            "KDDI": false,
            "Tlgr": true
        },
        "tags": [
            "eye",
            "face",
            "grinning face with smiling eyes",
            "mouth",
            "open",
            "smile"
        ],
        "keywords": [
            "eye",
            "face",
            "grinning",
            "smiling",
            "eyes",
            "mouth",
            "open",
            "smile",
            "subdivision",
            "flag",
            "grin",
            "joy",
            "funny",
            "haha",
            "laugh",
            ":D",
            ":)",
            "grinning face with smiling eyes"
        ]
    }
]

Answer 1

Pass ensure_ascii=False to json.dump when writing the file:

with open("emoji-by-category2.json", 'w', encoding='utf-8', errors='ignore') as json_file:
    json.dump(data, json_file, indent=4, ensure_ascii=False)

See the documentation for the JSON encoder and decoder:

Basic Usage:

json.dump(obj, fp, *, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, cls=None, indent=None, separators=None, default=None, sort_keys=False, **kw)
… If ensure_ascii is true (the default), the output is guaranteed to have all incoming non-ASCII characters escaped. If ensure_ascii is false, these characters will be output as-is. …
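
To see the difference without touching any files, here is a minimal in-memory sketch. It assumes a one-key document modelled on the "text" field of the sample above; the variable names (raw, escaped, literal) are only illustrative.

```python
import json

# "\ud83d\ude00" is the JSON escape (a UTF-16 surrogate pair) for U+1F600.
raw = '{"text": "\\ud83d\\ude00"}'
data = json.loads(raw)  # the decoder joins the pair into a single character

# Default behaviour: every non-ASCII character is written as a \uXXXX escape.
escaped = json.dumps(data)

# With ensure_ascii=False the character itself is written to the output.
literal = json.dumps(data, ensure_ascii=False)

# Show the escaped form, and the UTF-8 bytes of the literal form.
print(escaped)
print(literal.encode("utf-8"))

# Both spellings are valid JSON and decode to the same Python string.
assert json.loads(escaped) == json.loads(literal) == data
```

Either output is equivalent to a JSON parser; ensure_ascii=False only changes how the file looks in a text editor, so the output file must be opened with an encoding such as UTF-8 that can represent the characters.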

