How to keep the original value of Unicode characters even after writing them to a JSON file?
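I am working with a file that contains Unicode emoji. Left as-is, it looks fine and I can see the emoji. But when I read it with the json module and write it back, the emoji are converted to something like "\ud83d\ude00". So my emoji "😀" becomes "\ud83d\ude00" after writing. I am using the following code: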
import json

with open("emoji-by-category.json", encoding='utf-8', errors='ignore') as json_data:
    data = json.load(json_data, strict=False)

with open("emoji-by-category2.json", 'w', encoding='utf-8', errors='ignore') as json_file:
    json.dump(data, json_file, indent=4)
Here is the sample JSON file:
[
{
"category": "Smileys & Emotion",
"section": "face-smiling",
"n": "1",
"code": "U+1F600",
"text": "\ud83d\ude00",
"recentlyAdded": false,
"name": "grinning face",
"vendors": {
"Appl": true,
"Goog": true,
"FB": true,
"Wind": true,
"Twtr": true,
"Joy": true,
"Sams": true,
"GMail": true,
"SB": false,
"DCM": false,
"KDDI": false,
"Tlgr": true
},
"tags": [
"face",
"grin",
"grinning face"
],
"keywords": [
"face",
"grin",
"grinning",
"subdivision",
"flag",
":D",
"grinning face"
]
},
{
"category": "Smileys & Emotion",
"section": "face-smiling",
"n": "2",
"code": "U+1F603",
"text": "\ud83d\ude03",
"recentlyAdded": false,
"name": "grinning face with big eyes",
"vendors": {
"Appl": true,
"Goog": true,
"FB": true,
"Wind": true,
"Twtr": true,
"Joy": true,
"Sams": true,
"GMail": true,
"SB": true,
"DCM": true,
"KDDI": true,
"Tlgr": true
},
"tags": [
"face",
"grinning face with big eyes",
"mouth",
"open",
"smile"
],
"keywords": [
"face",
"grinning",
"big",
"eyes",
"mouth",
"open",
"smile",
"subdivision",
"flag",
"grin",
"eye",
":D",
":)",
"grinning face with big eyes"
]
},
{
"category": "Smileys & Emotion",
"section": "face-smiling",
"n": "3",
"code": "U+1F604",
"text": "\ud83d\ude04",
"recentlyAdded": false,
"name": "grinning face with smiling eyes",
"vendors": {
"Appl": true,
"Goog": true,
"FB": true,
"Wind": true,
"Twtr": true,
"Joy": true,
"Sams": true,
"GMail": true,
"SB": true,
"DCM": false,
"KDDI": false,
"Tlgr": true
},
"tags": [
"eye",
"face",
"grinning face with smiling eyes",
"mouth",
"open",
"smile"
],
"keywords": [
"eye",
"face",
"grinning",
"smiling",
"eyes",
"mouth",
"open",
"smile",
"subdivision",
"flag",
"grin",
"joy",
"funny",
"haha",
"laugh",
":D",
":)",
"grinning face with smiling eyes"
]
}
]
Use ensure_ascii=False when writing:
with open("emoji-by-category2.json", 'w', encoding='utf-8', errors='ignore') as json_file:
    json.dump(data, json_file, indent=4, ensure_ascii=False)
From the documentation for the JSON encoder and decoder:
json.dump(obj, fp, *, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, cls=None, indent=None, separators=None, default=None, sort_keys=False, **kw)
… If ensure_ascii is true (the default), the output is guaranteed to have all incoming non-ASCII characters escaped. If ensure_ascii is false, these characters will be output as-is. …
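To make the difference concrete, here is a minimal sketch using json.dumps, which accepts the same ensure_ascii parameter as json.dump. The record dictionary is a made-up stand-in for one entry of the sample file, not the real data. It also shows that the escaped form is not data loss: both outputs decode back to the same Python string.

```python
import json

# Hypothetical stand-in for one entry of the sample file;
# "\U0001F600" is the grinning face emoji (U+1F600).
record = {"code": "U+1F600", "text": "\U0001F600"}

# Default behaviour: non-ASCII characters are written as \uXXXX escapes.
# Code points above U+FFFF become a UTF-16 surrogate pair.
escaped = json.dumps(record)

# With ensure_ascii=False the emoji is written out as the character itself.
literal = json.dumps(record, ensure_ascii=False)

# Both representations are valid JSON and decode to the same value.
same_after_decoding = json.loads(escaped) == json.loads(literal) == record
```

When the output goes to a file with ensure_ascii=False, the file should be opened with an encoding that can represent the characters, such as encoding='utf-8' as in the code above.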