Python 2.7 JSON dump UnicodeEncodeError

Question

I have a file in which each line is a JSON object, like so:

{"name": "John", ...}

{...}

I am trying to create a new file containing the same objects, but with certain properties removed from all of them.

When I do this, I get a UnicodeEncodeError. Strangely, if I instead loop over range(n) (for some number n) and use infile.next(), it works just as I want it to.

Why is that? How can I get this to work by iterating over infile? I tried using dumps() instead of dump(), but that just produces a bunch of empty lines in outfile.

with open(filename, 'r') as infile:
    with open('_{}'.format(filename), 'w') as outfile:
        for comment in infile:
            decodedComment = json.loads(comment)
            for prop in propsToRemove:
                # use pop to avoid exception handling
                decodedComment.pop(prop, None)
            json.dump(decodedComment, outfile, ensure_ascii = False)
            outfile.write('\n')

Here is the error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\U0001f47d' in position 1: ordinal not in range(128)

Thanks for your help!

Answer 1

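The problem is that the standard file.write() function (which json.dump() calls) does not support unicode strings: in Python 2 it implicitly encodes them with the ASCII codec. As the error message shows, your data contains the character \U0001f47d (which turns out to be the code point for EXTRATERRESTRIAL ALIEN, who knew?), and possibly other non-ASCII characters. To handle them, you can either escape them as ASCII (they will appear in the output file as \uXXXX escape sequences), or use a file writer that can handle unicode.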
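As a rough illustration of the failure described above, the snippet below encodes the character from the traceback by hand. It is only a sketch of the underlying codec behaviour, not a reproduction of the original script; the variable names are made up for the example.

```python
# The character from the traceback: U+1F47D EXTRATERRESTRIAL ALIEN.
alien = u'\U0001f47d'

# Encoding to ASCII fails, which is the same error the question reports.
try:
    alien.encode('ascii')
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# UTF-8 can represent the character, using four bytes.
utf8_bytes = alien.encode('utf-8')

print(ascii_failed)
print(repr(utf8_bytes))
```

Any output path that ends up encoding with ASCII will hit the same exception as soon as a line contains such a character.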

For the first option, drop ensure_ascii = False from your dump call, so that json.dump() falls back to its default of escaping every non-ASCII character:

json.dump(decodedComment, outfile)
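To make the escaping approach concrete, here is a small sketch that relies on ensure_ascii being left at its default of True. The `comment` dictionary is invented sample data, not taken from the question's file.

```python
import json

# Hypothetical sample object containing the character from the traceback.
comment = {u'body': u'\U0001f47d'}

# With ensure_ascii left at its default (True), every non-ASCII character
# is written as a \uXXXX escape, so the result is plain ASCII.
escaped = json.dumps(comment)

# The escapes are decoded again when the JSON is parsed.
restored = json.loads(escaped)

print(escaped)
```

The output is safe to pass to a plain file.write(), at the cost of the file no longer showing the characters in readable form.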

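The second option is probably closer to what you want, and a simple way to do it is with the codecs module. Import it and change your second line to: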

with codecs.open('_{}'.format(filename), 'w', encoding="utf-8") as outfile:

You will then be able to save the special characters in their original form.
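Putting the codecs suggestion together with the loop from the question, a minimal end-to-end sketch might look like the following. The function name `strip_props`, the file names, and the sample line are all hypothetical; reading the input through io.open is an assumption made here so that lines come back as unicode, and is not part of the original answer.

```python
import codecs
import io
import json
import os
import tempfile


def strip_props(in_path, out_path, props_to_remove):
    """Copy a JSON-lines file, dropping the given keys from every object."""
    # io.open decodes the input as UTF-8 and splits on newlines only.
    with io.open(in_path, 'r', encoding='utf-8') as infile:
        # codecs.open returns a writer that encodes unicode text as UTF-8.
        with codecs.open(out_path, 'w', encoding='utf-8') as outfile:
            for line in infile:
                if not line.strip():
                    continue  # skip blank lines between objects
                obj = json.loads(line)
                for prop in props_to_remove:
                    # pop with a default avoids a KeyError for missing keys
                    obj.pop(prop, None)
                json.dump(obj, outfile, ensure_ascii=False)
                outfile.write(u'\n')


# Usage with made-up data in a temporary directory.
tmp_dir = tempfile.mkdtemp()
src = os.path.join(tmp_dir, 'comments.json')
dst = os.path.join(tmp_dir, '_comments.json')
with codecs.open(src, 'w', encoding='utf-8') as f:
    f.write(u'{"name": "John", "body": "\U0001f47d", "score": 1}\n')

strip_props(src, dst, ['score'])

with io.open(dst, 'r', encoding='utf-8') as f:
    raw_text = f.read()
result = [json.loads(line) for line in raw_text.splitlines() if line.strip()]
```

Here the output file holds the character itself rather than an escape sequence, which matches the "original form" the answer refers to.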
