ijson: How to use ijson to retrieve a dict/list element (from a file or from a string)?

Question

I am trying to use ijson to retrieve an element from a JSON dict object.

The JSON string is stored in a file, and it is the only content of that file:

{"categoryTreeId":"0","categoryTreeVersion":"127","categoryAspects":[1,2,3]}

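(This string is heavily simplified; the real one is more than 2 GB long.)

I need help with the following:

1/ Open the file,

2/ Use ijson to load the JSON data into some object,

3/ Retrieve the list "[1,2,3]" from that object.

Why not simply use the code below?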

my_json = json.loads('{"categoryTreeId":"0","categoryTreeVersion":"127","categoryAspects":[1,2,3]}')
my_list = my_json['categoryAspects']

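Well, imagine that this "[1,2,3]" list is actually more than 2 GB long, so json.loads() will not work (it simply crashes).

I tried many combinations (a lot of them), and all of them failed. Here are some examples of what I tried:

ij = ijson.items(fd,'') -> this does not raise any error; the ones below do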

my_list = ijson.items(fd,'').next()
-> error = '_yajl2.items' object has no attribute 'next'

my_list = ijson.items(fd,'').items()
-> error = '_yajl2.items' object has no attribute 'items'

my_list = ij['categoryAspects']
-> error = '_yajl2.items' object is not subscriptable

Answer 1

This should work:

import ijson

with open('your_file.json', 'rb') as f:  # binary read mode; 'b' alone is not a valid mode
    for n in ijson.items(f, 'categoryAspects.item'):
        print(n)

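In addition, if you know that your numbers are "ordinary numbers", you can also pass use_float=True as an extra argument to items for more speed (ijson.items(f, 'categoryAspects.item', use_float=True) in the code above); see the documentation for more details.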

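Edit: to answer a follow-up question: to simply get a list containing all the numbers, you can build one directly from the items function, like this:

import ijson

with open('your_file.json', 'rb') as f:
    numbers = list(ijson.items(f, 'categoryAspects.item'))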

Note that if there are too many numbers you may still run out of memory, which defeats the purpose of streaming parsing.

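EDIT2: An alternative to a list is to create a numpy array holding all the numbers, which should give a more compact in-memory representation of all the numbers at once, just in case:

import ijson
import numpy

with open('your_file.json', 'rb') as f:
    numbers = numpy.fromiter(
                ijson.items(f, 'categoryAspects.item', use_float=True),
                dtype='float' # or int, if these are integers
              )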
