Error while reading a JSON file in chunks with Python
I have a large JSON file, so I want to read it in chunks while testing. I have implemented the following code:
if fpath.endswith('.json'):
    with open(fpath, 'r') as f:
        read_query = pd.read_json(f, lines=True, chunksize=100)
        for chunk in read_query:
            print(chunk)
I get this error:
  File "nameoffile.py", line 168, in read_queries_func
    for chunk in read_query:
  File "C:\Users\Me\Python38\lib\site-packages\pandas\io\json\_json.py", line 798, in __next__
    obj = self._get_object_parser(lines_json)
  File "C:\Users\Me\Python38\lib\site-packages\pandas\io\json\_json.py", line 770, in _get_object_parser
    obj = FrameParser(json, **kwargs).parse()
  File "C:\Users\Me\Python38\lib\site-packages\pandas\io\json\_json.py", line 885, in parse
    self._parse_no_numpy()
  File "C:\Users\Me\Python38\lib\site-packages\pandas\io\json\_json.py", line 1159, in _parse_no_numpy
    loads(json, precise_float=self.precise_float), dtype=None
ValueError: Expected object or value
Why am I getting this error?
The JSON file looks like this:
[
  {
    "a": "13",
    "b": "55"
  },
  {
    "a": "15",
    "b": "16"
  },
  {
    "a": "18",
    "b": "45"
  },
  {
    "a": "1650",
    "b": "26"
  },
  .
  .
  .
  {
    "a": "214",
    "b": "23"
  }
]
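Also, is there a way to extract only the values of the 'a' attribute while reading the file, or can that only be done after the file has been read?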
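Your JSON file contains only one object: a single top-level array that spans the whole file. According to the line-delimited JSON documentation, to which the documentation of the chunksize argument points: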
pandas is able to read and write line-delimited json files that are common in data processing pipelines using Hadoop or Spark.
For line-delimited json files, pandas can also return an iterator which reads in chunksize lines at a time. This can be useful for large files or to read from a stream.
This also implies lines=True, and the documentation for lines says:
Read the file as a json object per line.
This means that a file like this works:
{"a": 1, "b": 2}
{"a": 3, "b": 4}
{"a": 5, "b": 6}
{"a": 7, "b": 8}
{"a": 9, "b": 10}
A file like this does not:
[
{"a": 1, "b": 2},
{"a": 3, "b": 4},
{"a": 5, "b": 6},
{"a": 7, "b": 8},
{"a": 9, "b": 10}
]
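To make the difference concrete, here is a minimal sketch of chunked reading on the line-delimited layout. The sample text, the chunk size of 2, and the variable names are made up for illustration. It also shows one way to keep only the 'a' values: select that column from each chunk as it arrives (as far as I know, read_json has no option to load a single column).

```python
import io

import pandas as pd

# Line-delimited JSON: one complete object per line (sample data, made up here).
jsonl_text = (
    '{"a": 1, "b": 2}\n'
    '{"a": 3, "b": 4}\n'
    '{"a": 5, "b": 6}\n'
    '{"a": 7, "b": 8}\n'
    '{"a": 9, "b": 10}\n'
)

a_values = []
chunk_sizes = []

# With lines=True and chunksize set, read_json returns an iterator of DataFrames.
reader = pd.read_json(io.StringIO(jsonl_text), lines=True, chunksize=2)
for chunk in reader:
    chunk_sizes.append(len(chunk))
    a_values.extend(chunk["a"].tolist())  # keep only column "a" from each chunk
```

Each chunk is an ordinary DataFrame, so anything you would do after a full read (filtering, selecting columns) can be done per chunk instead.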
So you have to either read the file in one go, or modify it so that it has one object per line.
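If you go the second route, the conversion could be sketched roughly like this. json_array_to_lines and both path arguments are hypothetical names, not part of pandas, and the sketch assumes the array still fits in memory once during conversion.

```python
import json


def json_array_to_lines(src_path, dst_path):
    """Rewrite a file holding one JSON array as one JSON object per line.

    Hypothetical helper: loads the whole array once, so it only suits
    files that fit in memory for the duration of the conversion.
    """
    with open(src_path, "r", encoding="utf-8") as src:
        records = json.load(src)  # the entire array is parsed here
    with open(dst_path, "w", encoding="utf-8") as dst:
        for record in records:
            dst.write(json.dumps(record) + "\n")
    return len(records)  # number of lines written
```

After converting, pd.read_json(dst_path, lines=True, chunksize=100) should be able to iterate over the new file in chunks. If the original file is too large to load even once, you would need a streaming JSON parser for the conversion step instead, which is not shown here.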