Correct format for a JSON file... then into a DataFrame

Question

I have a Notepad file that I saved as a JSON file, and I am trying to read it into a pandas DataFrame.

My JSON file looks like this:

{
  "date" : "2000-01-01",
  "i" : "1387",
  "xxx" : "aaaa",
}, 
{
  "fecha" : "2000-01-02",
  "indicativo" : "1387",
  "xxx" : "aaaa",
}, 
{
  "data" : "2000-01-03",
  "indicativo" : "1387",
}, 
{
  "date" : "2000-01-04",
  "i" : "1387",
  "xxx" : "aaaa",
}, 
{
  "fecha" : "2000-01-05",
  "indicativo" : "1387",
  "xxx" : "aaaa",
}

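How can I use code to change this into correct JSON format? (Keep in mind that I have posted only a few lines; the actual JSON file has hundreds of lines, so fixing it by hand is impractical.)

Then, once I have that file, the code would be: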

import pandas as pd
from pandas.io.json import json_normalize
name = pd.read_json(r"file.json", lines=True, orient='records')

I tried running the code above on the JSON file, but I keep getting:

ValueError: Expected object or value.

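After some trial and error, I think this happens because the file is not properly formatted JSON, so I would appreciate help with at least the first part.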

Answer 1

I think your JSON file should have [ at the beginning and ] at the end.
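
A minimal sketch of that idea, done in memory instead of editing the file. It assumes the file looks exactly like the sample in the question, including the comma before each closing brace, which Python's json module rejects as invalid JSON. The helper name repair_json_text and the regular expression are illustrative assumptions, and the regex would also change a string value that happens to contain a comma followed by a closing brace.

```python
import json
import re

import pandas as pd


def repair_json_text(text: str) -> str:
    """Turn comma-separated objects into one JSON array (hypothetical helper)."""
    # Drop a comma that sits directly before a closing brace or bracket.
    text = re.sub(r",\s*([}\]])", r"\1", text)
    # Remove a stray comma at the very end, then wrap everything in [ ].
    return "[" + text.strip().rstrip(",") + "]"


# A short excerpt in the same shape as the sample from the question.
sample = """{
  "date" : "2000-01-01",
  "i" : "1387",
  "xxx" : "aaaa",
},
{
  "fecha" : "2000-01-02",
  "indicativo" : "1387",
  "xxx" : "aaaa",
}"""

records = json.loads(repair_json_text(sample))  # a list of dicts
df = pd.DataFrame(records)
print(df)
```

For a real file, the text could be read with Path.read_text and passed through the same helper, leaving the file on disk untouched.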

Answer 2

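Addressing the question: how can I use code to change it into correct JSON format?
Given the contents shown, the file is a series of dictionaries separated by commas and newlines.
Read the file in and fix it by adding [ at the beginning and ] at the end.
- Once the file has been fixed, it does not need to be fixed again.
Read the file back in with pandas.read_json.
- A list of dictionaries can be loaded into pandas, but the dicts have different keys, so some additional cleanup may be needed.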

import json
import pandas as pd
from pathlib import Path

# path to file
p = Path('e:/PythonProjects/stack_overflow/test.json')

# read and fix the file
with p.open('r+') as f:
    file = f.read()  # reads the file in as a long string
    file = '[' + file + ']'  # add characters to beginning and end of string
    f.seek(0)  # find the beginning of the file
    f.write(file)  # write the new data back to the file
    f.truncate()  # remove the old data

# after fixing the file with code 
df = pd.read_json(p)

# display(df)
         date     i   xxx       fecha indicativo        data
0  2000-01-01  1387  aaaa         NaN        NaN         NaN
1         NaN   NaN  aaaa  2000-01-02       1387         NaN
2         NaN   NaN   NaN         NaN       1387  2000-01-03
3  2000-01-04  1387  aaaa         NaN        NaN         NaN
4         NaN   NaN  aaaa  2000-01-05       1387         NaN
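
The additional cleanup could look like the sketch below, which renames keys in the parsed records before building the DataFrame. It assumes that date, fecha, and data all hold the same kind of value and that i and indicativo are the same identifier; that mapping is a guess from the sample, not something the question confirms, and the name KEY_ALIASES is made up for illustration.

```python
import pandas as pd

# Assumed mapping of variant key names to one canonical name (a guess from the sample).
KEY_ALIASES = {"fecha": "date", "data": "date", "i": "indicativo"}

records = [
    {"date": "2000-01-01", "i": "1387", "xxx": "aaaa"},
    {"fecha": "2000-01-02", "indicativo": "1387", "xxx": "aaaa"},
    {"data": "2000-01-03", "indicativo": "1387"},
]

# Rename keys record by record so that every dict uses the canonical names.
normalized = [
    {KEY_ALIASES.get(key, key): value for key, value in record.items()}
    for record in records
]

df = pd.DataFrame(normalized)
df["date"] = pd.to_datetime(df["date"])  # parse the ISO date strings
print(df)
```

Renaming before constructing the DataFrame avoids the sparse, NaN-filled columns entirely; only keys that are truly absent from a record (xxx in the third one here) remain missing.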

Answer 3

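I think your data file is a list of dictionaries that is missing its opening and closing square brackets. (As it stands the file is not valid JSON: it is a run of comma-separated objects with nothing enclosing them.)

The answer above shows how to add the "[" and "]".

Once that is done, you can call the DataFrame constructor directly: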

import pandas as pd

data = [
    {
      "date" : "2000-01-01",
      "i" : "1387",
      "xxx" : "aaaa",
    }, 
    {
      "fecha" : "2000-01-02",
      "indicativo" : "1387",
      "xxx" : "aaaa",
    }, 
    # remaining dictionaries, omitted, to save space
]

pd.DataFrame(data)
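
With hundreds of records, pasting the list into code is impractical, so the same constructor call can be fed from the repaired file instead. The sketch below assumes the file already contains a valid JSON array; the temporary file and the name fixed.json are stand-ins for illustration only.

```python
import json
import tempfile
from pathlib import Path

import pandas as pd

# Stand-in for the already repaired file; the name is hypothetical.
path = Path(tempfile.mkdtemp()) / "fixed.json"
path.write_text(
    '[{"date": "2000-01-01", "i": "1387"},'
    ' {"fecha": "2000-01-02", "indicativo": "1387"}]',
    encoding="utf-8",
)

with path.open(encoding="utf-8") as f:
    data = json.load(f)  # a list of dicts, values kept as strings

df = pd.DataFrame(data)  # keys missing from a record become NaN
print(df)
```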
