How to store nested JSON held in a list to a text file using Python?

I am building a nested JSON structure and storing it in a list object. Here is my code; it produces the correct hierarchy as expected.

Sample data:

datasource,datasource_cnt,category,category_cnt,subcategory,subcategory_cnt
Bureau of Labor Statistics,44,Employment and wages,44,Employment and wages,44

import pandas as pd

df = pd.read_csv('queryhive16273.csv')


def split_df(df):
    # One dict per (datasource, datasource_cnt) group
    for (vendor, count), df_vendor in df.groupby(["datasource", "datasource_cnt"]):
        yield {
            "vendor_name": vendor,
            "count": count,
            "categories": list(split_category(df_vendor)),
        }


def split_category(df_vendor):
    # One dict per (category, category_cnt) group within a vendor
    for (category, count), df_category in df_vendor.groupby(
        ["category", "category_cnt"]
    ):
        yield {
            "name": category,
            "count": count,
            "subCategories": list(split_subcategory(df_category)),
        }


def split_subcategory(df_category):
    # One dict per (subcategory, subcategory_cnt) group within a category
    for (subcategory, count), df_subcategory in df_category.groupby(
        ["subcategory", "subcategory_cnt"]
    ):
        yield {
            "count": count,
            "name": subcategory,
        }


abc = list(split_df(df))

abc contains the data shown below. This is the expected result.

[{
    'count': 44,
    'vendor_name': 'Bureau of Labor Statistics',
    'categories': [{
        'count': 44,
        'name': 'Employment and wages',
        'subCategories': [{
            'count': 44,
            'name': 'Employment and wages'
        }]
    }]
}]

Now I am trying to store it in a JSON file.

with open('your_file2.json', 'w') as f:
    for item in abc:
        f.write("%s\n" % item)
        # f.write(abc)

This is where the problem starts. The data is written in the form shown below, which is not valid JSON. If I try json.dump instead, it raises a "json serialize error".

Can you help me?

{
    'count': 44,
    'vendor_name': 'Bureau of Labor Statistics',
    'categories': [{
        'count': 44,
        'name': 'Employment and wages',
        'subCategories': [{
            'count': 44,
            'name': 'Employment and wages'
        }]
    }]
}

Expected result:

[{
    "count": 44,
    "vendor_name": "Bureau of Labor Statistics",
    "categories": [{
        "count": 44,
        "name": "Employment and wages",
        "subCategories": [{
            "count": 44,
            "name": "Employment and wages"
        }]
    }]
}]

Using your data with the json module from the Python Standard Library (PSL) gives me:

TypeError: Object of type 'int64' is not JSON serializable

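This simply means that some numpy objects (here the int64 counts that pandas returns as group keys) are present in your nested structure, and the json encoder has no method for serializing them.

Forcing the encoder to fall back on string conversion whenever an object has no encoding of its own is enough to make your code work: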
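The error can be reproduced without the CSV file. The snippet below is only a minimal sketch: the `record` dict is a made-up stand-in for one entry of the nested structure, with the count built directly as `np.int64`.

```python
import json

import numpy as np

# Hypothetical stand-in for one nested entry; the count is a numpy scalar,
# like the group keys pandas hands back from groupby.
record = {"name": "Employment and wages", "count": np.int64(44)}

# np.int64 is not a subclass of the built-in int, so json does not know it.
is_builtin_int = isinstance(record["count"], int)

try:
    json.dumps(record)
    error = None
except TypeError as exc:
    error = exc

print(type(record["count"]).__name__, is_builtin_int)
print(error)
```

The exact wording of the message varies slightly between Python versions, but the exception type is `TypeError` in each case.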

import io
import json

import pandas as pd

# Rebuild the sample CSV in memory instead of reading it from disk
d = io.StringIO("datasource,datasource_cnt,category,category_cnt,subcategory,subcategory_cnt\nBureau of Labor Statistics,44,Employment and wages,44,Employment and wages,44")
df = pd.read_csv(d)

abc = list(split_df(df))  # split_df as defined in the question

json.dumps(abc, default=str)

It returns valid JSON (but with the integers converted to strings):

'[{"vendor_name": "Bureau of Labor Statistics", "count": "44", "categories": [{"name": "Employment and wages", "count": "44", "subCategories": [{"count": "44", "name": "Employment and wages"}]}]}]'

If that does not suit your needs, use a dedicated encoder:

import numpy as np
class MyEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, np.int64):
            return int(obj)
        return json.JSONEncoder.default(self, obj)

json.dumps(abc, cls=MyEncoder)

This returns the requested JSON:

'[{"vendor_name": "Bureau of Labor Statistics", "count": 44, "categories": [{"name": "Employment and wages", "count": 44, "subCategories": [{"count": 44, "name": "Employment and wages"}]}]}]'
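The encoder above only handles `np.int64`. If the data may also contain other numpy integer widths, floats, or arrays, a broader variant could look like the sketch below. The class name `NumpyEncoder` and the `sample` dict are assumptions made for illustration, not part of the original code.

```python
import json

import numpy as np


class NumpyEncoder(json.JSONEncoder):
    """Hypothetical encoder that maps common numpy types to built-ins."""

    def default(self, obj):
        if isinstance(obj, np.integer):   # int8 ... int64, uint8 ... uint64
            return int(obj)
        if isinstance(obj, np.floating):  # float16, float32, float64
            return float(obj)
        if isinstance(obj, np.ndarray):   # arrays become nested lists
            return obj.tolist()
        # Anything else still raises TypeError, as in the base class
        return super().default(obj)


# Made-up sample mixing several numpy types
sample = {
    "count": np.int32(44),
    "ratio": np.float32(0.5),
    "ids": np.array([1, 2, 3]),
}
result = json.dumps(sample, cls=NumpyEncoder)
print(result)
```

Checking against the abstract base types `np.integer` and `np.floating` avoids listing every concrete width by hand.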

Another option is to convert your data directly before encoding:

def split_category(df_vendor):
    for (category, count), df_category in df_vendor.groupby(
        ["category", "category_cnt"]
    ):
        yield {
            "name": category,
            "count": int(count),  # Cast here before encoding
            "subCategories": list(split_subcategory(df_category)),
        }

# split_df and split_subcategory need the same int(count) cast

import json

data = [{
    'count': 44,
    'vendor_name': 'Bureau of Labor Statistics',
    'categories': [{
        'count': 44,
        'name': 'Employment and wages',
        'subCategories': [{
            'count': 44,
            'name': 'Employment and wages'
        }]
    }]
}]

with open('your_file2.json', 'w') as f:
    json.dump(data, f, indent=2)

This produces a valid JSON file:

[
  {
    "count": 44,
    "vendor_name": "Bureau of Labor Statistics",
    "categories": [
      {
        "count": 44,
        "name": "Employment and wages",
        "subCategories": [
          {
            "count": 44,
            "name": "Employment and wages"
          }
        ]
      }
    ]
  }
]
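To check that the file really is valid JSON, it can be read back with `json.load`. The sketch below uses only the standard library and also shows why the `"%s\n" % item` write from the question fails: it emits the Python repr, with single quotes, which a JSON parser rejects. The temporary directory and the shortened `data` list are assumptions made to keep the example self-contained.

```python
import json
import os
import tempfile

# Shortened stand-in for the nested structure, using plain Python ints
data = [{
    "count": 44,
    "vendor_name": "Bureau of Labor Statistics",
    "categories": [{"count": 44, "name": "Employment and wages"}],
}]

# Write to a temporary location so the example leaves no stray files behind
path = os.path.join(tempfile.mkdtemp(), "your_file2.json")
with open(path, "w") as f:
    json.dump(data, f, indent=2)

# Round trip: parsing the file gives back an equal structure
with open(path) as f:
    restored = json.load(f)

# The "%s" write produces repr text with single quotes, which is not JSON
repr_text = "%s" % data[0]
try:
    json.loads(repr_text)
    repr_is_json = True
except json.JSONDecodeError:
    repr_is_json = False

print(restored == data, repr_is_json)
```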