Removing null/empty values in lists of a JSON object in Python recursively

I have a JSON object (a JSON string) with the following value:

[
   {
      "id": 1,
      "object_k_id": "",
      "object_type": "report",
      "object_meta": {
         "source_id": 0,
         "report": "Customers"
      },
      "description": "Daily metrics for all customers",
      "business_name": "",
      "business_logic": "",
      "owners": [
         "nn@abc.com",
          null
      ],
      "stewards": [
         "nn@abc.com",
         ''
      ],
      "verified_use_cases": [
         null,
         null,
         "c4a48296-fd92-3606-bf84-99aacdf22a20",
         null
      ],
      "classifications": [
         null
      ],
      "domains": []
   }
]

But the final format I want has the null values and empty list items removed, like this:

[
   {
      "id": 1,
      "object_k_id": "",
      "object_type": "report",
      "object_meta": {
         "source_id": 0,
         "report": "Customers"
      },
      "description": "Daily metrics for all customers",
      "business_name": "",
      "business_logic": "",
      "owners": [
         "nn@abc.com"
      ],
      "stewards": [
         "nn@abc.com"
      ],
      "verified_use_cases": [
         "c4a48296-fd92-3606-bf84-99aacdf22a20"
      ],
      "classifications": [],
      "domains": []
   }
]

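I want the output to exclude null values and empty strings so that it looks cleaner. I need to do this recursively for every list in all of the JSON I have.

Even more than the recursion, it would help if I could do it all in one go rather than looping over each element.

I only need to clean the lists, though.

Can anyone help me with this? Thanks in advance.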

You can convert the JSON to a dict, clean it with the function below, and then convert it back to JSON.

def clean_dict(input_dict):
    """Return a copy of input_dict with None and '' removed from every list."""
    output = {}
    for key, value in input_dict.items():
        if isinstance(value, dict):
            output[key] = clean_dict(value)
        elif isinstance(value, list):
            output[key] = []
            for item in value:
                # Test the list item itself, not the enclosing list
                if isinstance(item, dict):
                    output[key].append(clean_dict(item))
                elif item not in [None, '']:
                    output[key].append(item)
        else:
            output[key] = value
    return output
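clean_dict expects a dict, but the top-level value in the question is a list, and lists nested directly inside lists are not recursed into. Here is a minimal sketch of a more general variant, assuming you only want to drop None and "" from lists and keep everything else (including "" as a dict value, 0 and False). The name clean and the sample data are hypothetical.

```python
import json


def clean(value):
    """Recursively drop None and "" items from lists; dict keys and values are kept."""
    if isinstance(value, dict):
        # Recurse into every value, but never drop a key
        return {k: clean(v) for k, v in value.items()}
    if isinstance(value, list):
        # Filter the list items first, then recurse into the survivors
        return [clean(item) for item in value if item is not None and item != ""]
    return value


data = json.loads('[{"id": 1, "object_k_id": "", "owners": ["nn@abc.com", null], '
                  '"classifications": [null], "nested": [[null, "x"], ""]}]')
print(json.dumps(clean(data)))
```

A list that becomes empty after cleaning stays as [], which matches the "classifications": [] in the desired output.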

Thanks to N.O

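import json


def recursive_dict_clean(d):
    # Note: "if i" drops every falsy item (None, '', 0, False, [], {})
    for v in d.values():
        if isinstance(v, list):
            # Slice assignment replaces the list's contents in place
            v[:] = [i for i in v if i]
        elif isinstance(v, dict):
            recursive_dict_clean(v)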


data = json.loads("""[{
    "id": 1,
    "object_k_id": "",
    "object_type": "report",
    "object_meta": {
        "source_id": 0,
        "report": "Customers"
    },
    "description": "Daily metrics for all customers",
    "business_name": "",
    "business_logic": "",
    "owners": [
        "nn@abc.com",
        null
    ],
    "stewards": [
        "nn@abc.com"
    ],
    "verified_use_cases": [
        null,
        null,
        "c4a48296-fd92-3606-bf84-99aacdf22a20",
        null
    ],
    "classifications": [
        null
    ],
    "domains": []
}]""")


for d in data:
    recursive_dict_clean(d)

print(data)

Output:
[{'id': 1,
  'object_k_id': '',
  'object_type': 'report',
  'object_meta': {'source_id': 0, 'report': 'Customers'},
  'description': 'Daily metrics for all customers',
  'business_name': '',
  'business_logic': '',
  'owners': ['nn@abc.com'],
  'stewards': ['nn@abc.com'],
  'verified_use_cases': ['c4a48296-fd92-3606-bf84-99aacdf22a20'],
  'classifications': [],
  'domains': []}]
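The truthiness test "if i" in recursive_dict_clean removes every falsy item, so 0, False, [] and {} inside a list would disappear as well, and dicts that sit inside lists are never visited. A minimal in-place sketch, assuming only None and "" should be removed; clean_in_place and the sample data are hypothetical.

```python
def clean_in_place(node):
    """Remove None and "" from every list under node, mutating it in place."""
    if isinstance(node, dict):
        for value in node.values():
            clean_in_place(value)
    elif isinstance(node, list):
        # Slice assignment keeps the same list object, as recursive_dict_clean does
        node[:] = [item for item in node if item is not None and item != ""]
        # Then descend into any dicts or lists that remain
        for item in node:
            clean_in_place(item)


data = [{"counts": [0, None, 3], "flags": [False, ""], "rows": [{"tags": [None, "a"]}]}]
clean_in_place(data)
print(data)
```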

P.S.: Your JSON string is invalid: the '' in "stewards" is not valid JSON, because JSON strings must use double quotes.

You can use the built-in object_pairs_hook to process the data while it is being decoded from the string.

https://docs.python.org/3/library/json.html#json.load

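This function runs every time the decoder would otherwise call dict(). It uses a simple list comprehension to remove all None objects from lists, and otherwise leaves the data alone and lets the decoder do its thing.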
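Because the hook is called once per decoded JSON object, inner objects are already converted by the time the outer object's pairs reach the hook, which is why a single non-recursive hook still cleans every level. A small sketch that records the calls makes the order visible; recording_hook is a purely illustrative name.

```python
import json

calls = []


def recording_hook(pairs):
    """Record the keys of each object passed to the hook, then build a plain dict."""
    calls.append([key for key, _ in pairs])
    return dict(pairs)


result = json.loads('{"outer": {"inner": [1, null]}, "other": 2}',
                    object_pairs_hook=recording_hook)
print(calls)   # the inner object is reported before the outer one
print(result)
```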

#!/usr/bin/env python3
import json
data_string = """[
   {
      "id": 1,
      "object_k_id": "",
      "object_type": "report",
      "object_meta": {
         "source_id": 0,
         "report": "Customers"
      },
      "description": "Daily metrics for all customers",
      "business_name": "",
      "business_logic": "",
      "owners": [
         "nn@abc.com",
          null
      ],
      "stewards": [
         "nn@abc.com",
         ""
      ],
      "verified_use_cases": [
         null,
         null,
         "c4a48296-fd92-3606-bf84-99aacdf22a20",
         null
      ],
      "classifications": [
         null
      ],
      "domains": []
   }
]"""

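def json_hook(obj):
    # obj is the list of (key, value) pairs of one decoded JSON object;
    # nested objects have already been converted by this same hook
    return_obj = {}
    for k, v in obj:
        if isinstance(v, list):
            # Drop null (None) items from list values
            v = [x for x in v if x is not None]

        return_obj[k] = v

    return return_obj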

data = json.loads(data_string, object_pairs_hook=json_hook)

print(json.dumps(data, indent=4))

Result:

[
    {
        "id": 1,
        "object_k_id": "",
        "object_type": "report",
        "object_meta": {
            "source_id": 0,
            "report": "Customers"
        },
        "description": "Daily metrics for all customers",
        "business_name": "",
        "business_logic": "",
        "owners": [
            "nn@abc.com"
        ],
        "stewards": [
            "nn@abc.com",
            ""
        ],
        "verified_use_cases": [
            "c4a48296-fd92-3606-bf84-99aacdf22a20"
        ],
        "classifications": [],
        "domains": []
    }
]

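In your example you removed the "" value from stewards. If you want that behavior, you can replace is not None with not in (None, ""). But that looks like it might be a mistake, since you left empty strings elsewhere.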
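A minimal sketch of the hook with the not in (None, "") filter, assuming the rest of the hook stays as written; json_hook_no_empty and the sample string are hypothetical. Since the hook only sees the values of JSON objects, items of a top-level array, or of an array nested directly inside another array, are not filtered.

```python
import json


def json_hook_no_empty(pairs):
    """object_pairs_hook that drops None and "" from the list values of each object."""
    result = {}
    for key, value in pairs:
        if isinstance(value, list):
            # "not in (None, '')" instead of "is not None"
            value = [x for x in value if x not in (None, "")]
        result[key] = value
    return result


text = '[{"object_k_id": "", "stewards": ["nn@abc.com", ""], "owners": [null, "nn@abc.com"]}]'
print(json.loads(text, object_pairs_hook=json_hook_no_empty))
```

Empty strings that are plain object values, such as "object_k_id": "", are left untouched.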