Recursively removing null/empty values from lists in a JSON object in Python
I have a JSON object (a JSON string) whose value looks like this:
[
    {
        "id": 1,
        "object_k_id": "",
        "object_type": "report",
        "object_meta": {
            "source_id": 0,
            "report": "Customers"
        },
        "description": "Daily metrics for all customers",
        "business_name": "",
        "business_logic": "",
        "owners": [
            "nn@abc.com",
            null
        ],
        "stewards": [
            "nn@abc.com",
            ''
        ],
        "verified_use_cases": [
            null,
            null,
            "c4a48296-fd92-3606-bf84-99aacdf22a20",
            null
        ],
        "classifications": [
            null
        ],
        "domains": []
    }
]
But the final format I want has the null values and empty-string items removed from the lists, like this:
[
    {
        "id": 1,
        "object_k_id": "",
        "object_type": "report",
        "object_meta": {
            "source_id": 0,
            "report": "Customers"
        },
        "description": "Daily metrics for all customers",
        "business_name": "",
        "business_logic": "",
        "owners": [
            "nn@abc.com"
        ],
        "stewards": [
            "nn@abc.com"
        ],
        "verified_use_cases": [
            "c4a48296-fd92-3606-bf84-99aacdf22a20"
        ],
        "classifications": [],
        "domains": []
    }
]
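I want the output to exclude null values and empty strings so that it looks cleaner.
I need to do this recursively for every list in all of the JSON I have.
More important than recursion: it would be very helpful if I could do it in one pass rather than looping over every element.
I only need to clean the lists, though.
Can anyone help me with this? Thanks in advance.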
You can convert the JSON to a dict, clean it with the function below, and then convert it back to JSON:
def clean_dict(input_dict):
    # Build a cleaned copy instead of mutating the input.
    output = {}
    for key, value in input_dict.items():
        if isinstance(value, dict):
            output[key] = clean_dict(value)
        elif isinstance(value, list):
            output[key] = []
            for item in value:
                # Test each item, not the enclosing list.
                if isinstance(item, dict):
                    output[key].append(clean_dict(item))
                elif item not in [None, '']:
                    output[key].append(item)
        else:
            output[key] = value
    return output
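The function above expects a dict, while the JSON in the question has a list at the top level, and it does not look inside lists nested in other lists. A minimal sketch of one way to cover those cases is below. The name clean_lists and the sample string are made up for illustration, and it assumes that only None and "" should be dropped from lists while dict values are left untouched:

```python
import json

def clean_lists(node):
    """Return a copy of node with None and "" removed from every list."""
    if isinstance(node, dict):
        # Dict values are kept as they are (even "" or None); only recurse.
        return {key: clean_lists(value) for key, value in node.items()}
    if isinstance(node, list):
        # Drop unwanted items, then recurse into the items that remain.
        return [clean_lists(item) for item in node
                if item is not None and item != ""]
    # Scalars (str, int, float, bool, None) are returned unchanged.
    return node

# Hypothetical sample: top-level list, nested dict, list inside a list.
raw = '[{"id": 1, "name": "", "owners": ["a@abc.com", null], "nested": {"tags": ["", "x", [null, 0]]}}]'
cleaned = clean_lists(json.loads(raw))
print(json.dumps(cleaned))
```

Because it accepts any decoded JSON value, it can be called directly on the result of json.loads without looping over the top-level list first.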
Thanks, N.O
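import json

def recursive_dict_clean(d):
    # Clean every list value in place, then recurse into nested dicts.
    for k, v in d.items():
        if isinstance(v, list):
            # Keeps only truthy items, so None and '' are dropped.
            v[:] = [i for i in v if i]
        if isinstance(v, dict):
            recursive_dict_clean(v)
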
data = json.loads("""[{
"id": 1,
"object_k_id": "",
"object_type": "report",
"object_meta": {
"source_id": 0,
"report": "Customers"
},
"description": "Daily metrics for all customers",
"business_name": "",
"business_logic": "",
"owners": [
"nn@abc.com",
null
],
"stewards": [
"nn@abc.com"
],
"verified_use_cases": [
null,
null,
"c4a48296-fd92-3606-bf84-99aacdf22a20",
null
],
"classifications": [
null
],
"domains": []
}]""")
for d in data:
    recursive_dict_clean(d)
print(data)
Output:
[{'id': 1,
'object_k_id': '',
'object_type': 'report',
'object_meta': {'source_id': 0, 'report': 'Customers'},
'description': 'Daily metrics for all customers',
'business_name': '',
'business_logic': '',
'owners': ['nn@abc.com'],
'stewards': ['nn@abc.com'],
'verified_use_cases': ['c4a48296-fd92-3606-bf84-99aacdf22a20'],
'classifications': [],
'domains': []}]
P.S.: Your JSON string is invalid: the '' in stewards has to be written with double quotes ("").
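One thing to keep in mind about the `if i` filter in the list comprehension above: it tests truthiness, so it would also drop 0, False and empty containers if they ever appeared in a list. A small sketch with made-up values shows the difference from an explicit check for None and "":

```python
# Hypothetical list mixing the values from the question with other falsy items.
values = ["nn@abc.com", None, "", 0, False, [], "x"]

# Truthiness filter: removes every falsy item, including 0, False and [].
by_truthiness = [i for i in values if i]

# Explicit filter: removes only None and the empty string.
explicit = [i for i in values if i is not None and i != ""]

print(by_truthiness)
print(explicit)
```

For the data in the question both filters give the same result, since the lists only hold strings and nulls.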
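You can use the built-in object_pairs_hook to process the data while it is being decoded from the string.
https://docs.python.org/3/library/json.html#json.load
The hook runs whenever the decoder would otherwise call dict(). It uses a simple list comprehension to remove all None objects from list values, and otherwise leaves the data alone and lets the decoder do its thing.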
#!/usr/bin/env python3
import json
data_string = """[
{
"id": 1,
"object_k_id": "",
"object_type": "report",
"object_meta": {
"source_id": 0,
"report": "Customers"
},
"description": "Daily metrics for all customers",
"business_name": "",
"business_logic": "",
"owners": [
"nn@abc.com",
null
],
"stewards": [
"nn@abc.com",
""
],
"verified_use_cases": [
null,
null,
"c4a48296-fd92-3606-bf84-99aacdf22a20",
null
],
"classifications": [
null
],
"domains": []
}
]"""

def json_hook(obj):
    # obj is the list of (key, value) pairs for one decoded JSON object.
    return_obj = {}
    for k, v in obj:
        if isinstance(v, list):
            v = [x for x in v if x is not None]
        return_obj[k] = v
    return return_obj

data = json.loads(data_string, object_pairs_hook=json_hook)
print(json.dumps(data, indent=4))
Result:
[
    {
        "id": 1,
        "object_k_id": "",
        "object_type": "report",
        "object_meta": {
            "source_id": 0,
            "report": "Customers"
        },
        "description": "Daily metrics for all customers",
        "business_name": "",
        "business_logic": "",
        "owners": [
            "nn@abc.com"
        ],
        "stewards": [
            "nn@abc.com",
            ""
        ],
        "verified_use_cases": [
            "c4a48296-fd92-3606-bf84-99aacdf22a20"
        ],
        "classifications": [],
        "domains": []
    }
]
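In your example you also removed the "" value from stewards. If you want that behavior, you can replace is not None with not in (None, ""). It looks like that might be a mistake, though, since you left empty strings in place elsewhere.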
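For reference, the variant with not in (None, "") could look roughly like the sketch below. The name drop_empty_hook and the shortened input string are only for illustration:

```python
import json

def drop_empty_hook(pairs):
    """object_pairs_hook that filters None and "" out of list values."""
    result = {}
    for key, value in pairs:
        if isinstance(value, list):
            # Only list items are filtered; plain "" values of keys are kept.
            value = [x for x in value if x not in (None, "")]
        result[key] = value
    return result

# Shortened, hypothetical input based on the question's data.
text = '{"stewards": ["nn@abc.com", ""], "owners": [null, "nn@abc.com"], "business_name": ""}'
data = json.loads(text, object_pairs_hook=drop_empty_hook)
print(data)
```

Note that the hook is only called for JSON objects, so it only sees lists that are direct values of a key. A list at the very top level, or a list nested directly inside another list, would pass through unfiltered with this approach.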