Search for combinations in a nested JSON object

I have a large JSON object. Part of it looks like this:

data = [
{  
   'make': 'dacia',
   'model': 'x',
   'version': 'A',
   'typ': 'sedan',
   'infos': [
            {'id': 1, 'name': 'steering wheel problems'}, 
            {'id': 32, 'name': 'ABS errors'}
   ]
},
{  
   'make': 'nissan',
   'model': 'z',
   'version': 'B',
   'typ': 'coupe',
   'infos': [
         {'id': 3, 'name': 'throttle problems'}, 
         {'id': 56, 'name': 'broken handbreak'}, 
         {'id': 11, 'name': 'missing seatbelts'}
   ]
}
]

I built a list of every possible combination of infos that can occur in my JSON (one car may have just a single info, while another may have many):

import itertools

inf = list(set(i.get('name') for d in data for i in (d['infos'] if isinstance(d['infos'], list) else [d['infos']])))
inf_comb = [combo for n in range(1, len(inf) + 1) for combo in itertools.combinations(inf, n)]
infos_combo = [list(elem) for elem in inf_comb]

Now I need to iterate over the whole JSON data and count how many times each particular set in infos_combo occurs, so I wrote this code:

tab = []
s = 0
for x in infos_combo:
   s = sum([1 for k in data if (([i['name'] for i in (k['infos'] if isinstance(k['infos'], list) else [k['infos']])] == x))])
   if s!= 0:
     tab.append({'infos': x, 'sum': s})
print(tab)

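The problem I'm facing is that tab contains only some of the elements I expect: more combinations occur in my JSON object and should be counted, but I can't get them. How can I fix this?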

OK, so first you need to get all the actual "infos" from the JSON data, like this:

infos = [
    # Wrap a lone dict in a list so both shapes are handled the same way.
    [i["name"] for i in (d["infos"] if isinstance(d["infos"], list) else [d["infos"]])]
    for d in data
]

This gives you something like the following, which we'll use later:

[['steering wheel problems', 'ABS errors'], ['throttle problems', 'broken handbreak', 'missing seatbelts']]

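Now, to get all the combinations, we first need to process the infos by flattening the nested list and removing duplicates:

# Flatten and drop duplicates while keeping first-seen order.
unique_infos = list(dict.fromkeys(x for l in infos for x in l))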
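A note on removing duplicates: a set would also drop them, but it does not keep the original order, and order matters for the list comparison used later. Below is a minimal sketch of an order-preserving alternative using dict.fromkeys. The sample values are made up for illustration and are not from the question's data.

```python
# Hypothetical sample: two cars share the "ABS errors" entry.
infos = [
    ["steering wheel problems", "ABS errors"],
    ["ABS errors", "throttle problems"],
]

# Flatten the nested list into one list of names.
flat = [name for car in infos for name in car]

# Drop duplicates while keeping first-seen order
# (dict keys preserve insertion order in Python 3.7+).
unique_infos = list(dict.fromkeys(flat))

print(unique_infos)
```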

Get all the combinations:

infos_combo = itertools.chain.from_iterable(
    itertools.combinations(unique_infos, r) for r in range(len(unique_infos) + 1)
)
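To get a feel for what this expression produces, here is a small sketch on three placeholder values (the names "a", "b", "c" are only for illustration). It shows that n items yield 2**n combinations, including the empty tuple because the range starts at 0.

```python
import itertools

names = ["a", "b", "c"]  # placeholder values for illustration

# Same pattern as above, materialized into a list so it can be reused.
combos = list(
    itertools.chain.from_iterable(
        itertools.combinations(names, r) for r in range(len(names) + 1)
    )
)

# n items yield 2**n combinations, including the empty tuple.
print(len(combos))
print(combos[:4])
```

Two things are worth keeping in mind: the count doubles with every additional unique info, so this can get large quickly, and the chain object is an iterator that can be consumed only once, so wrap it in list() if you need to loop over it again.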

This yields:

()
('steering wheel problems',)
('ABS errors',)
('throttle problems',)
('broken handbreak',)
('missing seatbelts',)
('steering wheel problems', 'ABS errors')
('steering wheel problems', 'throttle problems')
('steering wheel problems', 'broken handbreak')
...
# output truncated, too long
...
('steering wheel problems', 'throttle problems', 'broken handbreak', 'missing seatbelts')
('ABS errors', 'throttle problems', 'broken handbreak', 'missing seatbelts')
('steering wheel problems', 'ABS errors', 'throttle problems', 'broken handbreak', 'missing seatbelts')

After that, it's a matter of counting how often each combination appears in the original list of infos:

occurences = {}
for combo in infos_combo:
    occurences[combo] = infos.count(list(combo))

print(occurences)
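Note that comparing lists is order-sensitive, so the same infos listed in a different order would not be counted as a match. If only the combinations that actually occur are of interest, one possible alternative (a sketch, not the approach used above) is to count them directly with collections.Counter and frozenset keys. The helper info_names and the sample data below are hypothetical.

```python
from collections import Counter

# Hypothetical sample: the first two cars have the same infos in a
# different order, and the third has a single dict instead of a list.
data = [
    {"infos": [{"id": 1, "name": "steering wheel problems"},
               {"id": 32, "name": "ABS errors"}]},
    {"infos": [{"id": 32, "name": "ABS errors"},
               {"id": 1, "name": "steering wheel problems"}]},
    {"infos": {"id": 3, "name": "throttle problems"}},
]


def info_names(car):
    """Return the info names of one car as an order-independent frozenset."""
    raw = car["infos"]
    items = raw if isinstance(raw, list) else [raw]
    return frozenset(item["name"] for item in items)


# Count each distinct set of infos that actually occurs in the data.
counts = Counter(info_names(car) for car in data)

for combo, n in counts.items():
    print(sorted(combo), n)
```

This avoids generating every possible combination up front, at the cost of not reporting combinations with a count of zero.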

Full code:

import itertools

data = [
    {
        "make": "dacia",
        "model": "x",
        "version": "A",
        "typ": "sedan",
        "infos": [
            {"id": 1, "name": "steering wheel problems"},
            {"id": 32, "name": "ABS errors"},
        ],
    },
    {
        "make": "nissan",
        "model": "z",
        "version": "B",
        "typ": "coupe",
        "infos": [
            {"id": 3, "name": "throttle problems"},
            {"id": 56, "name": "broken handbreak"},
            {"id": 11, "name": "missing seatbelts"},
        ],
    },
]

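infos = [
    # Wrap a lone dict in a list so both shapes are handled the same way.
    [i["name"] for i in (d["infos"] if isinstance(d["infos"], list) else [d["infos"]])]
    for d in data
]

# Flatten and drop duplicates while keeping first-seen order.
unique_infos = list(dict.fromkeys(x for l in infos for x in l))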

infos_combo = itertools.chain.from_iterable(
    itertools.combinations(unique_infos, r) for r in range(len(unique_infos) + 1)
)

occurences = {}
for combo in infos_combo:
    occurences[combo] = infos.count(list(combo))

print(occurences)
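The resulting dictionary contains an entry for every combination, most of them with a count of 0. To get the list-of-dicts shape the question builds in tab, the zero entries can be filtered out. A minimal sketch, using a hand-written excerpt of the result for the sample data rather than the full dictionary:

```python
# Excerpt of the counting result for the sample data (most entries are 0).
occurences = {
    (): 0,
    ("steering wheel problems",): 0,
    ("steering wheel problems", "ABS errors"): 1,
    ("throttle problems", "broken handbreak", "missing seatbelts"): 1,
}

# Keep only the combinations that were actually seen.
tab = [
    {"infos": list(combo), "sum": count}
    for combo, count in occurences.items()
    if count
]

print(tab)
```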