Converting a libpostal output string (dict format) with duplicate keys to a dict

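I am using the libpostal address parsing library as an .exe file, and I have a script that reads its terminal output. The output is a string in dict format, as shown below.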

This is the address string:

"531A UPPER CROSS STREETSINGAPORE HONG LIM COMPLEX 051531 S"

The libpostal terminal output is:

'{\n  "house_number": "531a",\n  "road": "upper cross streetsingapore",\n  "city": "hong",\n  "house": "lim complex",\n  "house_number": "051531 s"\n}'

I need to create a dict from this string. If a key is duplicated, its values should be appended under the same key.

Expected output dict:

{
  "house_number": "531a 051531 s",
  "road": "upper cross streetsingapore",
  "city": "hong",
  "house": "lim complex",
}

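Any help would be appreciated.

You can use json.JSONDecoder with object_pairs_hook to decode the dict literal into a list of (key, value) tuples, use dict.setdefault to collect the values for each key into a list, and finally join each list into a single string for the dict values: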

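from json import JSONDecoder

string = '{\n  "house_number": "531a",\n  "road": "upper cross streetsingapore",\n  "city": "hong",\n  "house": "lim complex",\n  "house_number": "051531 s"\n}'

# object_pairs_hook receives each JSON object as a list of (key, value) tuples;
# returning it unchanged keeps duplicate keys instead of overwriting them.
pairs = JSONDecoder(object_pairs_hook=lambda x: x).decode(string)

# Collect every value seen for a key into a list.
out = {}
for key, value in pairs:
    out.setdefault(key, []).append(value)

# Join each list into a single space-separated string.
out = {k: ' '.join(v) for k, v in out.items()}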

Output:

{'house_number': '531a 051531 s',
 'road': 'upper cross streetsingapore',
 'city': 'hong',
 'house': 'lim complex'}
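
If this needs to be done in several places, the same idea can be folded into a single hook so that json.loads returns the merged dict directly. This is only a sketch: the function name merge_duplicate_keys is made up for illustration, and it assumes every value in the libpostal output is a string and that a single space is the right separator.

```python
import json


def merge_duplicate_keys(pairs):
    """Hypothetical object_pairs_hook: join the values of repeated keys with a space."""
    merged = {}
    for key, value in pairs:
        if key in merged:
            # Key seen before: append the new value to the existing one.
            merged[key] = f"{merged[key]} {value}"
        else:
            merged[key] = value
    return merged


raw = '{\n  "house_number": "531a",\n  "road": "upper cross streetsingapore",\n  "city": "hong",\n  "house": "lim complex",\n  "house_number": "051531 s"\n}'

# json.loads calls the hook with the ordered (key, value) pairs of each object.
parsed = json.loads(raw, object_pairs_hook=merge_duplicate_keys)
print(parsed)
```

Because the hook is called for every JSON object, nested objects (if the output ever contained any) would be merged in the same way.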