Remove whole nested dict by finding if a value is in a list

Question

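I'm trying to make a choropleth map of values across US zip codes. I have a JSON file containing the points of the ZCTA5CE region corresponding to each zip code. I'm using the Folium package.

The map works now, but it is painfully slow: it takes about 10 minutes to run, depending on what else is running on my machine, and it is nearly impossible to interact with the map by panning and zooming. The cause is the size of the JSON file (482.2M) and of the resulting dictionary.

The data I'm plotting doesn't have information for every zip code, so I'd like to remove from the zip code dictionary the entries for zip codes that are not in my data.

My question is: how can I iterate through the dictionary of zip code information and remove the dictionaries that are not in my specified list of zip codes?

To make the structure of the dictionary I'm working with clearer:

zip_code_geo_dict.keys() gives: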

dict_keys(['type', 'features'])

where zip_code_geo_dict['type'] is a string and zip_code_geo_dict['features'] is a list.

Now, zip_code_geo_dict['features'][0] is:

{'type': 'Feature','geometry': {'type': 'MultiPolygon',
'coordinates': [[[[-88.252618, 32.92675],
[-88.249724, 32.93242],
**bajillions of lines of coordinates here**
[-88.34043199999999, 32.991199]]]]},
'properties': {'ZCTA5CE10': '35442',
'AFFGEOID10': '8600000US35442',
'GEOID10': '35442',
'ALAND10': 610213891,
'AWATER10': 10838694}}

My source data can change, so the actual list of zip codes I want to map is dynamic. That said, I can always create a list:

zips_of_interest = ['15210', '15222']

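How can I iterate through zip_code_geo_dict and remove the coordinate info wherever zip_code_geo_dict['features']['properties']['ZCTA5CE10'] is NOT IN zips_of_interest? It's necessary to keep the over-arching dict structure, so that the filtered version of zip_code_geo_dict['features'] sits in the same "spot" as the original (it needs to remain the second key in the larger zip_code_geo_dict object).

I think it's worth noting that I want to keep the basic structure of the dictionary because I will pass it to the choropleth method in Folium.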

Answer 1

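If the zip code information is likely to change, my first suggestion is to use something like an RTree, KDTree, or BallTree to store the information in a structure that makes it easy to access by region. These let you run queries efficiently, such as "what are all the zip codes within r radius of my zip codes of interest?".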
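To give a feel for the kind of radius query meant here, below is a minimal sketch that uses a plain uniform grid instead of a real RTree, KDTree, or BallTree (a simpler stand-in, not one of those structures). The GridIndex class, the cell size, and the rough centroid coordinates are all made up for illustration, and distances are measured in plain degrees rather than true ground distance.

```python
from collections import defaultdict
from math import floor, hypot


class GridIndex:
    """Hypothetical uniform grid over (lon, lat) points; a stand-in for a tree index."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(list)

    def _cell(self, x, y):
        # Map a coordinate to the integer grid cell that contains it.
        return (floor(x / self.cell_size), floor(y / self.cell_size))

    def insert(self, key, x, y):
        self.cells[self._cell(x, y)].append((key, x, y))

    def within(self, x, y, radius):
        """Return keys whose point lies within `radius` degrees of (x, y)."""
        cx_min, cy_min = self._cell(x - radius, y - radius)
        cx_max, cy_max = self._cell(x + radius, y + radius)
        found = []
        # Only visit the cells that overlap the query's bounding box.
        for cx in range(cx_min, cx_max + 1):
            for cy in range(cy_min, cy_max + 1):
                for key, px, py in self.cells.get((cx, cy), []):
                    if hypot(px - x, py - y) <= radius:
                        found.append(key)
        return found


index = GridIndex(cell_size=0.5)
# Rough, illustrative centroids only; not authoritative locations.
index.insert('15210', -79.98, 40.41)
index.insert('15222', -79.99, 40.45)
index.insert('35442', -88.25, 32.93)

nearby = index.within(-79.98, 40.41, radius=0.1)
print(sorted(nearby))
```

A proper tree structure from a library scales better and handles uneven point density, but the query shape is the same: give it a point and a radius, get back the zip codes to keep.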

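As for actually implementing the filter, if you have a lot of zip codes you might want to do something like lookup = set(zips_of_interest), so that each membership test takes O(1) instead of O(n). For len(zips_of_interest) < 15 or so, a list is probably fine (it depends heavily on your platform).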
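If you want to check the difference on your own machine, here is a small sketch that times the same membership test against a list and a set. The zip codes are generated placeholders, not real data, and the timings will vary by platform.

```python
import timeit

# Placeholder codes, generated only to have a reasonably long list.
zips_of_interest = [f'{n:05d}' for n in range(10000, 12000)]
lookup = set(zips_of_interest)

probe = '99999'  # absent on purpose: the worst case for a list scan

list_time = timeit.timeit(lambda: probe in zips_of_interest, number=1000)
set_time = timeit.timeit(lambda: probe in lookup, number=1000)

print(f'list: {list_time:.4f}s  set: {set_time:.4f}s')
```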

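You mentioned that the "first item" of zip_code_geo_dict is [such-and-such]. What is the type of zip_code_geo_dict? Is it a dict? How you filter things out of it depends on exactly what it is. That said, for the common data structures you have basically already written the command.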

lookup = set(zips_of_interest)

Dictionary

condensed_data = {k:zip_code_geo_dict[k] for k in zip_code_geo_dict
                  if zip_code_geo_dict[k]['properties']['ZCTA5CE10'] in lookup}

List

condensed_data = [v for v in zip_code_geo_dict
                  if v['properties']['ZCTA5CE10'] in lookup]

In both cases you are essentially telling Python to give you all the locations of interest from the original data structure.
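Applied to the layout described in the question, where the per-zip entries live in the list under the 'features' key, the list version can be wrapped so the outer dict keeps its shape. This is a minimal sketch: filter_feature_collection is a name invented for this example, the 'FeatureCollection' value for 'type' is assumed (the question only says it is a string), and the sample features are stripped down to the one property that matters.

```python
def filter_feature_collection(geo_dict, zips):
    """Return a copy of geo_dict whose 'features' list keeps only the given zips."""
    lookup = set(zips)
    return {
        **geo_dict,  # carry over 'type' and any other top-level keys unchanged
        'features': [
            feature for feature in geo_dict['features']
            if feature['properties']['ZCTA5CE10'] in lookup
        ],
    }


# Tiny stand-in for the real 482 MB structure.
zip_code_geo_dict = {
    'type': 'FeatureCollection',  # assumed value
    'features': [
        {'type': 'Feature', 'properties': {'ZCTA5CE10': '35442'}},
        {'type': 'Feature', 'properties': {'ZCTA5CE10': '15210'}},
    ],
}
zips_of_interest = ['15210', '15222']

filtered = filter_feature_collection(zip_code_geo_dict, zips_of_interest)
print(filtered.keys())
print([f['properties']['ZCTA5CE10'] for f in filtered['features']])
```

The result has the same top-level keys as the input, so it can be passed wherever the original zip_code_geo_dict was passed. The original dict is left untouched, which is convenient when zips_of_interest changes between runs.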

Answer 2

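Not sure whether this is what you're looking for. The dictionary you posted does not have a features key. I wrote an additional dictionary that would not be removed by the logic you proposed, and put both dictionaries in a list to give a complete demonstration.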

def filter_zips(geo_list, zip_list):
    # Build a new list instead of deleting while iterating: del inside
    # enumerate() shifts later items left and skips the next element.
    wanted = set(zip_list)
    return [feature for feature in geo_list
            if feature['properties']['ZCTA5CE10'] in wanted]

zip_code_geo_list = [
    {
        'type': 'Feature',
        'geometry': {
            'type': 'MultiPolygon',
            'coordinates': [
                [-88.252618, 32.92675],
                [-88.249724, 32.93242],
                [-88.34043199999999, 32.991199]
            ]
        },
        'properties': {
            'ZCTA5CE10': '35442',
            'AFFGEOID10': '8600000US35442',
            'GEOID10': '35442',
            'ALAND10': 610213891,
            'AWATER10': 10838694
        }
    },
    {
        # Made-up second entry whose zip code is in zips_of_interest
        'type': 'Feature',
        'geometry': {
            'type': 'MultiPolygon',
            'coordinates': [
                [-88.252618, 32.92675],
                [-88.249724, 32.93242],
                [-88.34043199999999, 32.991199]
            ]
        },
        'properties': {
            'ZCTA5CE10': '15210',
            'AFFGEOID10': '8600000US15210',
            'GEOID10': '15210',
            'ALAND10': 610213891,
            'AWATER10': 10838694
        }
    },
]
zips_of_interest = ['15210', '15222']

filter_zips(zip_code_geo_list, zips_of_interest)

In this case filter_zips() returns a list with the first dictionary removed and the second one kept.
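One thing to watch with this kind of filter: deleting from a list while looping over it with enumerate() shifts the remaining items left, so the element right after a deleted one is never examined. The sketch below shows the effect on a plain list of codes; remove_in_place and remove_safely are hypothetical helper names, and '35443' is a made-up extra code.

```python
def remove_in_place(items, keep):
    """Deliberately fragile: deletes from the list it is iterating over."""
    for i, item in enumerate(items):
        if item not in keep:
            del items[i]
    return items


def remove_safely(items, keep):
    """Builds a new list, so every element is examined exactly once."""
    return [item for item in items if item in keep]


codes = ['35442', '35443', '15210']
keep = {'15210'}

fragile = remove_in_place(list(codes), keep)  # work on a copy of codes
safe = remove_safely(codes, keep)

print(fragile)  # '35443' survives because it was skipped
print(safe)
```

With two unwanted entries in a row, the in-place version leaves the second one behind, while the list comprehension returns only the wanted code.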

python

geo

python-3.x

folium