Geopandas: how to plot countries/cities?

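I need to plot some data on a map. Specifically, I want to highlight the countries and states that the data comes from. My dataset looks like this: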

    Year    Country State/City
0   2009    BGR     Sofia
1   2018    BHS     New Providence
2   2002    BLZ     NaN
3   2000    CAN     California
4   2002    CAN     Ontario
... ... ... ...
250 2001    USA     Ohio
251 1998    USA     New York
252 1995    USA     Virginia
253 2011    USA     NaN
254 2019    USA     New York

To build the map, I have been using geopandas as follows:

import geopandas as gpd

shapefile = 'path/ne_110m_admin_0_countries/ne_110m_admin_0_countries.shp'
gdf = gpd.read_file(shapefile)[['ADMIN', 'ADM0_A3', 'geometry']]
gdf.columns = ['country', 'country_code', 'geometry']

Then I merged the two datasets:

merged = gdf.merge(df, left_on = 'country_code', right_on = 'Country')

and converted the data to JSON:

import json

merged_json = json.loads(merged.to_json())
#Convert to String like object.
json_data = json.dumps(merged_json)

Finally, I tried to create the plot as follows:

from bokeh.io import output_notebook, show, output_file
from bokeh.plotting import figure
from bokeh.models import GeoJSONDataSource, LinearColorMapper, ColorBar
from bokeh.palettes import brewer

geosource = GeoJSONDataSource(geojson = json_data)

#Define a sequential multi-hue color palette.
palette = brewer['YlGnBu'][8]
palette = palette[::-1]
color_mapper = LinearColorMapper(palette = palette, low = 0, high = 40)

tick_labels = {'0': '0%', '5': '5%', '10':'10%', '15':'15%', '20':'20%', '25':'25%', '30':'30%','35':'35%', '40': '>40%'}

color_bar = ColorBar(color_mapper=color_mapper, label_standoff=8,width = 500, height = 20,
border_line_color=None,location = (0,0), orientation = 'horizontal', major_label_overrides = tick_labels)

p = figure(title = 'Creation year across countries', plot_height = 600 , plot_width = 950, toolbar_location = None)
p.xgrid.grid_line_color = None
p.ygrid.grid_line_color = None

p.patches('xs','ys', source = geosource,fill_color = {'field' :'per_cent_year', 'transform' : color_mapper},
          line_color = 'black', line_width = 0.25, fill_alpha = 1)

p.add_layout(color_bar, 'below')

output_notebook()

#Display figure.
show(p)

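When I run it, it says "BokehJS 1.0.2 successfully loaded", but nothing is displayed. My expected output is one map colored by the number of occurrences of each country (for example, USA=5 would be darker), and another map based on State/City (New York would be darker).

Is there anything wrong with the code above?

(Happy to share more data/info if needed.)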

I assume you are running this in a Jupyter Notebook. Try adding this snippet at the top of your code block:

from bokeh.resources import INLINE
import bokeh.io

bokeh.io.output_notebook(INLINE)

or, combined with your imports:

from bokeh.resources import INLINE
from bokeh.io import output_notebook, show, output_file
from bokeh.plotting import figure
from bokeh.models import GeoJSONDataSource, LinearColorMapper, ColorBar
from bokeh.palettes import brewer

output_notebook(INLINE)

I can't see anything wrong with the plotting in the code you posted, so I think the problem may lie in your data aggregation or merge.
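
One way to test that hypothesis is to inspect the GeoJSON before handing it to Bokeh. The `p.patches` call in the question colors by a field named `per_cent_year`, yet the code shown never creates a column with that name, so a missing field is one plausible cause of an empty plot. The sketch below uses only the standard library; `missing_property` is a hypothetical helper, and the small GeoJSON string is a hand-written stand-in for `json_data`:

```python
import json

def missing_property(geojson_str, field):
    """Return the indices of features whose properties lack `field`."""
    features = json.loads(geojson_str)["features"]
    return [i for i, feat in enumerate(features)
            if field not in (feat.get("properties") or {})]

# Hand-written stand-in for json_data (geometry reduced to a single point).
sample = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature",
         "properties": {"country_code": "USA", "Year": 2001},
         "geometry": {"type": "Point", "coordinates": [0, 0]}},
    ],
})

print(missing_property(sample, "per_cent_year"))  # feature 0 lacks the field
print(missing_property(sample, "Year"))           # nothing is missing
```

If the first call returns a non-empty list for your real `json_data`, the column the plot expects was never added to the merged dataframe.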

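Here is a solution that first generates data that should resemble yours, then computes how often each country appears as a percentage of the dataset size, since that is the metric you need. We will use only a handful of countries as an example: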

from random import choices
import pandas as pd
import numpy as np

def generate_data():
    
    k = 100
    
    countries_of_interest = ['USA','ARG','BRA','GBR','ESP','RUS']
    countries = choices(countries_of_interest, k=k)
    
    start_yr = 2010
    end_yr = 2021
    
    return pd.DataFrame({'Country':countries, 
                         'Year':np.random.randint(start_yr, end_yr, k)},
                        index=range(len(countries)))


def aggregate_data(df):
    data = df.groupby('Country').agg('count')*100.0/len(df)
    data = data.reset_index().rename(columns={'Year':'proportion_of_dataset'})
    return data

df = generate_data()

#    Country  Year
# 0      USA  2017
# 1      GBR  2014
# 2      USA  2013
# 3      BRA  2016
# 4      BRA  2018
# ..     ...   ...
# 95     ESP  2014
# 96     USA  2015
# 97     RUS  2019
# 98     RUS  2012
# 99     RUS  2011
# 
# [100 rows x 2 columns]

data = aggregate_data(df)

#   Country  proportion_of_dataset
# 0     ARG                   20.0
# 1     BRA                   17.0
# 2     ESP                   14.0
# 3     GBR                   14.0
# 4     RUS                   19.0
# 5     USA                   16.0
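
The question also asks for a second map based on State/City. Here is a minimal sketch of the matching aggregation, assuming the column names from the question's dataset and using a few of its rows as input. `aggregate_by` is a hypothetical helper, and plotting the result would additionally need a state-level boundary file, which is not covered here:

```python
import numpy as np
import pandas as pd

# A few rows copied from the dataset shown in the question.
sample = pd.DataFrame({
    "Year": [2009, 2002, 2001, 1998, 2011, 2019],
    "Country": ["BGR", "CAN", "USA", "USA", "USA", "USA"],
    "State/City": ["Sofia", "Ontario", "Ohio", "New York", np.nan, "New York"],
})

def aggregate_by(df, keys):
    """Rows per group as a percentage of all rows in df.

    Rows with NaN in any key column are dropped before counting.
    """
    counts = df.dropna(subset=keys).groupby(keys).size()
    return (counts * 100.0 / len(df)).rename("proportion_of_dataset").reset_index()

by_country = aggregate_by(sample, ["Country"])
by_state = aggregate_by(sample, ["Country", "State/City"])
print(by_country)
print(by_state)
```

Grouping on both `Country` and `State/City` keeps identically named states in different countries apart.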

Now load the country boundaries shapefile with geopandas and rename the columns:

import geopandas as gpd

shapefile = 'path_to_shapefile_folder/ne_110m_admin_0_countries/ne_110m_admin_0_countries.shp'
gdf = gpd.read_file(shapefile)[['ADMIN', 'ADM0_A3', 'geometry']]
gdf.columns = ['country', 'country_code', 'geometry']

gdf.head()

#                        country country_code  \
# 0                         Fiji          FJI   
# 1  United Republic of Tanzania          TZA   
# 2               Western Sahara          SAH   
# 3                       Canada          CAN   
# 4     United States of America          USA   
# 
#                                             geometry  
# 0  MULTIPOLYGON (((180.00000 -16.06713, 180.00000...  
# 1  POLYGON ((33.90371 -0.95000, 34.07262 -1.05982...  
# 2  POLYGON ((-8.66559 27.65643, -8.66512 27.58948...  
# 3  MULTIPOLYGON (((-122.84000 49.00000, -122.9742...  
# 4  MULTIPOLYGON (((-122.84000 49.00000, -120.0000...

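Now we merge the country polygons dataframe with our aggregated data. Note: we want a left join (on the full country polygons dataframe) so that every country is included, even those we have no data for. Also note that we fill the resulting NaN values with zero for those countries: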

merged = gdf.merge(data, left_on = 'country_code', right_on = 'Country', how='left')
merged['proportion_of_dataset'] = merged['proportion_of_dataset'].fillna(0)
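
To see what the left join buys, and to check which country codes failed to match, `merge` can be asked for an indicator column. The sketch below uses plain pandas DataFrames as stand-ins for `gdf` and `data` (no geometry). The code `XXX` and its value are made up purely to illustrate a non-matching row:

```python
import pandas as pd

# Stand-ins for gdf (without geometry) and the aggregated data.
shapes = pd.DataFrame({"country_code": ["FJI", "CAN", "USA"]})
stats = pd.DataFrame({"Country": ["USA", "XXX"],          # XXX: made-up code
                      "proportion_of_dataset": [16.0, 4.0]})

inner = shapes.merge(stats, left_on="country_code", right_on="Country")
left = shapes.merge(stats, left_on="country_code", right_on="Country",
                    how="left", indicator=True)
left["proportion_of_dataset"] = left["proportion_of_dataset"].fillna(0)

# The inner join keeps only matches; the left join keeps every shape.
print(len(inner), len(left))
print(left[["country_code", "proportion_of_dataset", "_merge"]])

# Codes present in the data but absent from the shapes are dropped silently.
unmatched = set(stats["Country"]) - set(shapes["country_code"])
print(unmatched)
```

With the default inner join used in the question, countries without data disappear from the map entirely rather than being drawn in the lightest color.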

Create the GeoJSON using your code:

import json

merged_json = json.loads(merged.to_json())
json_data = json.dumps(merged_json)
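
As far as I know, `GeoDataFrame.to_json()` already returns a GeoJSON string, so the `json.loads`/`json.dumps` round trip is harmless but should not be required. The stand-in class below is hypothetical (it exists only so the example runs without geopandas) and illustrates that the round trip leaves the content unchanged:

```python
import json

class FakeFrame:
    """Hypothetical stand-in for a GeoDataFrame; only mimics to_json()."""
    def to_json(self):
        return json.dumps({"type": "FeatureCollection", "features": []})

frame = FakeFrame()
direct = frame.to_json()                    # already a str
roundtrip = json.dumps(json.loads(direct))  # what the code above does

print(isinstance(direct, str))
print(json.loads(direct) == json.loads(roundtrip))
```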

Finally, we put your plotting code into a function and pass in the GeoJSON, the column to plot, and the plot title as arguments:

from bokeh.io import output_notebook, show, output_file
from bokeh.plotting import figure
from bokeh.models import GeoJSONDataSource, LinearColorMapper, ColorBar
from bokeh.palettes import brewer

def plot_map(json_data, plot_col, title):
    """Draw a choropleth of `plot_col` from a GeoJSON string."""
    geosource = GeoJSONDataSource(geojson=json_data)

    # Sequential multi-hue palette, reversed so larger values map to darker colors.
    palette = brewer['YlGnBu'][8][::-1]
    color_mapper = LinearColorMapper(palette=palette, low=0, high=40)

    # Custom labels for the color bar ticks.
    tick_labels = {'0': '0%', '5': '5%', '10': '10%', '15': '15%', '20': '20%',
                   '25': '25%', '30': '30%', '35': '35%', '40': '>40%'}

    color_bar = ColorBar(color_mapper=color_mapper, label_standoff=8,
                         width=500, height=20, border_line_color=None,
                         location=(0, 0), orientation='horizontal',
                         major_label_overrides=tick_labels)

    p = figure(title=title, plot_height=600, plot_width=950,
               toolbar_location=None)
    p.xgrid.grid_line_color = None
    p.ygrid.grid_line_color = None

    # One patch per country, filled according to the value in plot_col.
    p.patches('xs', 'ys', source=geosource,
              fill_color={'field': plot_col, 'transform': color_mapper},
              line_color='black', line_width=0.25, fill_alpha=1)

    p.add_layout(color_bar, 'below')

    # Render inline in the notebook and display the figure.
    output_notebook()
    show(p)
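
For reading the resulting colors: with 8 palette entries and `low = 0`, `high = 40`, each color covers a 5-point band, and values outside the range are drawn with the end colors. Here is a rough pure-Python approximation of that binning. The helper `palette_index` is hypothetical, and Bokeh's exact handling of boundary values may differ:

```python
def palette_index(value, low=0.0, high=40.0, n_colors=8):
    """Approximate index of the palette entry a linear mapper would pick."""
    if value <= low:
        return 0
    if value >= high:
        return n_colors - 1
    return int((value - low) / (high - low) * n_colors)

for v in (0, 14.0, 16.0, 20.0, 55.0):
    print(v, palette_index(v))
```

With the generated sample data, where every country sits between roughly 14% and 20%, only a few neighbouring palette entries are used, so lowering `high` may give more contrast.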

Now all we have to do is call the plotting function with the required arguments:

plot_map(json_data,'proportion_of_dataset','Dataset countries of origin')
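
A note for newer environments: the `plot_height`/`plot_width` arguments match the Bokeh 1.x release mentioned in the question, but Bokeh 3.0 removed them in favor of `height`/`width`. Here is a small sketch of choosing the right keyword names from a version string. `figure_size_kwargs` is a hypothetical helper, not part of Bokeh:

```python
def figure_size_kwargs(bokeh_version, width, height):
    """Return size keyword arguments appropriate for the given Bokeh version."""
    major = int(bokeh_version.split(".")[0])
    if major >= 3:
        return {"width": width, "height": height}
    return {"plot_width": width, "plot_height": height}

print(figure_size_kwargs("1.0.2", 950, 600))
print(figure_size_kwargs("3.4.0", 950, 600))
```

Inside `plot_map` this could be used as `figure(title=title, toolbar_location=None, **figure_size_kwargs(bokeh.__version__, 950, 600))`, assuming `bokeh` itself has been imported.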