Geopandas:如何绘制 countries/cities?
Geopandas: how to plot countries/cities?
我需要在地理图上绘制一些数据。具体来说,我想强调数据来源的国家和州。
我的数据集是
Year Country State/City
0 2009 BGR Sofia
1 2018 BHS New Providence
2 2002 BLZ NaN
3 2000 CAN California
4 2002 CAN Ontario
... ... ... ...
250 2001 USA Ohio
251 1998 USA New York
252 1995 USA Virginia
253 2011 USA NaN
254 2019 USA New York
为了创建地理图,我一直在使用 geopandas
如下:
import geopandas as gpd
shapefile = 'path/ne_110m_admin_0_countries/ne_110m_admin_0_countries.shp'
gdf = gpd.read_file(shapefile)[['ADMIN', 'ADM0_A3', 'geometry']]
gdf.columns = ['country', 'country_code', 'geometry']
然后我合并了两个数据集:
merged = gdf.merge(df, left_on = 'country_code', right_on = 'Country')
并将数据转换为 json:
import json
merged_json = json.loads(merged.to_json())
#Convert to String like object.
json_data = json.dumps(merged_json)
最后,我尝试创建如下图表:
from bokeh.io import output_notebook, show, output_file
from bokeh.plotting import figure
from bokeh.models import GeoJSONDataSource, LinearColorMapper, ColorBar
from bokeh.palettes import brewer
geosource = GeoJSONDataSource(geojson = json_data)
#Define a sequential multi-hue color palette.
palette = brewer['YlGnBu'][8]
palette = palette[::-1]
color_mapper = LinearColorMapper(palette = palette, low = 0, high = 40)
tick_labels = {'0': '0%', '5': '5%', '10':'10%', '15':'15%', '20':'20%', '25':'25%', '30':'30%','35':'35%', '40': '>40%'}
color_bar = ColorBar(color_mapper=color_mapper, label_standoff=8,width = 500, height = 20,
border_line_color=None,location = (0,0), orientation = 'horizontal', major_label_overrides = tick_labels)
p = figure(title = 'Creation year across countries', plot_height = 600 , plot_width = 950, toolbar_location = None)
p.xgrid.grid_line_color = None
p.ygrid.grid_line_color = None
p.patches('xs','ys', source = geosource,fill_color = {'field' :'per_cent_year', 'transform' : color_mapper},
line_color = 'black', line_width = 0.25, fill_alpha = 1)
p.add_layout(color_bar, 'below')
output_notebook()
#Display figure.
show(p)
当我 运行 它时,它说 BokehJS 1.0.2 successfully loaded
。但它不显示任何内容。
我的预期输出是一张地图,其中颜色基于一个国家/地区的出现次数(例如 USA=5 会更暗),另一张地图基于 State/City(纽约会更暗)。
上面的代码有什么问题吗?
(如果需要,很高兴分享更多 data/info)
我假设您是 运行 Jupyter Notebook 上的这个,请尝试将此片段添加到代码块的顶部。
from bokeh.resources import INLINE
import bokeh.io
bokeh.io.output_notebook(INLINE)
或使用您的导入
from bokeh.resources import INLINE
from bokeh.io import output_notebook, show, output_file
from bokeh.plotting import figure
from bokeh.models import GeoJSONDataSource, LinearColorMapper, ColorBar
from bokeh.palettes import brewer
output_notebook(INLINE)
从您发布的代码中我看不出绘图有任何问题,因此我认为问题可能出在您的数据聚合或合并中。
这是一个解决方案,首先生成应该与您的数据类似的数据,然后计算一个国家/地区出现在数据中的次数占数据集大小的比例,因为这是必需的指标。我们将专注于仅使用几个国家作为示例:
from random import choices
import pandas as pd
import numpy as np
def generate_data():
k = 100
countries_of_interest = ['USA','ARG','BRA','GBR','ESP','RUS']
countries = choices(countries_of_interest, k=k)
start_yr = 2010
end_yr = 2021
return pd.DataFrame({'Country':countries,
'Year':np.random.randint(start_yr, end_yr, k)},
index=range(len(countries)))
def aggregate_data(df):
data = df.groupby('Country').agg('count')*100.0/len(df)
data = data.reset_index().rename(columns={'Year':'proportion_of_dataset'})
return data
df = generate_data()
# Country Year
# 0 USA 2017
# 1 GBR 2014
# 2 USA 2013
# 3 BRA 2016
# 4 BRA 2018
# .. ... ...
# 95 ESP 2014
# 96 USA 2015
# 97 RUS 2019
# 98 RUS 2012
# 99 RUS 2011
#
# [100 rows x 2 columns]
data = aggregate_data(df)
# Country proportion_of_dataset
# 0 ARG 20.0
# 1 BRA 17.0
# 2 ESP 14.0
# 3 GBR 14.0
# 4 RUS 19.0
# 5 USA 16.0
现在使用 geopandas 加载国家边界 shapefile,并重命名列:
import geopandas as gpd
shapefile = 'path_to_shapfile_folder/ne_110m_admin_0_countries/ne_110m_admin_0_countries.shp'
gdf = gpd.read_file(shapefile)[['ADMIN', 'ADM0_A3', 'geometry']]
gdf.columns = ['country', 'country_code', 'geometry']
gdf.head()
# country country_code \
# 0 Fiji FJI
# 1 United Republic of Tanzania TZA
# 2 Western Sahara SAH
# 3 Canada CAN
# 4 United States of America USA
#
# geometry
# 0 MULTIPOLYGON (((180.00000 -16.06713, 180.00000...
# 1 POLYGON ((33.90371 -0.95000, 34.07262 -1.05982...
# 2 POLYGON ((-8.66559 27.65643, -8.66512 27.58948...
# 3 MULTIPOLYGON (((-122.84000 49.00000, -122.9742...
# 4 MULTIPOLYGON (((-122.84000 49.00000, -120.0000...
现在我们要将国家多边形数据框与我们的聚合数据合并。注意:我们想做一个左连接(在完整的国家多边形数据框上),以便我们包括所有国家,甚至是我们没有数据的国家。另请注意,我们通过用零填充 NaN 来为这些国家/地区添加缺失值:
merged = gdf.merge(data, left_on = 'country_code', right_on = 'Country', how='left')
merged['proportion_of_dataset'] = merged['proportion_of_dataset'].fillna(0)
使用您的代码创建 geojson:
import json
merged_json = json.loads(merged.to_json())
json_data = json.dumps(merged_json)
最后,我们会将您的绘图代码放入一个函数中,并将 geojson、要绘制的列和绘图标题作为参数传入:
from bokeh.io import output_notebook, show, output_file
from bokeh.plotting import figure
from bokeh.models import GeoJSONDataSource, LinearColorMapper, ColorBar
from bokeh.palettes import brewer
def plot_map(json_data,plot_col,title):
geosource = GeoJSONDataSource(geojson = json_data)
#Define a sequential multi-hue color palette.
palette = brewer['YlGnBu'][8]
palette = palette[::-1]
color_mapper = LinearColorMapper(palette = palette, low = 0, high = 40)
tick_labels = {'0': '0%', '5': '5%', '10':'10%', '15':'15%', '20':'20%', '25':'25%', '30':'30%','35':'35%', '40': '>40%'}
color_bar = ColorBar(color_mapper=color_mapper, label_standoff=8,width = 500, height = 20,
border_line_color=None,location = (0,0), orientation = 'horizontal', major_label_overrides = tick_labels)
p = figure(title = title, plot_height = 600 , plot_width = 950, toolbar_location = None)
p.xgrid.grid_line_color = None
p.ygrid.grid_line_color = None
p.patches('xs','ys', source = geosource,fill_color = {'field' :plot_col, 'transform' : color_mapper},
line_color = 'black', line_width = 0.25, fill_alpha = 1)
p.add_layout(color_bar, 'below')
output_notebook()
#Display figure.
show(p)
现在我们要做的就是调用绘图函数,传入所需的参数:
plot_map(json_data,'proportion_of_dataset','Dataset countries of origin')
我需要在地理图上绘制一些数据。具体来说,我想强调数据来源的国家和州。 我的数据集是
Year Country State/City
0 2009 BGR Sofia
1 2018 BHS New Providence
2 2002 BLZ NaN
3 2000 CAN California
4 2002 CAN Ontario
... ... ... ...
250 2001 USA Ohio
251 1998 USA New York
252 1995 USA Virginia
253 2011 USA NaN
254 2019 USA New York
为了创建地理图,我一直在使用 geopandas
如下:
import geopandas as gpd
shapefile = 'path/ne_110m_admin_0_countries/ne_110m_admin_0_countries.shp'
gdf = gpd.read_file(shapefile)[['ADMIN', 'ADM0_A3', 'geometry']]
gdf.columns = ['country', 'country_code', 'geometry']
然后我合并了两个数据集:
merged = gdf.merge(df, left_on = 'country_code', right_on = 'Country')
并将数据转换为 json:
import json
merged_json = json.loads(merged.to_json())
#Convert to String like object.
json_data = json.dumps(merged_json)
最后,我尝试创建如下图表:
from bokeh.io import output_notebook, show, output_file
from bokeh.plotting import figure
from bokeh.models import GeoJSONDataSource, LinearColorMapper, ColorBar
from bokeh.palettes import brewer
geosource = GeoJSONDataSource(geojson = json_data)
#Define a sequential multi-hue color palette.
palette = brewer['YlGnBu'][8]
palette = palette[::-1]
color_mapper = LinearColorMapper(palette = palette, low = 0, high = 40)
tick_labels = {'0': '0%', '5': '5%', '10':'10%', '15':'15%', '20':'20%', '25':'25%', '30':'30%','35':'35%', '40': '>40%'}
color_bar = ColorBar(color_mapper=color_mapper, label_standoff=8,width = 500, height = 20,
border_line_color=None,location = (0,0), orientation = 'horizontal', major_label_overrides = tick_labels)
p = figure(title = 'Creation year across countries', plot_height = 600 , plot_width = 950, toolbar_location = None)
p.xgrid.grid_line_color = None
p.ygrid.grid_line_color = None
p.patches('xs','ys', source = geosource,fill_color = {'field' :'per_cent_year', 'transform' : color_mapper},
line_color = 'black', line_width = 0.25, fill_alpha = 1)
p.add_layout(color_bar, 'below')
output_notebook()
#Display figure.
show(p)
当我 运行 它时,它说 BokehJS 1.0.2 successfully loaded
。但它不显示任何内容。
我的预期输出是一张地图,其中颜色基于一个国家/地区的出现次数(例如 USA=5 会更暗),另一张地图基于 State/City(纽约会更暗)。
上面的代码有什么问题吗?
(如果需要,很高兴分享更多 data/info)
我假设您是 运行 Jupyter Notebook 上的这个,请尝试将此片段添加到代码块的顶部。
from bokeh.resources import INLINE
import bokeh.io
bokeh.io.output_notebook(INLINE)
或使用您的导入
from bokeh.resources import INLINE
from bokeh.io import output_notebook, show, output_file
from bokeh.plotting import figure
from bokeh.models import GeoJSONDataSource, LinearColorMapper, ColorBar
from bokeh.palettes import brewer
output_notebook(INLINE)
从您发布的代码中我看不出绘图有任何问题,因此我认为问题可能出在您的数据聚合或合并中。
这是一个解决方案,首先生成应该与您的数据类似的数据,然后计算一个国家/地区出现在数据中的次数占数据集大小的比例,因为这是必需的指标。我们将专注于仅使用几个国家作为示例:
from random import choices
import pandas as pd
import numpy as np
def generate_data():
k = 100
countries_of_interest = ['USA','ARG','BRA','GBR','ESP','RUS']
countries = choices(countries_of_interest, k=k)
start_yr = 2010
end_yr = 2021
return pd.DataFrame({'Country':countries,
'Year':np.random.randint(start_yr, end_yr, k)},
index=range(len(countries)))
def aggregate_data(df):
data = df.groupby('Country').agg('count')*100.0/len(df)
data = data.reset_index().rename(columns={'Year':'proportion_of_dataset'})
return data
df = generate_data()
# Country Year
# 0 USA 2017
# 1 GBR 2014
# 2 USA 2013
# 3 BRA 2016
# 4 BRA 2018
# .. ... ...
# 95 ESP 2014
# 96 USA 2015
# 97 RUS 2019
# 98 RUS 2012
# 99 RUS 2011
#
# [100 rows x 2 columns]
data = aggregate_data(df)
# Country proportion_of_dataset
# 0 ARG 20.0
# 1 BRA 17.0
# 2 ESP 14.0
# 3 GBR 14.0
# 4 RUS 19.0
# 5 USA 16.0
现在使用 geopandas 加载国家边界 shapefile,并重命名列:
import geopandas as gpd
shapefile = 'path_to_shapfile_folder/ne_110m_admin_0_countries/ne_110m_admin_0_countries.shp'
gdf = gpd.read_file(shapefile)[['ADMIN', 'ADM0_A3', 'geometry']]
gdf.columns = ['country', 'country_code', 'geometry']
gdf.head()
# country country_code \
# 0 Fiji FJI
# 1 United Republic of Tanzania TZA
# 2 Western Sahara SAH
# 3 Canada CAN
# 4 United States of America USA
#
# geometry
# 0 MULTIPOLYGON (((180.00000 -16.06713, 180.00000...
# 1 POLYGON ((33.90371 -0.95000, 34.07262 -1.05982...
# 2 POLYGON ((-8.66559 27.65643, -8.66512 27.58948...
# 3 MULTIPOLYGON (((-122.84000 49.00000, -122.9742...
# 4 MULTIPOLYGON (((-122.84000 49.00000, -120.0000...
现在我们要将国家多边形数据框与我们的聚合数据合并。注意:我们想做一个左连接(在完整的国家多边形数据框上),以便我们包括所有国家,甚至是我们没有数据的国家。另请注意,我们通过用零填充 NaN 来为这些国家/地区添加缺失值:
merged = gdf.merge(data, left_on = 'country_code', right_on = 'Country', how='left')
merged['proportion_of_dataset'] = merged['proportion_of_dataset'].fillna(0)
使用您的代码创建 geojson:
import json
merged_json = json.loads(merged.to_json())
json_data = json.dumps(merged_json)
最后,我们会将您的绘图代码放入一个函数中,并将 geojson、要绘制的列和绘图标题作为参数传入:
from bokeh.io import output_notebook, show, output_file
from bokeh.plotting import figure
from bokeh.models import GeoJSONDataSource, LinearColorMapper, ColorBar
from bokeh.palettes import brewer
def plot_map(json_data,plot_col,title):
geosource = GeoJSONDataSource(geojson = json_data)
#Define a sequential multi-hue color palette.
palette = brewer['YlGnBu'][8]
palette = palette[::-1]
color_mapper = LinearColorMapper(palette = palette, low = 0, high = 40)
tick_labels = {'0': '0%', '5': '5%', '10':'10%', '15':'15%', '20':'20%', '25':'25%', '30':'30%','35':'35%', '40': '>40%'}
color_bar = ColorBar(color_mapper=color_mapper, label_standoff=8,width = 500, height = 20,
border_line_color=None,location = (0,0), orientation = 'horizontal', major_label_overrides = tick_labels)
p = figure(title = title, plot_height = 600 , plot_width = 950, toolbar_location = None)
p.xgrid.grid_line_color = None
p.ygrid.grid_line_color = None
p.patches('xs','ys', source = geosource,fill_color = {'field' :plot_col, 'transform' : color_mapper},
line_color = 'black', line_width = 0.25, fill_alpha = 1)
p.add_layout(color_bar, 'below')
output_notebook()
#Display figure.
show(p)
现在我们要做的就是调用绘图函数,传入所需的参数:
plot_map(json_data,'proportion_of_dataset','Dataset countries of origin')