Bokeh Mapping Counties
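I'm trying to adapt this example using county data for Michigan. In short, it works, but it seems to add some extra shapes here and there while drawing the counties. I'm guessing that in some cases (counties with islands) the island parts need to be listed as separate "counties", but I'm not sure about the other cases, such as Wayne County in the lower-right portion of the state.
Here's a picture of what I currently have:
Here's what I've done so far:
- Get county data from Bokeh's sample county data, just to get the state abbreviation for each state number (my second, main data source only has state numbers). For this example, I'll simplify by filtering on state number 26 only.
- Get state coordinates ('500k' file) by county from the U.S. Census site.
- Use the following code to generate an 'interactive' map of Michigan.
Note: to pip install shapefile (really pyshp), I think I had to download the .whl file from here and then run pip install [path to .whl file].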
import pandas as pd
import numpy as np
import shapefile
from bokeh.models import HoverTool, ColumnDataSource
from bokeh.palettes import Viridis6
from bokeh.plotting import figure, show, output_notebook

shpfile = r'Path0K_US_Counties\cb_2015_us_county_500k.shp'
sf = shapefile.Reader(shpfile)
shapes = sf.shapes()

# Here are the rows from the shape file (plus long/lat coordinates)
rows = []
lenrow = []
for i, j in zip(sf.shapeRecords(), sf.shapes()):
    rows.append(list(i.record) + [j.points])
    if len(list(i.record) + [j.points]) != 10:
        print("Found record with irregular number of columns")

fields1 = sf.fields[1:]  # Ignore first field as it is not used (maybe it's a meta field?)
fields = [seq[0] for seq in fields1] + ['Long_Lat']  # Take the first element in each tuple of the list
c = pd.DataFrame(rows, columns=fields)
try:
    c['STATEFP'] = c['STATEFP'].astype(int)
except ValueError:
    pass

#cns=pd.read_csv(r'Path\US_Counties.csv')
#cns=cns[['State Abbr.','STATE num']]
#cns=cns.drop_duplicates('State Abbr.',keep='first')
#c=pd.merge(c,cns,how='left',left_on='STATEFP',right_on='STATE num')

# Each point is (x, y) = (longitude, latitude)
c['Long'] = c['Long_Lat'].apply(lambda x: [e[0] for e in x])
c['Lat'] = c['Long_Lat'].apply(lambda x: [e[1] for e in x])

#c=c.loc[c['State Abbr.']=='MI']
c = c.loc[c['STATEFP'] == 26]

# longitude = x, latitude = y
county_xs = c['Long']
county_ys = c['Lat']
county_names = c['NAME']
# One random palette color per county (the original looped over an undefined name, aland)
county_colors = [Viridis6[np.random.randint(1, 6)] for _ in county_names]
randns = np.random.randint(1, 6, size=1).tolist()[0]
#county_colors = [Viridis6[e] for e in randns]
#county_colors = 'b'

source = ColumnDataSource(data=dict(
    x=county_xs,
    y=county_ys,
    color=county_colors,
    name=county_names,
    #rate=county_rates,
))

output_notebook()

TOOLS = "pan,wheel_zoom,box_zoom,reset,hover,save"

p = figure(title="Title", tools=TOOLS,
           x_axis_location=None, y_axis_location=None)
p.grid.grid_line_color = None

p.patches('x', 'y', source=source,
          fill_color='color', fill_alpha=0.7,
          line_color="white", line_width=0.5)

hover = p.select_one(HoverTool)
hover.point_policy = "follow_mouse"
hover.tooltips = [
    ("Name", "@name"),
    #("Unemployment rate)", "@rate%"),
    ("(Long, Lat)", "($x, $y)"),
]

show(p)
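I'm looking for a way to avoid the extra lines and shapes.
Thanks in advance!
One solution:
Use the 1:20,000,000 shapefile instead of the 1:500,000 file.
It loses some detail around the outline of each county, but it has no extra shapes (just a couple of extra lines).
I have a solution to this problem, and I think I may even know why it is correct. First, let me quote Bryan Van de Ven from a Bokeh Google Group discussion: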
there is no built-in support for dealing with shapefiles. You will have to convert the data to the simple format that Bokeh understands. (As an aside: it would be great to have a contribution that made dealing with various GIS formats easier).
The format that Bokeh expects for patches is a "list of lists" of points. So something like:
xs = [ [patch0 x-coords], [patch1 x-coords], ... ]
ys = [ [patch0 y-coords], [patch1 y-coords], ... ]
Note that if a patch is comprised of multiple polygons, this is currently expressed by putting NaN values in the sublists. So, the task is basically to convert whatever form of polygon data you have to this format, and then Bokeh can display it.
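To make the quoted format concrete, here is a minimal sketch on a made-up shape. It assumes the layout pyshp uses: one flat list of (x, y) points per shape, plus a list of start indices for each ring. The helper name join_parts_with_nan and the toy coordinates are mine, purely for illustration.

```python
import math


def join_parts_with_nan(points, parts):
    """Flatten a multi-part shape into NaN-separated x and y lists.

    points: list of (x, y) tuples for the whole shape
    parts:  start index of each ring within points
    """
    starts = list(parts)
    ends = starts[1:] + [len(points)]
    xs, ys = [], []
    for k, (i, j) in enumerate(zip(starts, ends)):
        if k:
            # NaN marks the break between two rings, so the renderer does
            # not draw a line from the end of one ring to the next.
            xs.append(math.nan)
            ys.append(math.nan)
        xs.extend(x for x, _ in points[i:j])
        ys.extend(y for _, y in points[i:j])
    return xs, ys


# Toy "county": a mainland square plus a small island (made-up coordinates).
points = [(0, 0), (2, 0), (2, 2), (0, 2), (0, 0),
          (3, 3), (4, 3), (4, 4), (3, 3)]
parts = [0, 5]
xs, ys = join_parts_with_nan(points, parts)
print(xs)
print(ys)
```

Without the NaN between index 4 and index 5, the square's last point (0, 0) would be joined directly to the island's first point (3, 3), which is the kind of stray line seen in the question.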
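So it seems you are somehow ignoring the NaNs or otherwise not handling multiple polygons properly. pyshp returns all of a shape's rings in a single points list, with shape.parts giving the index where each ring starts; plotting that list as one patch joins the end of one ring to the start of the next, which produces the stray lines. Here is some code that downloads the U.S. Census data, unzips it, reads it properly for Bokeh, and builds a data frame of latitude, longitude, state, and county.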
import itertools
import os
import zipfile

import pandas as pd
import requests
import shapefile  # provided by the pyshp package


def get_map_data(shape_data_file, local_file_path):
    url = ("http://www2.census.gov/geo/tiger/GENZ2015/shp/"
           + shape_data_file + ".zip")
    # os.path.join so that a plain directory such as "." works
    zfile = os.path.join(local_file_path, shape_data_file + ".zip")
    sfile = os.path.join(local_file_path, shape_data_file + ".shp")
    dfile = os.path.join(local_file_path, shape_data_file + ".dbf")

    if not os.path.exists(zfile):
        print("Getting file: ", url)
        response = requests.get(url)
        with open(zfile, "wb") as code:
            code.write(response.content)

    if not os.path.exists(sfile):
        # Portable replacement for shelling out to the unzip command
        print("Unzipping: " + zfile)
        with zipfile.ZipFile(zfile) as zf:
            zf.extractall(local_file_path)

    lons = []
    lats = []
    ct_name = []
    st_id = []
    with open(sfile, "rb") as shp, open(dfile, "rb") as dbf:
        sf = shapefile.Reader(shp=shp, dbf=dbf)
        for shprec in sf.shapeRecords():
            st_id.append(int(shprec.record[0]))   # STATEFP
            ct_name.append(shprec.record[5])      # NAME
            # Each point is (x, y) = (longitude, latitude)
            lon, lat = map(list, zip(*shprec.shape.points))
            # parts holds the start index of each ring; end every ring with NaN
            indices = list(shprec.shape.parts)
            lon = [lon[i:j] + [float('NaN')] for i, j in zip(indices, indices[1:] + [None])]
            lat = [lat[i:j] + [float('NaN')] for i, j in zip(indices, indices[1:] + [None])]
            lons.append(list(itertools.chain.from_iterable(lon)))
            lats.append(list(itertools.chain.from_iterable(lat)))

    map_data = pd.DataFrame({'x': lons, 'y': lats, 'state': st_id, 'county_name': ct_name})
    return map_data
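The inputs to this function are the name of the shapefile and the local directory to download the map data into. I know of at least two maps available at the URL in the function above, which you can refer to as: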
map_low_res = "cb_2015_us_county_20m"
map_high_res = "cb_2015_us_county_500k"
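If the U.S. Census changes its URLs, which it surely will one day, you will need to change the input file name and the url variable. So, you can call the function above like this: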
map_output = get_map_data(map_low_res, ".")
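Then you can plot it just as in the code from the original question. First add a color data column ("county_colors" in the original question), then set the data frame as the source like this: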
source = ColumnDataSource(map_output)
For all of this to work you need to import the libraries, such as requests, os, itertools, shapefile, bokeh.models.ColumnDataSource, etc.
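As a rough sketch of those last two steps, assuming map_output is the data frame returned by get_map_data (columns x, y, state, county_name) and that Bokeh is installed: the helper names pick_colors and plot_state, the seed argument, and the plot title are my own and not part of Bokeh or the original question.

```python
import random


def pick_colors(n, palette, seed=0):
    """Return n colors drawn at random from palette (repeatable via seed)."""
    rng = random.Random(seed)
    return [rng.choice(palette) for _ in range(n)]


def plot_state(map_output, state_fips=26):
    """Plot the counties of one state (26 = Michigan) from map_output."""
    # Bokeh is imported here so pick_colors stays usable without it.
    from bokeh.models import ColumnDataSource, HoverTool
    from bokeh.palettes import Viridis6
    from bokeh.plotting import figure, show

    # Keep one state and add the color column before building the source.
    state = map_output.loc[map_output['state'] == state_fips].copy()
    state['color'] = pick_colors(len(state), list(Viridis6))
    source = ColumnDataSource(state)

    p = figure(title="Counties", tools="pan,wheel_zoom,box_zoom,reset,hover,save",
               x_axis_location=None, y_axis_location=None)
    p.grid.grid_line_color = None
    # The NaN-separated x and y lists keep islands as separate polygons.
    p.patches('x', 'y', source=source,
              fill_color='color', fill_alpha=0.7,
              line_color="white", line_width=0.5)

    hover = p.select_one(HoverTool)
    hover.point_policy = "follow_mouse"
    hover.tooltips = [("Name", "@county_name"), ("(Long, Lat)", "($x, $y)")]
    show(p)


# Usage, once map_output exists:
# plot_state(map_output, state_fips=26)
```

Filtering on the state column before building the ColumnDataSource keeps the source small, and the hover tooltip uses @county_name because that is the column name produced by get_map_data.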