Merging a pandas file with OSMnx

Question

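I want to find the most dangerous roads in Switzerland based on the accidents that happened there. I have a geolocated CSV file in which every row is one accident, with information such as accident type, people involved, date, and location (which I managed to convert to EPSG:4326). I did some quantitative analysis with the file and it all worked fine.

But now I need to put these coordinates on a map for further calculations. I would like to use OSMnx, with the city of Lucerne for testing purposes:

G = ox.graph_from_place('Luzern, Switzerland', network_type='drive')

But I don't know how to add my accident file to that map, and I don't know how to search for it. ("Merging a pandas file with OSMnx" or similar is apparently not the right way to phrase the question.)

Once that is done, I will be able to use OSMnx functions such as nearest_edges to find out what I want to know. But first I need to merge the two files. Can anyone tell me what code I need?

I don't know whether I have given enough information; if you tell me what you need, I will happily provide more.

I'm using Jupyter Notebook 6.3.0 on a Mac (macOS Big Sur).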

Answer 1

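Using the Swiss accident data.
Two merge techniques are used between the accident data and the OSMnx data:
1. Use the extent of the city polygon: sjoin() the accident data to the city polygon, which keeps only the accidents inside the city.
2. Locate bad roads with sjoin_nearest(), which finds the index of the road LineString nearest to the location of each accident.
All of this is then visualized on a folium map.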

import geopandas as gpd
import pandas as pd
import osmnx as ox
import folium
import requests
from pathlib import Path
from zipfile import ZipFile

url = "https://data.geo.admin.ch/ch.astra.unfaelle-personenschaeden_alle/unfaelle-personenschaeden_alle/unfaelle-personenschaeden_alle_2056.csv.zip"
f = Path.cwd().joinpath(url.split("/")[-1])
if not f.exists():
    r = requests.get(
        url,
        stream=True,
    )
    with open(f, "wb") as fd:
        for chunk in r.iter_content(chunk_size=128):
            fd.write(chunk)

df_acc = [
    pd.read_csv(ZipFile(f).open(zf))
    for zf in ZipFile(f).infolist()
    if zf.filename.split(".")[-1] == "csv"
][0]
df_acc = df_acc.loc[
    :, [c for c in df_acc.columns if c.split("_")[-1] not in ["it", "de", "fr"]]
]
gdf_acc = gpd.GeoDataFrame(
    df_acc,
    geometry=gpd.points_from_xy(
        df_acc["AccidentLocation_CHLV95_E"], df_acc["AccidentLocation_CHLV95_N"]
    ),
    crs="EPSG:2056",
).to_crs("epsg:4326")


# get OSM data for investigated location
G = ox.graph_from_place("Luzern, Switzerland", network_type="drive")
# the top-level alias avoids the utils_graph module, which newer OSMnx releases no longer provide
gdf_nodes, gdf_edges = ox.graph_to_gdfs(G)

# get bounding polygon of investigated location
gdf_poly = ox.geocode_to_gdf({"city": "Luzern"}).loc[:, ["geometry", "display_name"]]

# reduce accidents down to those in investigated location
gdf_loc = gdf_acc.sjoin(gdf_poly)

# get roads with accidents
gdf_edges2 = gdf_edges.reset_index(drop=True).loc[:, ["name", "geometry"]]
gdf_bad_roads = gdf_edges2.loc[
    gdf_loc.loc[:, ["geometry", "display_name"]]
    .sjoin_nearest(gdf_edges2)["index_right"]
    .unique()
]


# now let's visualize what we have
m = gdf_poly.explore(
    name="Boundary",
    color="blue",
    style_kwds={"fillOpacity": 0.1},
    height=300,
    width=500,
)
m = gdf_edges.explore(name="Roads", m=m)
m = gdf_bad_roads.explore(name="Bad Roads", m=m, color="yellow")
m = gdf_loc.explore(name="Accidents", m=m, color="red")
folium.LayerControl().add_to(m)
m
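
The loading step at the top of the code above (take the first CSV inside the zip, then drop the columns whose names end in a language suffix) can be tried without downloading anything. The following is only a minimal sketch: the in-memory archive, the rows, and the AccidentUID and AccidentType_* column names are made up for illustration; only the two AccidentLocation_CHLV95_* column names come from the code above.

```python
import io
from zipfile import ZipFile

import pandas as pd

# Tiny in-memory zip standing in for the downloaded archive (made-up content).
csv_text = (
    "AccidentUID,AccidentType_de,AccidentType_fr,AccidentType_it,AccidentType_en,"
    "AccidentLocation_CHLV95_E,AccidentLocation_CHLV95_N\n"
    "a1,x,y,z,skidding,2666000,1211000\n"
)
buf = io.BytesIO()
with ZipFile(buf, "w") as zf:
    zf.writestr("readme.txt", "not a csv")
    zf.writestr("accidents.csv", csv_text)
buf.seek(0)

# Open the archive once and read the first member that is a CSV file.
with ZipFile(buf) as zf:
    csv_name = next(n for n in zf.namelist() if n.lower().endswith(".csv"))
    with zf.open(csv_name) as fh:
        df = pd.read_csv(fh)

# Drop columns whose last "_"-separated part is a language suffix.
lang_suffixes = {"de", "fr", "it"}
df = df[[c for c in df.columns if c.rsplit("_", 1)[-1] not in lang_suffixes]]
print(list(df.columns))
```

Opening the archive in a with block reads it once and closes it afterwards, whereas the list comprehension in the code above opens the zip file twice.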

osmnx nearest_edges()

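This is really the same as the first solution, which uses geopandas sjoin_nearest().
I have not benchmarked performance; both will be using an R-tree and spatial indexes.
I have also played a bit with the analysis: accidents associated with the same edge are aggregated, and the result is used to color the edges.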

# project graph and points
G_proj = ox.project_graph(G)
gdf_loc_p = gdf_loc["geometry"].to_crs(G_proj.graph["crs"])

ne, d = ox.nearest_edges(
    G_proj, X=gdf_loc_p.x.values, Y=gdf_loc_p.y.values, return_dist=True
)

# reindex points based on results from nearest_edges
gdf_loc = (
    gdf_loc.set_index(pd.MultiIndex.from_tuples(ne, names=["u", "v", "key"]))
    .assign(distance=d)
    .sort_index()
)

# join geometry from edges back to points
# aggregate so have number of accidents on each edge
gdf_bad_roads = (
    gdf_edges.join(gdf_loc, rsuffix="_loc", how="inner")
    .groupby(["u", "v", "key"])
    .agg(geometry=("geometry", "first"), number=("osmid", "size"))
    .set_crs(gdf_edges.crs)
)
# categorise edges based on number of accidents
gdf_bad_roads["cat"] = pd.qcut(gdf_bad_roads["number"], q=2, labels=["low", "high"])

m = gdf_poly.explore(
    name="Boundary",
    color="blue",
    style_kwds={"fillOpacity": 0.1},
    height=300,
    width=500,
)
m = gdf_bad_roads.explore(
    m=m, column="cat", cmap=["yellow", "red"], name="Accident roads"
)
m = gdf_loc.explore(name="Accidents", m=m, color="red")
folium.LayerControl().add_to(m)
m
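
To make concrete what a nearest-edge lookup computes, here is a minimal pure-Python sketch. It is not how OSMnx or geopandas implement it (they use spatial indexes and full LineString geometries); it assumes straight two-point edges in a projected CRS with coordinates in metres, and the edge ids and coordinates are made up.

```python
import math


def point_segment_distance(px, py, ax, ay, bx, by):
    """Distance from point P to segment AB in planar (projected) coordinates."""
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:  # degenerate segment: A and B coincide
        return math.hypot(px - ax, py - ay)
    # Position of the orthogonal projection of P on AB, clamped to the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def nearest_edge(point, edges):
    """Return (edge_id, distance) for the edge closest to point.

    edges maps a hypothetical (u, v, key) id to a segment ((ax, ay), (bx, by)).
    """
    px, py = point
    return min(
        (
            (edge_id, point_segment_distance(px, py, *a, *b))
            for edge_id, (a, b) in edges.items()
        ),
        key=lambda item: item[1],
    )


# Toy network and accident locations, in metres (made-up values).
edges = {
    (1, 2, 0): ((0.0, 0.0), (100.0, 0.0)),
    (2, 3, 0): ((100.0, 0.0), (100.0, 100.0)),
}
accidents = [(50.0, 10.0), (90.0, 60.0), (120.0, 50.0)]
matches = [nearest_edge(p, edges) for p in accidents]
for point, (edge_id, dist) in zip(accidents, matches):
    print(point, "->", edge_id, round(dist, 1))
```

The distances are only meaningful in a projected CRS, which is why the code above projects both the graph and the accident points before calling nearest_edges.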

Answer 2

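Thank you very much for your answer, Rob. The two files are now merged, which is great :) I visualized the data for the Swiss city of Lucerne and it looks good, so I think I followed your steps correctly: Plotting Accidents Lucerne

But I have now been trying for several hours and still don't know how to do calculations with that data. By the way, the Swiss accident data can be found here: https://data.geo.admin.ch/ch.astra.unfaelle-personenschaeden_alle/ (I also edited it into the first post).

I looked around and found nearest_edges [in this answer], which looks like what I need, but I don't know how to handle it :/ I read the documentation, but when I try this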

#get a street network and randomly sample 10,000 points, from 
G = ox.graph_from_place('Lucerne, Switzerland', network_type='drive')
G_proj = ox.project_graph(G)
points = gdf_acc(ox.get_undirected(G_proj), 10000)

%time ne1 = ox.nearest_edges(G_proj, X=points.x, Y=.y, return_dist=True)
# wall time: 2.91 s

%time ne2 = ox.nearest_edges(G_proj, X=points.x, Y=points.y, interpolate=10, return_dist=True)
# wall time: 302 ms

I get this error message:

`TypeError                                 Traceback (most recent call last)
/var/folders/jy/1f2tlvb965g30zhw9q3cvdw07r5rb_/T/ipykernel_74128/4210512587.py in <module>
      2 G = ox.graph_from_place('Lucerne, Switzerland', network_type='drive')
      3 G_proj = ox.project_graph(G)
----> 4 points = gdf_acc(ox.get_undirected(G_proj), 10000)
      5 
      6 get_ipython().run_line_magic('time', 'ne1 = ox.nearest_edges(G_proj, X=points.x, Y=.y, return_dist=True)')

TypeError: 'GeoDataFrame' object is not callable`

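So I am doing something completely wrong, but I don't know what :) Maybe my wording is all wrong... My goal is to list the roads in Switzerland with the most accidents (using Lucerne as a test object). I could do this by visualizing and counting, but I would rather do it in code, because then I can check all kinds of things (which streets have the most bicycle accidents, which streets have seen the largest increase in accidents over time, accidents in relation to traffic flow, ...). I asked GIS people, and they told me the best way is to find the street nearest to each accident, and to do that for every point. Then I could do quantitative analysis with that list.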
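
On the error itself: gdf_acc is a GeoDataFrame, so writing gdf_acc(...) tries to call it like a function, which raises the TypeError. The projected x and y coordinates of the accident points can be passed to nearest_edges directly, as in the nearest_edges() code in Answer 1. Once every accident carries the (u, v, key) of its nearest edge, the ranking is ordinary pandas. Below is a rough sketch with made-up data; the accident_id and involves_bicycle columns and the street names are hypothetical stand-ins for the real accident attributes and for the "name" column of the OSMnx edges.

```python
import pandas as pd

# Hypothetical result of a nearest-edge match: one row per accident.
accidents = pd.DataFrame(
    {
        "accident_id": [1, 2, 3, 4, 5, 6],
        "u": [10, 10, 11, 11, 11, 12],
        "v": [11, 11, 12, 12, 12, 13],
        "key": [0, 0, 0, 0, 0, 0],
        "involves_bicycle": [True, False, True, True, False, False],
    }
)

# Hypothetical edge attributes; a street usually consists of several edges.
edges = pd.DataFrame(
    {
        "u": [10, 11, 12],
        "v": [11, 12, 13],
        "key": [0, 0, 0],
        "name": ["Seestrasse", "Bahnhofstrasse", "Bahnhofstrasse"],
    }
)

# Attach the street name to each accident, then count accidents per street.
per_road = (
    accidents.merge(edges, on=["u", "v", "key"], how="left")
    .groupby("name")
    .agg(
        accidents=("accident_id", "size"),
        bicycle_accidents=("involves_bicycle", "sum"),
    )
    .sort_values("accidents", ascending=False)
)
print(per_road)
```

Grouping by street name rather than by edge adds up all segments of the same street. In real OSMnx data the "name" value can be missing or a list for some edges, so it may need cleaning before grouping.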
