Non-pairwise distance measure while preserving all columns from the original geopandas dataframes
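A previously posted solution shows how to compute non-pairwise distances between two geopandas dataframes (gdf). However, the resulting distance matrix keeps only the indices of the two gdfs, which can be hard to read. I added some columns to the gdfs as follows and then computed the distance matrix: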
import pandas as pd
import geopandas as gpd
gdf_1 = gpd.GeoDataFrame(geometry=gpd.points_from_xy([0, 0, 0], [0, 90, 120]))
gdf_2 = gpd.GeoDataFrame(geometry=gpd.points_from_xy([0, 0], [0, -90]))
home = ['home_1', 'home_2', 'home_3']
shop = ['shop_1', 'shop_2']
gdf_1['home'] = home
gdf_2['shop'] = shop
gdf_1.geometry.apply(lambda g: gdf_2.distance(g))
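As the table above shows, nothing from the original gdfs is kept in the result apart from the indices, which is neither intuitive nor useful. I would like to know how to keep all the original columns from both gdfs in the resulting distance matrix, or at least the "home", "shop", and "distance" columns, like this:
Note: "distance" is the distance measured from home to shop, and the other "geometry" columns may need suffixes.
You can combine stack and merge to create the desired output.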
import pandas as pd
import geopandas as gpd
gdf_1 = gpd.GeoDataFrame(geometry=gpd.points_from_xy([0, 0, 0], [0, 90, 120]))
gdf_2 = gpd.GeoDataFrame(geometry=gpd.points_from_xy([0, 0], [0, -90]))
home = ['home_1', 'home_2', 'home_3']
shop = ['shop_1', 'shop_2']
gdf_1['home'] = home
gdf_2['shop'] = shop
# set indices so we can have them in gdf_3
# you could also do this when making gdf_1 and gdf_2
gdf_1.index = gdf_1['home']
gdf_2.index = gdf_2['shop']
gdf_3 = gdf_1.geometry.apply(lambda g: gdf_2.distance(g))
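# reshape the matrix to long form; stack returns a Series here, but we want a df
# (the defaults already stack the innermost column level, and passing dropna
# explicitly is deprecated in recent pandas versions)
gdf_4 = pd.DataFrame(gdf_3.stack())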
gdf_4.reset_index(inplace=True)
# merge the original columns over
df_merge_1 = pd.merge(gdf_4, gdf_2,
left_on='shop',
right_on=gdf_2.index,
how='outer').fillna('')
df_merge_2 = pd.merge(df_merge_1, gdf_1,
left_on='home',
right_on=gdf_1.index,
how='outer').fillna('')
# get rid of extra cols
df_merge_2 = df_merge_2[[ 'shop', 'home', 0, 'geometry_x', 'geometry_y']]
# rename cols
df_merge_2.columns = ['shop', 'home', 'distance', 'geometry_s', 'geometry_h']
df_merge_2 is a pandas DataFrame, but you can easily create a GeoDataFrame from it.
df_merge_2_gdf = gpd.GeoDataFrame(df_merge_2, geometry=df_merge_2['geometry_h'])
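As a possible alternative to the stack-and-merge approach above, here is a minimal sketch that uses a cross join instead. It is a different technique, not the method shown above, and it assumes pandas 1.2 or later (for how="cross"). The names pairs and result and the suffixes "_h" and "_s" are chosen here for illustration only. Every home row is paired with every shop row, so all original columns survive, and the distance is then computed row by row.

```python
import geopandas as gpd
import pandas as pd

gdf_1 = gpd.GeoDataFrame(
    {"home": ["home_1", "home_2", "home_3"]},
    geometry=gpd.points_from_xy([0, 0, 0], [0, 90, 120]),
)
gdf_2 = gpd.GeoDataFrame(
    {"shop": ["shop_1", "shop_2"]},
    geometry=gpd.points_from_xy([0, 0], [0, -90]),
)

# Cross join on plain DataFrames: each home is paired with each shop.
# The overlapping "geometry" columns receive the suffixes "_h" and "_s".
pairs = pd.merge(
    pd.DataFrame(gdf_1),
    pd.DataFrame(gdf_2),
    how="cross",
    suffixes=("_h", "_s"),
)

# Element-wise distance between the two geometry columns (home to shop).
pairs["distance"] = gpd.GeoSeries(pairs["geometry_h"]).distance(
    gpd.GeoSeries(pairs["geometry_s"])
)

# Optional: turn the result back into a GeoDataFrame, using the home
# geometry as the active geometry column.
result = gpd.GeoDataFrame(pairs, geometry="geometry_h")
print(result[["home", "shop", "distance"]])
```

This avoids setting and resetting indices, at the cost of materialising every home/shop pair in memory, so it may not suit very large inputs.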