Calculating spatial averages for each country after spatial join
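Hello, I am using the code at the bottom to extract countries from coordinates. Please see the URL below, which gives a more detailed explanation of the code:
My main variable/value is the monthly mean pdsi value from: https://psl.noaa.gov/data/gridded/data.pdsi.html. The image below shows part of the visualisation created by the code below. The shaded squares represent the spatial regions of the pdsi values, which overlap the world shapefile.
As the image of Belgium shows, the 4 squares touch Belgium's land area but also touch other countries. If I attribute the raw values to Belgium, I think this would overestimate the average pdsi value. In particular, since the bottom two squares barely touch Belgium, those values should carry significantly less weight when the average is calculated. Is there a way to incorporate some kind of weighted average, where the area of each square that falls inside a country is used as the weight for each pdsi value? I would also like to standardise this process for all countries, not just Belgium.
Any help would be greatly appreciated!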
import geopandas as gpd
import numpy as np
import plotly.express as px
import requests
from pathlib import Path
from zipfile import ZipFile
import urllib.parse
import shapely.geometry
import xarray as xr
# download NetCDF data...
# fmt: off
url = "https://psl.noaa.gov/repository/entry/get/pdsi.mon.mean.selfcalibrated.nc?entryid=synth%3Ae570c8f9-ec09-4e89-93b4-babd5651e7a9%3AL2RhaV9wZHNpL3Bkc2kubW9uLm1lYW4uc2VsZmNhbGlicmF0ZWQubmM%3D"
f = Path.cwd().joinpath(Path(urllib.parse.urlparse(url).path).name)
# fmt: on
if not f.exists():
    r = requests.get(url, stream=True, headers={"User-Agent": "XY"})
    with open(f, "wb") as fd:
        for chunk in r.iter_content(chunk_size=128):
            fd.write(chunk)
ds = xr.open_dataset(f)
pdsi = ds.to_dataframe()
pdsi = pdsi.reset_index().dropna() # don't care about places in oceans...
# use subset for testing... last 5 times...
pdsim = pdsi.loc[pdsi["time"].isin(pdsi.groupby("time").size().index[-5:])]
# create geopandas dataframe
gdf = gpd.GeoDataFrame(
    pdsim, geometry=pdsim.loc[:, ["lon", "lat"]].apply(shapely.geometry.Point, axis=1)
)
# make sure that data supports using a buffer...
assert (
    gdf["lat"].diff().loc[lambda s: s.ne(0)].mode()
    == gdf["lon"].diff().loc[lambda s: s.ne(0)].mode()
).all()
# how big should the square buffer be around the point??
buffer = gdf["lat"].diff().loc[lambda s: s.ne(0)].mode().values[0] / 2
gdf["geometry"] = gdf["geometry"].buffer(buffer, cap_style=3)
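# Import shapefile from geopandas
# NOTE: geopandas.datasets was removed in geopandas 1.0; on newer versions,
# download the Natural Earth countries file and pass its path to gpd.read_file
path_to_data = gpd.datasets.get_path("naturalearth_lowres")
world_shp = gpd.read_file(path_to_data)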
# the solution... spatial join buffered polygons to countries
# comma separate associated countries
gdf = gdf.join(
    world_shp.sjoin(gdf.set_crs("EPSG:4326"))
    .groupby("index_right")["name"]
    .agg(",".join)
)
gdf["time_a"] = gdf["time"].dt.strftime("%Y-%b-%d")
# simplest way to test is visualise...
px.choropleth_mapbox(
    gdf,
    geojson=gdf.geometry,
    locations=gdf.index,
    color="pdsi",
    hover_data=["name"],
    animation_frame="time_a",
    opacity=0.3,
).update_layout(
    mapbox={"style": "carto-positron", "zoom": 1},
    margin={"l": 0, "r": 0, "t": 0, "b": 0},
)
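- Using https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoSeries.intersection.html you can get the part of each grid cell that intersects the country polygon
- Using the areas, you can calculate the proportion of overlap
- From this I have generated two visualisations
  - one showing which countries each grid cell overlaps and by how much
  - one aggregating to countries using the weighted average, plus other measures that can be used for transparency
I don't know whether aggregating PDSI this way (mean or weighted mean) is mathematically or scientifically sound. It does demonstrate how to get the result your question asks for.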
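To make the overlap idea concrete before the full code, here is a minimal sketch on made-up geometry. The rectangular "country", the two 2-degree cells, and their PDSI values are all hypothetical, and `overlap_fraction` is an illustrative helper rather than part of geopandas or shapely. It assumes planar areas in degrees, the same simplification the code below makes.

```python
from shapely.geometry import box

# Hypothetical "country": a 3 x 2 degree rectangle
country = box(0, 0, 3, 2)

# Hypothetical grid cells (2 x 2 degrees) paired with made-up PDSI values
cells = [
    (box(0, 0, 2, 2), 1.0),   # lies entirely inside the country
    (box(2, 0, 4, 2), -3.0),  # only its left half is inside the country
]


def overlap_fraction(cell, region):
    """Share of the cell's area that falls inside the region (0 to 1)."""
    return cell.intersection(region).area / cell.area


weights = [overlap_fraction(cell, country) for cell, _ in cells]
values = [value for _, value in cells]

# Area-weighted mean: each value counts in proportion to its overlap
weighted = sum(w * v for w, v in zip(weights, values)) / sum(weights)
# Plain mean for comparison: every touching cell counts equally
unweighted = sum(values) / len(values)

print(weights, weighted, unweighted)
```

In this toy case the half-covered cell pulls the plain mean further than it pulls the weighted mean, which is the effect described for the squares that barely touch Belgium.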
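# the solution... spatial join buffered polygons to countries
# plus work out overlap between PDSI grid and country. Area of each grid is constant...
# (run this on the buffered gdf from before the question's name join;
# an existing "name" column would be suffixed by sjoin)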
gdf_c = (
    world_shp.sjoin(gdf.set_crs("EPSG:4326"))
    .merge(
        gdf.loc[:, "geometry"],
        left_on="index_right",
        right_index=True,
        suffixes=("", "_pdsi"),
    )
    .assign(
        overlap=lambda d: (
            d["geometry"]
            .intersection(gpd.GeoSeries(d["geometry_pdsi"], crs="EPSG:4326"))
            .area
            / (buffer * 2) ** 2
        ).round(3)
    )
)
# comma separate associated countries and a list of overlaps
gdf_pdsi = gdf.loc[:, ["geometry", "time", "pdsi"]].join(
    gdf_c.groupby("index_right").agg({"name": ",".join, "overlap": list})
)
gdf_pdsi["time_a"] = gdf_pdsi["time"].dt.strftime("%Y-%b-%d")
# simplest way to test is visualise...
fig_buf = px.choropleth_mapbox(
    gdf_pdsi,
    geojson=gdf_pdsi.geometry,
    locations=gdf_pdsi.index,
    color="pdsi",
    hover_data=["name", "overlap"],
    animation_frame="time_a",
    opacity=0.3,
).update_layout(
    mapbox={"style": "carto-positron", "zoom": 1},
    margin={"l": 0, "r": 0, "t": 0, "b": 0},
)
fig_buf
import pandas as pd
# prepare data to plot by country
df_pdsi = (
    gdf_c.groupby(["name", "time"])
    .apply(
        lambda d: pd.Series(
            {
                "weighted_pdsi": (d["pdsi"] * d["overlap"]).sum() / d["overlap"].sum(),
                "unweighted_pdsi": d["pdsi"].mean(),
                "min_pdsi": d["pdsi"].min(),
                "max_pdsi": d["pdsi"].max(),
                "min_overlap": d["overlap"].min(),
                "max_overlap": d["overlap"].max(),
                "size_pdsi": len(d["pdsi"]),
                # "pdsi_list":[round(v,2) for v in d["pdsi"]]
            }
        )
    )
    .reset_index()
)
df_pdsi["time_a"] = df_pdsi["time"].dt.strftime("%Y-%b-%d")
fig = px.choropleth_mapbox(
    df_pdsi,
    geojson=world_shp.set_index("name").loc[:, "geometry"],
    locations="name",
    color="weighted_pdsi",
    hover_data=df_pdsi.columns,
    animation_frame="time_a",
    opacity=0.3,
).update_layout(
    mapbox={"style": "carto-positron", "zoom": 1},
    margin={"l": 0, "r": 0, "t": 0, "b": 0},
)
fig
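As a sanity check on the per-country aggregation, the same weighted mean can be reproduced on a tiny hand-made table. This is only a sketch: the rows, the time label, and the `wx` helper column are invented for illustration, and it uses grouped sums instead of `groupby().apply()`, which should give the same `weighted_pdsi` while usually being faster on large tables.

```python
import pandas as pd

# Hypothetical rows shaped like gdf_c: one row per (country, grid cell, time)
toy = pd.DataFrame(
    {
        "name": ["Belgium", "Belgium", "France"],
        "time": ["2014-12", "2014-12", "2014-12"],
        "pdsi": [1.0, -3.0, 2.0],
        "overlap": [1.0, 0.5, 0.25],
    }
)

# Sum of (value * weight) and sum of weights per country and time
sums = (
    toy.assign(wx=toy["pdsi"] * toy["overlap"])
    .groupby(["name", "time"])[["wx", "overlap"]]
    .sum()
)

# Weighted mean = sum(value * weight) / sum(weight)
weighted = (sums["wx"] / sums["overlap"]).rename("weighted_pdsi")
print(weighted.reset_index())
```

A country touched by a single cell, like the made-up France row here, simply gets that cell's value back regardless of how small the overlap is, so `min_overlap` and `size_pdsi` are worth inspecting alongside the weighted mean.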