R:点overlap/within距离时追加数据;将缓冲区矩形添加到 set1,将半径添加到 set2
R: Append data when points overlap/within distance; add buffer rectangle to set1, add radius to set2
我有一个关于在海洋中移动的涡流的数据库 (netcdf),以及一些做同样事情的鱼的足迹。当鱼在涡流范围内时(在(变化的)阈值距离内的位置和相同的日期),我试图将涡流信息附加到鱼道。数据结构、并发症和可重现的例子如下。我怀疑这不是特别难,我只是不断地纠缠自己试图找出一个巧妙的解决方案。提前感谢您的任何想法!
数据结构/可重现示例:3 同一天的鱼和涡流轨迹
library(lubridate)
fish <- data.frame(lat = c(42.1, 42.6, 43.2), lon = c(-10, -10.1, -10.2), date = ymd(c("1990-01-01", "1990-01-02", "1990-01-02")))
eddy <- data.frame(lat = c(44, 42.3, 40), lon = c(-15, -10.1, -6), date = ymd(c("1990-01-01", "1990-01-02", "1990-01-02")), track = c(1,1,1), radius = c(81,82,83), vals = c(11,12,13))
这里有点棘手:
- 变半径圆涡流缓冲器
涡流的大小(从中计算重叠或距离)由半径列给出,因此:
sf_eddy <- st_as_sf(eddy)
st_buffer(sf_eddy, dist = radius)
对于 st_buffer "All operations work on a per-feature basis" 所以我不确定是否会将 sf_eddy 视为单个特征(可能是(多)线串)或 nrows of separate points。我想我可以玩弄它直到我让它工作,这样每个涡流日 (n=151937) 都是一个 sf 缓冲圈,如果不包括它们,我大概可以将其附加到 vals
已经.
- 设置距离latlong矩形鱼缓冲区
鱼群位置的固定误差距离为+-1.9°lat,+-0.77°lon。所以从概念上讲,我想在每天的鱼位周围创建一个又高又细的矩形多边形。 st_buffer
看起来不允许这样做但是 there may be other options, potentially by creating a polygon around each centroid as per the square
example here.
- 如果鱼和任何涡流在 space 时间重叠,将涡流附加到鱼
当 eddy$date==fish$date?虽然涡流计算是一次性的,但我可以保存输出。我怀疑鱼箱计算也不会特别费力。不管怎样:
fish$vals <- rep(NA, nrow(fish))
sf_fish <- sf_as_sf(fish)
for (i in seq(along = sf_fish)){
if(st_intersects(sf_fish[i,],sf_eddy)) sf_fish[i,"vals"] <- sf_eddy$vals}
显然,这可能是一种糟糕的做法。
但这是否是解决此问题的一种模糊合理的方法,或者由于我对 sf、sp、rgeos 等的业余知识,我是否错过了更优雅的解决方案?我有 166 个鱼文件,平均 286 天的位置 = 47476 次总查找。是否会 good/fast 创建 fish/eddy 日期匹配对的索引(同一天可能有多个漩涡),然后只测试那些的交叉点?由于我需要将 eddy$vals
(实际上是 2 或 3 列)附加到 fish
,使用空间连接会更好吗,也许 st_join
?
编辑:评论和输出地图奇怪之处:
总共 600,000 fish/eddy 行 ggplot 输出:
看来我需要修复投影,因为涡流没有打印在 0 以西。它们现在肯定在同一个 CRS 中;也许原件是 0-360 而不是 -180:180。此外,虽然鱼盒打印得很好,但涡流太小:平均 r=80km=~1deg@45N,而且它们打印出来的肯定比那个小得多。我也会考虑一下。另外:如果同一天存在 2 个涡流,for() 会有问题吗?
Edit2:原件是 0:360,我已更正。涡流半径仍然太小:
根据代码讨论,带有 latlons 的原始 eddy 文件使用 st_as_sf
转换为 crs4326 (wgs84),然后 st_transform(sf_eddy,6931)
根据上图转换为北大西洋。结果文件中的几何列是 POLYGON,XY 维度,值在投影坐标中而不是 latlons,例如sf_eddy_buffered$geometry[1]
bbox
: xmin -4894790 ymin -2418555 xmax -4894648 ymax -2418413
。 speed_radius
是 71,所以 xmin 和 xmax 之间的差是 142,即 71*2,这是正确的。半径单位不等于投影距离单位吗? Stewart 的示例是 81000。涡流半径单位:km。投影/几何单位:https://epsg.io/6931 表示单位确实是米。将半径乘以 1000 并重试:
所以这一切都很好。刚刚尝试了更新的 st_join
代码,认为它可以使用一些工作 - 可能绝对永远需要考虑 3639209 涡流行与 461 鱼行。它还在同一天为同一条鱼的多个涡流创建了多行。鉴于鱼周围的错误缓冲区框,可能是相对较大重叠机会的函数。
我怀疑我可以通过标记重复项并删除最远的重复项来解决这个问题。目前的做法(通过 v 和 rbind)比以前的方式(直接单元格赋值)IMO 更好,因为否则多个值将被发送到单个单元格,要么崩溃它要么静默覆盖。
为了加快代码速度,我想我可以将 eddy 文件子集化为仅当前鱼日期范围的重叠部分,并在连接循环中使用该子集。无论如何,连接循环已经按日期对鱼和涡流文件进行了子集化,所以这可能是多余的,或者减少需要处理的涡流数据的大小会减少 CPU/RAM 每次的工作量?
Edit3:这大大加快了速度。为 thisDate
添加一个简单的循环计数器显示它在 461 行中的 275 行完成。并制作了一个481长度的文件。可能有很多天鱼不在漩涡中(thisDate
循环结果没有 v
也没有 rbind
),并且有很多天鱼在漩涡中2+ 涡流(thisDate
循环导致多行 v
和 rbind
)。仍然感觉它应该处理 461 个循环,但是...
Edit4:一个小调整:
v <- st_join(thisFishRow, thisEddyRow, left = F, largest = TRUE)
largest
意味着它只会加入具有最大重叠区域的涡流,导致最大 v
rowlength 为 1,因此每天最多 1 个值。然而,路径的细微差别意味着鱼似乎可以在不同的持续漩涡之间来回穿梭。我的怀疑(基于类似的研究)是它们更有可能停留在强大的反气旋涡流系统中。涡流控制的半径(以及面积)可能比位置重叠更多。对于 1.5 个月的部分,我发现重叠度最高的涡流轨道编号在 3 个涡流之间反弹。涡流轨道编号 217008 是最强大的反气旋。这是一个(废话)情节:
如您所见,217008 正好位于鱼误差框的中心,而更大、更分散、功率更低的涡流 217343 和 39625 则位于边缘。然而,由于它们有更多的面积可以提供,并且不计算质心接近度,因此它们的较大尺寸经常会看到它们被撞到顶部。
所以:如果 fishbox 在同一天与 eddy 重叠,则将 eddy 包括在候选名单中
(thisFishRow
& thisEddyRow
保持不变)。然后:根据与质心的最短距离(而不是 st_join
)从候选名单中选择涡流。待续!
编辑5:
fishNearEddies <- NULL
for (thisDate in sf_df_nona$date) {
thisFishRow <- sf_df_nona[sf_df_nona$date == thisDate, ]
thisEddyRow <- sebDateSub[sebDateSub$date == thisDate, ]
overlapToday <- st_join(thisFishRow, thisEddyRow, left = F)
if (nrow(overlapToday) > 0) {
overlapEddies <- thisEddyRow[thisEddyRow$track %in% overlapToday$track,]
st_distance(x = thisFishRow, y = overlapEddies, by_element = TRUE) #0 0 due to overlap min dist = 0
} #close if
} #close for
st_distance 将不起作用,因为空间特征(缓冲区框和缓冲区圆)重叠,所以最小距离 = 0。我想我还需要质心来测试,我想?
Edit6:使用工作代码进行最终编辑,并感谢 Stewart 的所有帮助。再次感谢先生。
# overlap join loop####
fishNearEddies <- NULL
EdCentroidDist <- NULL
counter <- 1
ofhowmany <- length(sf_df_nona$date)
for (thisDate in sf_df_nona$date) {
thisFishRow <- sf_df_nona[sf_df_nona$date == thisDate, ] #will be 1 row per day
thisEddyRow <- sebDateSub[sebDateSub$date == thisDate, ] #all eddies that day, could be multi rows
overlapToday <- st_join(thisFishRow, thisEddyRow, left = F) #will join only if they spatial intersect and (already) time match
if (nrow(overlapToday) > 0) {
# now need to join based on closest centroid and NOT on highest overlap
overlapEddies <- thisEddyRow[thisEddyRow$track %in% overlapToday$track,] #subset TER by overlap tracks
fishcentroid <- st_centroid(thisFishRow)
eddycentroid <- st_centroid(overlapEddies)
fishEdDists <- st_distance(x = fishcentroid, y = eddycentroid, by_element = TRUE) #result vector corresponding to overlapEddies rownumbers
fishNearEddies <- rbind(fishNearEddies, overlapEddies[which.min(fishEdDists),]) #row index for overlapEddies, no index but has date
EdCentroidDist <- rbind(EdCentroidDist, as.numeric(min(fishEdDists)))
} #close if
print(paste0(counter, " of ", ofhowmany, " fish days"))
counter <- counter + 1
} #close for
if (!is.null(fishNearEddies)) { #if there are overlaps, do processing. If not it'll fail
fishNearEddies %<>% as.data.frame() %>% # convert to nonspatial so I can remove buffer column
select(-geometry) %>% # remove geometry column. Attributes still remain. Whatever?
cbind(EdCentroidDist) #add to FNE as column
这里有一些东西可以让你继续:
library(lubridate)
library(sf)
library(ggplot2)
# sample data
fish <- data.frame(
lat = c(41.1, 43.6, 44.2),
lon = c(-7, -11, -15),
date = ymd(c("1990-01-01", "1990-01-02", "1990-01-03"))
)
# Add a blank column for the eddy values we want
fish$vals <- rep(NA, nrow(fish))
eddy <- data.frame(
lat = c(44, 42.3, 40),
lon = c(-6, -10.1, -15),
date = ymd(c("1990-01-01", "1990-01-02", "1990-01-03")),
track = c(1,1,1),
radius = c(81000,82000,83000),
vals = c(11,12,13)
)
# Convert eddy to simple features, using WGS84 CRS
sf_eddy <- st_as_sf(eddy, coords = c("lon", "lat"), crs=4326)
# Transform from geographical to projected so we can buffer correctly. You might need to pick a different CRS
sf_eddy <- st_transform(sf_eddy, 3035) # ETRS89 / LAEA Europe
# Buffer the eddy points based on their radii
sf_eddy_buffered <- st_buffer(sf_eddy, dist = sf_eddy$radius)
# Add error to fish position. There's probably a better way to do this.
fishErrLat <- 1.9
fishErrLon <- 0.77
fish$buffer <- paste('POLYGON((',
fish$lon - fishErrLon, ' ', fish$lat + fishErrLat, ',',
fish$lon + fishErrLon, ' ', fish$lat + fishErrLat, ',',
fish$lon + fishErrLon, ' ', fish$lat - fishErrLat, ',',
fish$lon - fishErrLon, ' ', fish$lat - fishErrLat, ',',
fish$lon - fishErrLon, ' ', fish$lat + fishErrLat,
'))',
sep=''
)
# Convert fish to simple features, using WGS84 CRS
sf_fish <- st_as_sf(fish, wkt='buffer', crs=4326)
# Transform from geographical to projected
sf_fish <- st_transform(sf_fish, 3035)
#Plot what we've got so far
g <- ggplot() +
geom_sf(data=sf_eddy_buffered, aes(fill=date)) +
geom_sf(data=sf_fish, aes(fill=date))
print(g)
# Check for overlap
fishNearEddies <- NULL
for (thisDate in unique(sf_fish$date)) {
thisFishRow <- sf_fish[sf_fish$date==thisDate, ]
thisEddyRow <- sf_eddy_buffered[sf_eddy_buffered$date==thisDate, ]
v <- st_join(thisFishRow, thisEddyRow, left=F)
if (nrow(v) > 0) {
fishNearEddies <- rbind(fishNearEddies, v)
}
}
并检查结果:
> fishNearEddies
Simple feature collection with 1 feature and 7 fields
geometry type: POLYGON
dimension: XY
bbox: xmin: 2588691 ymin: 2402830 xmax: 2703342 ymax: 2624501
epsg (SRID): 3035
proj4string: +proj=laea +lat_0=52 +lon_0=10 +x_0=4321000 +y_0=3210000 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
lat lon date.x date.y track radius vals buffer
2 43.6 -11 1990-01-02 1990-01-02 1 82000 12 POLYGON ((2644792 2624501, ...
这只会给你那些与涡流相交的鱼。
我有一个关于在海洋中移动的涡流的数据库 (netcdf),以及一些做同样事情的鱼的足迹。当鱼在涡流范围内时(在(变化的)阈值距离内的位置和相同的日期),我试图将涡流信息附加到鱼道。数据结构、并发症和可重现的例子如下。我怀疑这不是特别难,我只是不断地纠缠自己试图找出一个巧妙的解决方案。提前感谢您的任何想法!
数据结构/可重现示例:3 同一天的鱼和涡流轨迹
library(lubridate)
fish <- data.frame(lat = c(42.1, 42.6, 43.2), lon = c(-10, -10.1, -10.2), date = ymd(c("1990-01-01", "1990-01-02", "1990-01-02")))
eddy <- data.frame(lat = c(44, 42.3, 40), lon = c(-15, -10.1, -6), date = ymd(c("1990-01-01", "1990-01-02", "1990-01-02")), track = c(1,1,1), radius = c(81,82,83), vals = c(11,12,13))
这里有点棘手:
- 变半径圆涡流缓冲器
涡流的大小(从中计算重叠或距离)由半径列给出,因此:
sf_eddy <- st_as_sf(eddy)
st_buffer(sf_eddy, dist = radius)
对于 st_buffer "All operations work on a per-feature basis" 所以我不确定是否会将 sf_eddy 视为单个特征(可能是(多)线串)或 nrows of separate points。我想我可以玩弄它直到我让它工作,这样每个涡流日 (n=151937) 都是一个 sf 缓冲圈,如果不包括它们,我大概可以将其附加到 vals
已经.
- 设置距离latlong矩形鱼缓冲区
鱼群位置的固定误差距离为+-1.9°lat,+-0.77°lon。所以从概念上讲,我想在每天的鱼位周围创建一个又高又细的矩形多边形。 st_buffer
看起来不允许这样做但是 there may be other options, potentially by creating a polygon around each centroid as per the square
example here.
- 如果鱼和任何涡流在 space 时间重叠,将涡流附加到鱼
当 eddy$date==fish$date?虽然涡流计算是一次性的,但我可以保存输出。我怀疑鱼箱计算也不会特别费力。不管怎样:
fish$vals <- rep(NA, nrow(fish))
sf_fish <- sf_as_sf(fish)
for (i in seq(along = sf_fish)){
if(st_intersects(sf_fish[i,],sf_eddy)) sf_fish[i,"vals"] <- sf_eddy$vals}
显然,这可能是一种糟糕的做法。
但这是否是解决此问题的一种模糊合理的方法,或者由于我对 sf、sp、rgeos 等的业余知识,我是否错过了更优雅的解决方案?我有 166 个鱼文件,平均 286 天的位置 = 47476 次总查找。是否会 good/fast 创建 fish/eddy 日期匹配对的索引(同一天可能有多个漩涡),然后只测试那些的交叉点?由于我需要将 eddy$vals
(实际上是 2 或 3 列)附加到 fish
,使用空间连接会更好吗,也许 st_join
?
编辑:评论和输出地图奇怪之处: 总共 600,000 fish/eddy 行 ggplot 输出:
看来我需要修复投影,因为涡流没有打印在 0 以西。它们现在肯定在同一个 CRS 中;也许原件是 0-360 而不是 -180:180。此外,虽然鱼盒打印得很好,但涡流太小:平均 r=80km=~1deg@45N,而且它们打印出来的肯定比那个小得多。我也会考虑一下。另外:如果同一天存在 2 个涡流,for() 会有问题吗?
Edit2:原件是 0:360,我已更正。涡流半径仍然太小:
st_as_sf
转换为 crs4326 (wgs84),然后 st_transform(sf_eddy,6931)
根据上图转换为北大西洋。结果文件中的几何列是 POLYGON,XY 维度,值在投影坐标中而不是 latlons,例如sf_eddy_buffered$geometry[1]
bbox
: xmin -4894790 ymin -2418555 xmax -4894648 ymax -2418413
。 speed_radius
是 71,所以 xmin 和 xmax 之间的差是 142,即 71*2,这是正确的。半径单位不等于投影距离单位吗? Stewart 的示例是 81000。涡流半径单位:km。投影/几何单位:https://epsg.io/6931 表示单位确实是米。将半径乘以 1000 并重试:
st_join
代码,认为它可以使用一些工作 - 可能绝对永远需要考虑 3639209 涡流行与 461 鱼行。它还在同一天为同一条鱼的多个涡流创建了多行。鉴于鱼周围的错误缓冲区框,可能是相对较大重叠机会的函数。
我怀疑我可以通过标记重复项并删除最远的重复项来解决这个问题。目前的做法(通过 v 和 rbind)比以前的方式(直接单元格赋值)IMO 更好,因为否则多个值将被发送到单个单元格,要么崩溃它要么静默覆盖。
为了加快代码速度,我想我可以将 eddy 文件子集化为仅当前鱼日期范围的重叠部分,并在连接循环中使用该子集。无论如何,连接循环已经按日期对鱼和涡流文件进行了子集化,所以这可能是多余的,或者减少需要处理的涡流数据的大小会减少 CPU/RAM 每次的工作量?
Edit3:这大大加快了速度。为 thisDate
添加一个简单的循环计数器显示它在 461 行中的 275 行完成。并制作了一个481长度的文件。可能有很多天鱼不在漩涡中(thisDate
循环结果没有 v
也没有 rbind
),并且有很多天鱼在漩涡中2+ 涡流(thisDate
循环导致多行 v
和 rbind
)。仍然感觉它应该处理 461 个循环,但是...
Edit4:一个小调整:
v <- st_join(thisFishRow, thisEddyRow, left = F, largest = TRUE)
largest
意味着它只会加入具有最大重叠区域的涡流,导致最大 v
rowlength 为 1,因此每天最多 1 个值。然而,路径的细微差别意味着鱼似乎可以在不同的持续漩涡之间来回穿梭。我的怀疑(基于类似的研究)是它们更有可能停留在强大的反气旋涡流系统中。涡流控制的半径(以及面积)可能比位置重叠更多。对于 1.5 个月的部分,我发现重叠度最高的涡流轨道编号在 3 个涡流之间反弹。涡流轨道编号 217008 是最强大的反气旋。这是一个(废话)情节:
thisFishRow
& thisEddyRow
保持不变)。然后:根据与质心的最短距离(而不是 st_join
)从候选名单中选择涡流。待续!
编辑5:
fishNearEddies <- NULL
for (thisDate in sf_df_nona$date) {
thisFishRow <- sf_df_nona[sf_df_nona$date == thisDate, ]
thisEddyRow <- sebDateSub[sebDateSub$date == thisDate, ]
overlapToday <- st_join(thisFishRow, thisEddyRow, left = F)
if (nrow(overlapToday) > 0) {
overlapEddies <- thisEddyRow[thisEddyRow$track %in% overlapToday$track,]
st_distance(x = thisFishRow, y = overlapEddies, by_element = TRUE) #0 0 due to overlap min dist = 0
} #close if
} #close for
st_distance 将不起作用,因为空间特征(缓冲区框和缓冲区圆)重叠,所以最小距离 = 0。我想我还需要质心来测试,我想?
Edit6:使用工作代码进行最终编辑,并感谢 Stewart 的所有帮助。再次感谢先生。
# overlap join loop####
fishNearEddies <- NULL
EdCentroidDist <- NULL
counter <- 1
ofhowmany <- length(sf_df_nona$date)
for (thisDate in sf_df_nona$date) {
thisFishRow <- sf_df_nona[sf_df_nona$date == thisDate, ] #will be 1 row per day
thisEddyRow <- sebDateSub[sebDateSub$date == thisDate, ] #all eddies that day, could be multi rows
overlapToday <- st_join(thisFishRow, thisEddyRow, left = F) #will join only if they spatial intersect and (already) time match
if (nrow(overlapToday) > 0) {
# now need to join based on closest centroid and NOT on highest overlap
overlapEddies <- thisEddyRow[thisEddyRow$track %in% overlapToday$track,] #subset TER by overlap tracks
fishcentroid <- st_centroid(thisFishRow)
eddycentroid <- st_centroid(overlapEddies)
fishEdDists <- st_distance(x = fishcentroid, y = eddycentroid, by_element = TRUE) #result vector corresponding to overlapEddies rownumbers
fishNearEddies <- rbind(fishNearEddies, overlapEddies[which.min(fishEdDists),]) #row index for overlapEddies, no index but has date
EdCentroidDist <- rbind(EdCentroidDist, as.numeric(min(fishEdDists)))
} #close if
print(paste0(counter, " of ", ofhowmany, " fish days"))
counter <- counter + 1
} #close for
if (!is.null(fishNearEddies)) { #if there are overlaps, do processing. If not it'll fail
fishNearEddies %<>% as.data.frame() %>% # convert to nonspatial so I can remove buffer column
select(-geometry) %>% # remove geometry column. Attributes still remain. Whatever?
cbind(EdCentroidDist) #add to FNE as column
这里有一些东西可以让你继续:
library(lubridate)
library(sf)
library(ggplot2)
# sample data
fish <- data.frame(
lat = c(41.1, 43.6, 44.2),
lon = c(-7, -11, -15),
date = ymd(c("1990-01-01", "1990-01-02", "1990-01-03"))
)
# Add a blank column for the eddy values we want
fish$vals <- rep(NA, nrow(fish))
eddy <- data.frame(
lat = c(44, 42.3, 40),
lon = c(-6, -10.1, -15),
date = ymd(c("1990-01-01", "1990-01-02", "1990-01-03")),
track = c(1,1,1),
radius = c(81000,82000,83000),
vals = c(11,12,13)
)
# Convert eddy to simple features, using WGS84 CRS
sf_eddy <- st_as_sf(eddy, coords = c("lon", "lat"), crs=4326)
# Transform from geographical to projected so we can buffer correctly. You might need to pick a different CRS
sf_eddy <- st_transform(sf_eddy, 3035) # ETRS89 / LAEA Europe
# Buffer the eddy points based on their radii
sf_eddy_buffered <- st_buffer(sf_eddy, dist = sf_eddy$radius)
# Add error to fish position. There's probably a better way to do this.
fishErrLat <- 1.9
fishErrLon <- 0.77
fish$buffer <- paste('POLYGON((',
fish$lon - fishErrLon, ' ', fish$lat + fishErrLat, ',',
fish$lon + fishErrLon, ' ', fish$lat + fishErrLat, ',',
fish$lon + fishErrLon, ' ', fish$lat - fishErrLat, ',',
fish$lon - fishErrLon, ' ', fish$lat - fishErrLat, ',',
fish$lon - fishErrLon, ' ', fish$lat + fishErrLat,
'))',
sep=''
)
# Convert fish to simple features, using WGS84 CRS
sf_fish <- st_as_sf(fish, wkt='buffer', crs=4326)
# Transform from geographical to projected
sf_fish <- st_transform(sf_fish, 3035)
#Plot what we've got so far
g <- ggplot() +
geom_sf(data=sf_eddy_buffered, aes(fill=date)) +
geom_sf(data=sf_fish, aes(fill=date))
print(g)
# Check for overlap
fishNearEddies <- NULL
for (thisDate in unique(sf_fish$date)) {
thisFishRow <- sf_fish[sf_fish$date==thisDate, ]
thisEddyRow <- sf_eddy_buffered[sf_eddy_buffered$date==thisDate, ]
v <- st_join(thisFishRow, thisEddyRow, left=F)
if (nrow(v) > 0) {
fishNearEddies <- rbind(fishNearEddies, v)
}
}
并检查结果:
> fishNearEddies
Simple feature collection with 1 feature and 7 fields
geometry type: POLYGON
dimension: XY
bbox: xmin: 2588691 ymin: 2402830 xmax: 2703342 ymax: 2624501
epsg (SRID): 3035
proj4string: +proj=laea +lat_0=52 +lon_0=10 +x_0=4321000 +y_0=3210000 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
lat lon date.x date.y track radius vals buffer
2 43.6 -11 1990-01-02 1990-01-02 1 82000 12 POLYGON ((2644792 2624501, ...
这只会给你那些与涡流相交的鱼。