Calculate distance using latitude and longitude data in data frames of different lengths with a loop
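I have two data frames of different lengths, each with latitude and longitude coordinates. I would like to join the two data frames by calculating the distance between the lat/long points.
For simplicity, data frame A (the starting points) has the following structure: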
ID long lat
1 -89.92702 44.19367
2 -89.92525 44.19654
3 -89.92365 44.19756
4 -89.91949 44.19848
5 -89.91359 44.19818
Data frame B (the end points) has a similar structure but is shorter:
ID LAT LON
1 43.06519 -87.91446
2 43.14490 -88.07172
3 43.08969 -87.91202
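I want to calculate the distance between every pair of points, so that I end up with a data frame merged to A that holds the distances between A1 and B1, A1 and B2, and A1 and B3. This should be repeated for every value of A$ID against every value of B$ID: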
A$ID B$ID
1 1
2 2
3 3
4
5
Before posting this, I consulted several Stack Overflow threads and a Medium post, but I am not sure how to approach the loop, especially since the lists have different lengths.
Thanks!
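Here is a solution using two packages: sf and tidyverse. The first is used to convert the data into simple features and calculate the distances, while the second is used to put the data into the desired format.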
library(tidyverse)
library(sf)

# Transform data into simple features.
# crs = 4326 assumes the coordinates are WGS84 longitude/latitude,
# so that the distances come out in metres rather than degrees
sfA <- st_as_sf(A, coords = c("long", "lat"), crs = 4326)
sfB <- st_as_sf(B, coords = c("LON", "LAT"), crs = 4326)

# Calculate distance between all entries of sfA and sfB
distances <- st_distance(sfA, sfB, by_element = FALSE)

# Drop the units class so the matrix converts cleanly to a tibble
distances <- units::drop_units(distances)

# Set colnames for distances matrix
colnames(distances) <- paste0("B", seq_len(nrow(B)))

# Put the results in the desired format
# Transform distances matrix into a tibble
as_tibble(distances) %>%
  # Get row numbers and add them as a column
  rownames_to_column() %>%
  # Set ID as the column name for the row numbers
  rename("ID" = "rowname") %>%
  # Transform ID to integer so it matches A$ID
  mutate(ID = as.integer(ID)) %>%
  # Join with the original A data frame
  right_join(A, by = "ID") %>%
  # Change the order of columns
  select(ID, long, lat, everything()) %>%
  # Put data into long format; the backslash must be doubled in an R string
  pivot_longer(cols = starts_with("B"),
               names_to = "B_ID",
               names_pattern = "B(\\d+)",
               values_to = "distance")
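For comparison, the same all-pairs table can be sketched in base R without extra packages. This is a minimal sketch and not part of the solution above: it assumes a spherical Earth with a mean radius of 6371008.8 m and uses the haversine formula, so its results differ slightly from ellipsoidal distances. The names haversine, pairs, and result are illustrative only.

```r
# Sample data from the question
A <- read.table(header = TRUE, text = "ID long lat
1 -89.92702 44.19367
2 -89.92525 44.19654
3 -89.92365 44.19756
4 -89.91949 44.19848
5 -89.91359 44.19818")
B <- read.table(header = TRUE, text = "ID LAT LON
1 43.06519 -87.91446
2 43.14490 -88.07172
3 43.08969 -87.91202")

# Great-circle distance in metres via the haversine formula.
# The default radius is an assumption (mean Earth radius on a sphere).
haversine <- function(lon1, lat1, lon2, lat2, r = 6371008.8) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  h <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(h)))
}

# One row per (A, B) pair: no explicit loop, and the two data frames
# do not need to have the same length
pairs <- expand.grid(ID = A$ID, B_ID = B$ID)
ia <- match(pairs$ID, A$ID)
ib <- match(pairs$B_ID, B$ID)
pairs$distance <- haversine(A$long[ia], A$lat[ia], B$LON[ib], B$LAT[ib])

# Attach the coordinates of A to each pair and sort for readability
result <- merge(A, pairs, by = "ID")
result <- result[order(result$ID, result$B_ID), ]
```

Because expand.grid() enumerates every combination of IDs, the result has nrow(A) * nrow(B) rows in long format, with the columns ID, long, lat, B_ID, and distance.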
I think you can use outer quite concisely here.
library(geosphere)
d <- outer(1:nrow(A), 1:nrow(B), Vectorize(function(x, y) distm(A[x, 2:3], B[y, 3:2])))
cbind(A, `colnames<-`(d, paste0("B", seq(nrow(B)))))
# ID long lat B1 B2 B3
# 1 1 -89.92702 44.19367 205173.6 189641.7 203652.9
# 2 2 -89.92525 44.19654 205252.6 189722.5 203728.1
# 3 3 -89.92365 44.19756 205219.0 189689.8 203692.6
# 4 4 -89.91949 44.19848 205015.6 189488.0 203486.2
# 5 5 -89.91359 44.19818 204620.0 189093.8 203087.6
Data:
A <- read.table(header=T, text="ID long lat
1 -89.92702 44.19367
2 -89.92525 44.19654
3 -89.92365 44.19756
4 -89.91949 44.19848
5 -89.91359 44.19818")
B <- read.table(header=T, text="ID LAT LON
1 43.06519 -87.91446
2 43.14490 -88.07172
3 43.08969 -87.91202")
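As a possible simplification, distm() itself accepts two sets of points and returns the full matrix of pairwise distances, so outer() and Vectorize() may not be needed. The following is a minimal sketch, assuming the same A and B as above and whatever default distance function the installed geosphere version uses. The names d and wide are illustrative.

```r
library(geosphere)

# Sample data from the question
A <- read.table(header = TRUE, text = "ID long lat
1 -89.92702 44.19367
2 -89.92525 44.19654
3 -89.92365 44.19756
4 -89.91949 44.19848
5 -89.91359 44.19818")
B <- read.table(header = TRUE, text = "ID LAT LON
1 43.06519 -87.91446
2 43.14490 -88.07172
3 43.08969 -87.91202")

# distm() expects (longitude, latitude) column order for both inputs and
# returns an nrow(A) x nrow(B) matrix of distances in metres
d <- distm(A[, c("long", "lat")], B[, c("LON", "LAT")])
colnames(d) <- paste0("B", B$ID)

# Wide format: one row per point in A, one distance column per point in B
wide <- cbind(A, d)
```

Selecting the columns by name rather than by position keeps the longitude/latitude order explicit, which matters because B stores LAT before LON.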