Assign an ID to each road segment (latitude, longitude) in R
I would appreciate your thoughts and suggestions on working with the following dataset:
Start_Latitude Start_Longitude End_Latitude End_Longitude Date Avg_Speed
41.92446 -87.68654 41.93184 -87.67459 2020-06-11 6:00 40
41.90367 -87.63233 41.91600 -87.61911 2020-06-11 6:00 35
41.86468 -87.76746 41.82341 -87.69162 2020-06-11 6:00 54
41.96075 -87.74756 41.76543 -87.67459 2020-06-11 6:00 45
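The variables Start_Latitude, Start_Longitude, End_Latitude and End_Longitude describe a road segment, and I have the average speed for each segment.
I want to assign an ID to each road segment, defined by its start latitude/longitude and its end latitude/longitude, so that I can compare the average speed of one segment with another.
The data I want looks like this: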
St_Lat_Long End_Lat_Long Date Avg_Speed ID
41.92446, -87.6865 41.93184,-87.67459 2020-06-11 6:00 40 1
41.90367,-87.63233 41.91600,-87.61911 2020-06-11 6:00 35 2
41.86468,-87.76746 41.82341,-87.69162 2020-06-11 6:00 54 3
41.96075,-87.74756 41.76543,-87.67459 2020-06-11 6:00 45 4
How can I assign the ID in R?
I have the following code, which assigns an ID to a single spatial point given by Start_Latitude and Start_Longitude (2 coordinates):
df$ID <- cumsum(!duplicated(df[1:2]))
Latitude Longitude Date Avg_Speed ID
41.92446 -87.68654 2020-06-11 6:00 40 1
41.90367 -87.63233 2020-06-11 6:00 35 2
41.86468 -87.76746 2020-06-11 6:00 54 3
41.96075 -87.74756 2020-06-11 6:00 45 4
Also, is it possible to plot all road segments on a map using the 4 coordinates?
Here is a way to solve the first part of the post using base R:
Data
foo <- tibble::tribble(~Start_Latitude, ~Start_Longitude, ~End_Latitude, ~End_Longitude, ~Date, ~Avg_Speed,
41.92446, -87.68654, 41.93184, -87.67459, '2020-06-11 6:00', 40,
41.90367, -87.63233, 41.91600, -87.61911, '2020-06-11 6:00', 35,
41.86468, -87.76746, 41.82341, -87.69162, '2020-06-11 6:00', 54,
41.96075, -87.74756, 41.76543, -87.67459, '2020-06-11 6:00', 45)
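Code
# Combine each coordinate pair into a single "lat, long" string
foo$St_lat_long <- paste(foo$Start_Latitude, foo$Start_Longitude, sep = ", ")
foo$End_lat_long <- paste(foo$End_Latitude, foo$End_Longitude, sep = ", ")
# Keep the combined columns plus Date and Avg_Speed, selected by name
foo2 <- foo[, c("St_lat_long", "End_lat_long", "Date", "Avg_Speed")]
# Number the rows; every row in this sample is a distinct segment
foo2$ID <- seq.int(nrow(foo2))
Output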
St_lat_long End_lat_long Date Avg_Speed ID
41.92446, -87.68654 41.93184, -87.67459 2020-06-11 6:00 40 1
41.90367, -87.63233 41.916, -87.61911 2020-06-11 6:00 35 2
41.86468, -87.76746 41.82341, -87.69162 2020-06-11 6:00 54 3
41.96075, -87.74756 41.76543, -87.67459 2020-06-11 6:00 45 4
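The code above numbers rows, so a segment that shows up again at a later time would receive a new ID. Below is a minimal base R sketch, assuming the real data contains repeated segments. The 7:00 rows and their speeds are made up for illustration, and `seg`, `key` and `by_id` are hypothetical names. It builds one key from all four coordinates and gives each distinct key a single ID, in order of first appearance.

```r
# Hypothetical data: the first two segments are observed again at 7:00.
seg <- data.frame(
  Start_Latitude  = c(41.92446, 41.90367, 41.92446, 41.90367),
  Start_Longitude = c(-87.68654, -87.63233, -87.68654, -87.63233),
  End_Latitude    = c(41.93184, 41.91600, 41.93184, 41.91600),
  End_Longitude   = c(-87.67459, -87.61911, -87.67459, -87.61911),
  Date            = c("2020-06-11 6:00", "2020-06-11 6:00",
                      "2020-06-11 7:00", "2020-06-11 7:00"),
  Avg_Speed       = c(40, 35, 42, 31)
)

# One key per segment, built from all four coordinates.
key <- paste(seg$Start_Latitude, seg$Start_Longitude,
             seg$End_Latitude, seg$End_Longitude, sep = ",")

# Each distinct key gets one ID, numbered by first appearance.
seg$ID <- match(key, unique(key))

# Mean speed per segment ID, which is what makes segments comparable.
by_id <- aggregate(Avg_Speed ~ ID, data = seg, FUN = mean)
print(by_id)
```

With the ID in place, any per-segment summary (mean, range, change over time) can be computed by grouping on `ID` instead of on four separate coordinate columns.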
Mapping the data
You provided the following data in your post:
foo <- tibble::tribble(~Latitude, ~Longitude, ~Date, ~Avg_Speed, ~ID,
41.92446, -87.68654, "2020-06-11 6:00", 40, 1,
41.90367, -87.63233, "2020-06-11 6:00", 35, 2,
41.86468, -87.76746, "2020-06-11 6:00", 54, 3,
41.96075, -87.74756, "2020-06-11 6:00", 45, 4)
foo
#> # A tibble: 4 x 5
#> Latitude Longitude Date Avg_Speed ID
#> <dbl> <dbl> <chr> <dbl> <dbl>
#> 1 41.9 -87.7 2020-06-11 6:00 40 1
#> 2 41.9 -87.6 2020-06-11 6:00 35 2
#> 3 41.9 -87.8 2020-06-11 6:00 54 3
#> 4 42.0 -87.7 2020-06-11 6:00 45 4
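Created on 2020-06-21 by the reprex package (v0.3.0)
Creating a map from latitude and longitude
Here is a way to handle the last part of the question, the mapping, using the leaflet package: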
library(leaflet)

# One circle marker per point, with a popup showing date, speed and ID
leaflet(foo) %>%
  addTiles() %>%
  addCircleMarkers(lat = ~Latitude,
                   lng = ~Longitude,
                   popup = paste("<b>Date:</b>", foo$Date, "<br>",
                                 "<b>Average Speed:</b>", foo$Avg_Speed, "<br>",
                                 "<b>ID:</b>", foo$ID, "<br>"))
Output
I published the interactive leaflet map to my RPubs page. Here is a link
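The map above shows points only, while the question asks about drawing each segment from its four coordinates. Here is a minimal sketch with base graphics rather than leaflet, assuming that a straight line between start and end point is an acceptable picture of a road segment. `segs` is a hypothetical name for the original four-coordinate data.

```r
# The four road segments from the question.
segs <- data.frame(
  Start_Latitude  = c(41.92446, 41.90367, 41.86468, 41.96075),
  Start_Longitude = c(-87.68654, -87.63233, -87.76746, -87.74756),
  End_Latitude    = c(41.93184, 41.91600, 41.82341, 41.76543),
  End_Longitude   = c(-87.67459, -87.61911, -87.69162, -87.67459)
)

# Empty plot spanning all start and end points (longitude on x, latitude on y).
plot(range(c(segs$Start_Longitude, segs$End_Longitude)),
     range(c(segs$Start_Latitude, segs$End_Latitude)),
     type = "n", xlab = "Longitude", ylab = "Latitude")

# One straight line per road segment, from its start point to its end point.
segments(x0 = segs$Start_Longitude, y0 = segs$Start_Latitude,
         x1 = segs$End_Longitude,   y1 = segs$End_Latitude, lwd = 2)

# Label each segment with its row number, just above the midpoint.
text((segs$Start_Longitude + segs$End_Longitude) / 2,
     (segs$Start_Latitude + segs$End_Latitude) / 2,
     labels = seq_len(nrow(segs)), pos = 3)
```

For an interactive map, the same start/end pairs could probably be passed one segment at a time to leaflet's `addPolylines()`. That variant is not shown here and is untested against this data.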
A way using dplyr and stringr:
library(dplyr)
library(stringr)

df %>%
  mutate(St_Lat_Long = str_c(Start_Latitude, ", ", Start_Longitude),
         End_Lat_Long = str_c(End_Latitude, ", ", End_Longitude)) %>%
  select(St_Lat_Long, End_Lat_Long, Date, Avg_Speed) %>%
  distinct(across(ends_with("Lat_Long")), .keep_all = TRUE) %>%
  mutate(ID = row_number())
Output
# A tibble: 4 x 5
St_Lat_Long End_Lat_Long Date Avg_Speed ID
<chr> <chr> <chr> <dbl> <int>
1 41.92446, -87.68654 41.93184, -87.67459 2020-06-11 6:00 40 1
2 41.90367, -87.63233 41.916, -87.61911 2020-06-11 6:00 35 2
3 41.86468, -87.76746 41.82341, -87.69162 2020-06-11 6:00 54 3
4 41.96075, -87.74756 41.76543, -87.67459 2020-06-11 6:00 45 4
You can use unite or paste to combine the columns, and match together with unique to create a unique ID.
library(dplyr)
library(tidyr)
df %>%
unite(St_Lat_Long, Start_Latitude, Start_Longitude, sep = ',') %>%
unite(End_Lat_Long, End_Latitude, End_Longitude, sep = ',') %>%
mutate(temp = paste(St_Lat_Long,End_Lat_Long),
ID = match(temp, unique(temp))) %>%
select(-temp)
# A tibble: 4 x 5
# St_Lat_Long End_Lat_Long Date Avg_Speed ID
# <chr> <chr> <chr> <dbl> <int>
#1 41.92446,-87.68654 41.93184,-87.67459 2020-06-11 6:00 40 1
#2 41.90367,-87.63233 41.916,-87.61911 2020-06-11 6:00 35 2
#3 41.86468,-87.76746 41.82341,-87.69162 2020-06-11 6:00 54 3
#4 41.96075,-87.74756 41.76543,-87.67459 2020-06-11 6:00 45 4
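It may help to see why `match(temp, unique(temp))` is used here instead of the `cumsum(!duplicated(...))` idiom from the question. The two agree only when repeated keys sit next to each other. A small base R sketch with made-up keys ("A", "B", "C" stand in for pasted coordinate strings):

```r
# Toy keys: segments A and B each appear twice, not adjacent to their repeats.
key <- c("A", "B", "A", "B", "C")

# match/unique: every occurrence of a key gets the ID of its first occurrence.
id_match <- match(key, unique(key))
print(id_match)    # 1 2 1 2 3

# cumsum/duplicated: a running count of distinct keys seen so far, so a
# repeated key inherits whatever the count happens to be at that row.
id_cumsum <- cumsum(!duplicated(key))
print(id_cumsum)   # 1 2 2 2 3
```

In the second result the repeated "A" in row 3 is labelled 2, the ID of "B", so speeds from different segments would be mixed when comparing by ID.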
In the new dplyr 1.0.0, you can use cur_group_id to assign a unique number to each group.
df %>%
unite(St_Lat_Long, Start_Latitude, Start_Longitude, sep = ',') %>%
unite(End_Lat_Long, End_Latitude, End_Longitude, sep = ',') %>%
group_by(St_Lat_Long, End_Lat_Long) %>%
mutate(ID = cur_group_id())
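One thing to be aware of: as far as I know, group_by() orders groups by the sorted values of the grouping columns, so the numbers from cur_group_id() follow that order rather than the order in which segments first appear. The base R sketch below imitates this with `factor()`. Treating `factor()` as a stand-in for the grouped numbering is an assumption, and `key`, `id_sorted` and `id_first` are hypothetical names.

```r
# Keys for the four segments, in the row order of the question's data.
key <- paste(
  c("41.92446,-87.68654", "41.90367,-87.63233",
    "41.86468,-87.76746", "41.96075,-87.74756"),
  c("41.93184,-87.67459", "41.916,-87.61911",
    "41.82341,-87.69162", "41.76543,-87.67459")
)

# IDs by sorted key order (the factor levels are the sorted unique keys).
id_sorted <- as.integer(factor(key))
print(id_sorted)   # 3 2 1 4

# IDs by order of first appearance, as with match/unique.
id_first <- match(key, unique(key))
print(id_first)    # 1 2 3 4
```

Both numberings identify segments equally well. The difference only matters if the IDs are expected to match the 1, 2, 3, 4 order shown in the desired output.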