Rearrange GIS data of Origin-Destinations to one row per point
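I start with a "wide" dataset of origin-destination (OD) GIS data, which I want to rearrange into a longer dataset with one row per point. I have managed to melt and then dcast to rearrange the first level of the OD, but I'm struggling with the next step.
My MWE is a three-leg journey from Brussels to London. There are more IDs, however, and they can have more or fewer legs.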
library(data.table)
Start      <- c("Brussels", "Lille", "Dover")
Start_lon  <- c(4.3570964, 3.075685, 1.3047866)
Start_lat  <- c(50.845504, 50.6390876, 51.12623)
Border     <- c("Baisieux", "Frethun", "London")
Border_lon <- c(3.075685, 1.811221, -0.1244124)
Border_lat <- c(50.61848, 50.90148, 51.53165)
df <- data.table(ID = 1,
                 Sub_ID = 1:3,
                 Start = Start,
                 Start_lon = Start_lon,
                 Start_lat = Start_lat,
                 Border = Border,
                 Border_lon = Border_lon,
                 Border_lat = Border_lat)
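So I have one ID and three Sub_IDs with two points each; in the end I want 6 rows per ID, one for each stop. I managed to expand the data from 3 rows to 6 rows, such that one row is the start/origin point and the other is the border/destination point, as indicated by the Type variable.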
df_long <- melt(df, id.vars = c("ID", "Sub_ID", "Start", "Border"))
df_long <- df_long[, c("Type", "Coordinates") := tstrsplit(variable, "_", fixed=TRUE)]
df_long <- dcast(df_long, ID+Sub_ID+Start+Border+Type~Coordinates, value.var="value")
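Now I don't know how to get to a structure with ID == 1 and six Sub_IDs, so that I get 6 rows with the stops in the order
Brussels - Baisieux - Lille - Frethun - Dover - London
I'd like to get something like this: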
df_goal <- data.table(ID = 1,
Sub_ID = 1:6,
Stop = c("Brussels","Baisieux","Lille", "Frethun", "Dover", "London"),
lat = NA,
lon = NA)
possibly still with the information whether the stop is a "Start" or a "Border".
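Let me leave what I tried with a tidyverse approach. I split the data by ID. (You have only one ID, but you may have multiple IDs, so I decided to go this way.) For each list component, I selected the columns between Start and Border_lat, transposed, unlisted, and then created a matrix. I filled this matrix with three columns (city, longitude, and latitude) and converted the matrix to a data frame. For each ID group, I added a new column called Type, in which I repeated Start and Border. Finally, I changed the column names and converted lon and lat to numeric. I am sure there are more concise ways to handle this.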
library(dplyr)
library(purrr)
map_dfr(.x = split(df, f = df$ID),
.f = function(x){dplyr::select(x, Start:Border_lat) %>%
t %>%
unlist %>%
matrix(ncol = 3, byrow = TRUE) %>%
as.data.frame(stringsAsFactors = FALSE)},
.id = "ID") %>%
group_by(ID) %>%
mutate(Type = rep(c("Start", "Border"), times = n()/2)) %>%
rename(stop = "V1", lon = "V2", lat = "V3") %>%
mutate_at(vars(lon:lat),
.funs = list(~as.numeric(.)))
# ID stop lon lat Type
# <chr> <chr> <dbl> <dbl> <chr>
# 1 1 Brussels 4.36 50.8 Start
# 2 1 Baisieux 3.08 50.6 Border
# 3 1 Lille 3.08 50.6 Start
# 4 1 Frethun 1.81 50.9 Border
# 5 1 Dover 1.30 51.1 Start
# 6 1 London -0.124 51.5 Border
# 7 2 Brussels 4.36 50.8 Start
# 8 2 Baisieux 3.08 50.6 Border
# 9 2 Lille 3.08 50.6 Start
#10 2 Frethun 1.81 50.9 Border
#11 2 Dover 1.30 51.1 Start
#12 2 London -0.124 51.5 Border
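As a possibly more compact variant of the same idea, the start and border points can each be pulled into their own data frame and then stacked. This is only a sketch, not the approach used above: the names `starts`, `borders` and `stops` are illustrative, and the example data is rebuilt inline as a plain data frame so that the block runs on its own. Because lon and lat never pass through a character matrix, no conversion back to numeric is needed.

```r
library(dplyr)

# Example data rebuilt inline (two IDs, three legs each).
df <- data.frame(
  ID         = rep(1:2, each = 3),
  Sub_ID     = rep(1:3, times = 2),
  Start      = rep(c("Brussels", "Lille", "Dover"), 2),
  Start_lon  = rep(c(4.3570964, 3.075685, 1.3047866), 2),
  Start_lat  = rep(c(50.845504, 50.6390876, 51.12623), 2),
  Border     = rep(c("Baisieux", "Frethun", "London"), 2),
  Border_lon = rep(c(3.075685, 1.811221, -0.1244124), 2),
  Border_lat = rep(c(50.61848, 50.90148, 51.53165), 2),
  stringsAsFactors = FALSE
)

# One data frame per point type, with identical column names.
starts <- transmute(df, ID, Sub_ID, stop = Start,
                    lon = Start_lon, lat = Start_lat, Type = "Start")
borders <- transmute(df, ID, Sub_ID, stop = Border,
                     lon = Border_lon, lat = Border_lat, Type = "Border")

# Stack them and put each leg's Start row before its Border row.
stops <- bind_rows(starts, borders) %>%
  arrange(ID, Sub_ID, match(Type, c("Start", "Border")))
```

The `match()` call in `arrange()` makes the intended order explicit instead of relying on alphabetical sorting of Type.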
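Another option
Here is a data.table idea. From what you said, the number of columns is 8 and the number of rows varies by ID. Given that, I came up with the following.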
df[, .(Stop = c(Start, Border),
Type = c("Start", "Border"),
lon = c(Start_lon, Border_lon),
lat = c(Start_lat, Border_lat)),
by = .(ID, Sub_ID)]
# ID Sub_ID Stop Type lon lat
# 1: 1 1 Brussels Start 4.3570964 50.84550
# 2: 1 1 Baisieux Border 3.0756850 50.61848
# 3: 1 2 Lille Start 3.0756850 50.63909
# 4: 1 2 Frethun Border 1.8112210 50.90148
# 5: 1 3 Dover Start 1.3047866 51.12623
# 6: 1 3 London Border -0.1244124 51.53165
# 7: 2 1 Brussels Start 4.3570964 50.84550
# 8: 2 1 Baisieux Border 3.0756850 50.61848
# 9: 2 2 Lille Start 3.0756850 50.63909
#10: 2 2 Frethun Border 1.8112210 50.90148
#11: 2 3 Dover Start 1.3047866 51.12623
#12: 2 3 London Border -0.1244124 51.53165
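If the running stop number from df_goal is also wanted (1 to 6 within each ID), one possible sketch is to melt the three column groups in a single call and then number the rows per ID. The names `stops` and `Stop_ID` are illustrative and do not come from the question, and the example data is rebuilt inline so that the block runs on its own.

```r
library(data.table)

# Example data rebuilt inline (two IDs, three legs each).
df <- data.table(
  ID         = rep(1:2, each = 3),
  Sub_ID     = rep(1:3, times = 2),
  Start      = rep(c("Brussels", "Lille", "Dover"), 2),
  Start_lon  = rep(c(4.3570964, 3.075685, 1.3047866), 2),
  Start_lat  = rep(c(50.845504, 50.6390876, 51.12623), 2),
  Border     = rep(c("Baisieux", "Frethun", "London"), 2),
  Border_lon = rep(c(3.075685, 1.811221, -0.1244124), 2),
  Border_lat = rep(c(50.61848, 50.90148, 51.53165), 2)
)

# Melt three groups of columns at once. `variable` is 1 for the
# Start columns and 2 for the Border columns of each group.
stops <- melt(
  df,
  id.vars      = c("ID", "Sub_ID"),
  measure.vars = list(c("Start", "Border"),
                      c("Start_lon", "Border_lon"),
                      c("Start_lat", "Border_lat")),
  value.name   = c("Stop", "lon", "lat")
)

# Label the point type, then sort so each leg's Start precedes its Border.
stops[, Type := c("Start", "Border")[as.integer(variable)]]
setorder(stops, ID, Sub_ID, variable)
stops[, variable := NULL]

# Running stop number within each ID (hypothetical column name).
stops[, Stop_ID := seq_len(.N), by = ID]
```

Since the numbering is done per ID, trips with more or fewer legs each get their own 1..N sequence.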
Data
df <- structure(list(ID = c(1, 1, 1, 2, 2, 2), Sub_ID = c(1L, 2L, 3L,
1L, 2L, 3L), Start = c("Brussels", "Lille", "Dover", "Brussels",
"Lille", "Dover"), Start_lon = c(4.3570964, 3.075685, 1.3047866,
4.3570964, 3.075685, 1.3047866), Start_lat = c(50.845504, 50.6390876,
51.12623, 50.845504, 50.6390876, 51.12623), Border = c("Baisieux",
"Frethun", "London", "Baisieux", "Frethun", "London"), Border_lon = c(3.075685,
1.811221, -0.1244124, 3.075685, 1.811221, -0.1244124), Border_lat = c(50.61848,
50.90148, 51.53165, 50.61848, 50.90148, 51.53165)), row.names = c(NA,
-6L), class = c("data.table", "data.frame"))