截取行程次数的路线

Interception of routes with number of trips

我有一个 sf 数据框,其中包含从一个自行车站到另一个自行车站的行程次数。几何列包含两个站点之间的直接路线(由 osm 给出)。

我想绘制一张地图,其中街道的颜色是根据所进行的旅行次数的梯度来着色的。

我的问题是我有按路线而不是按街道的旅行次数。

我用st_interception()提取两条路线的相同部分,用st_difference()提取差异。

对于 10 趟和 15 趟的两条线路,这就是我想要的。

library('sf')
library('ggplot2') # dev version

route1 <- st_linestring(rbind(c(0, 0), c(1, 1), c(2, 2), c(3, 3)))
route2 <- st_linestring(rbind(c(1, 0), c(1, 1), c(2, 2), c(3, 0)))

route1 <- st_sf(id = 1, trips = 10, geometry = st_sfc(route1))
route2 <- st_sf(id = 2, trips = 15, geometry = st_sfc(route2))

# not ok as the segment (1,1 to 2,2) that is supposed to have 25 trips   only has 15 (the number of trips for the second line plotted)
ggplot(data = rbind(route1, route2)) + geom_sf(mapping = aes(color = trips)) + 
  theme(panel.grid.major = element_line(colour = 'transparent'))

# mergeRoutes gives the desired output
route <- mergeRoutes(route1, route2, init = TRUE)
ggplot(data = route) + geom_sf(mapping = aes(color = trips)) + 
  theme(panel.grid.major = element_line(colour = 'transparent'))

我写了 mergeRoute 函数,它给出了我想要的两条路线,但它不能很好地扩展到很多很多路线。

#'
#' This function merges two routes. It returns the interscetion (if any) with the number
#' of associated count and also the remaining pars of the routes or the second route or
#' (if init) the two routes.
#'
#' @param route1 a row with id , count and geometry
#' @param route2 a row with id , count and geometry
#' @param init logical, whether to return the two routes even if there is no intersection
#'   or only the second one
#'
#' @return a data frame with 3 rows if there is an intersection, nothing otherwise.
#'   
mergeRoutes <- function(route1, route2, init = FALSE) 
{
  intersection <- st_intersection(route1$geometry, route2$geometry)
  # if the intersection is only points or is empty then the result is the two routes
  # untouched to avoid adding too many elements to the result
  if(length(intersection) != 0 & 
     !'sfc_POINT' %in% class(intersection) &
     !'sfc_MULTIPOINT' %in% class(intersection)) {
    # if intersection is a geometry with point and lines extract the lines only
    intersection <- st_collection_extract(x = intersection, type = "LINESTRING")
    count <- route1$count + route2$count
    intersection <- data.frame(id = route1$id, count = count, geometry = intersection)
    # keep the part of the initial routes that are not in the intersection
    route1_dif <- st_difference(route1$geometry, route2$geometry)
    route2_dif <- st_difference(route2$geometry, route1$geometry)
    # if one route is completely covered by the the other then it is not added to the result
    if(length(route1_dif) != 0) {
      route1 <- data.frame(id = route1$id, 
                           count = route1$count, 
                           geometry = route1_dif)
    } else {
      route1 <- NULL
    }
    if(length(route2_dif) != 0) {
      route2 <- data.frame(id = route2$id, 
                           count = route2$count, 
                           geometry = route2_dif)
    } else {
      route2 <- NULL
    }
    result <- rbind(intersection, route1, route2)
    return(result)
  } else if (init) {
    result <- rbind(route1, route2)
  } else {
    result <- route2
  }
  return(result)
}

所以我有一些东西可以在两条线路上工作,但是如果我试图遍历所有站点之间的所有路线,它会无限期地进行。我想不出比 for 循环中的 lapply() 更好的方法了,而且这不会在我的 mac(16gb ram,2.5 ghz)上终止,它甚至在 15h 运行。

这里是我在近2000条路线上的尝试(数据可以找到here)。

# To merge all the routes, each new route is compared to all the rows from the previous
# comparison. New rows are added to the resulting data frame at each step. If there is no
# intersection then the route being compared to the others is added untouched.

# initiate comparison
segment_routes <- mergeRoutes(route1 = directions %>% slice(1), 
                              route2 = directions %>% slice(2), 
                              init = TRUE)

# compute directions segmentation for all the routes
for(i in 3:nrow(directions)) {
  new_route <- directions %>% slice(i)
  # compare the new route to a the segments resulting fro mprevious comparison
  new_routes <- lapply(X = seq(nrow(segment_routes)), 
                       FUN = function(j) mergeRoutes(route1 = segment_routes %>% slice(j), 
                                                     route2 = new_route))
  new_routes <- do.call(rbind, new_routes)
  # make an sf object
  new_routes <- st_sf(new_routes, 
                      geometry = new_routes$geometry, 
                      crs = st_crs(directions))
  # add the new segemnts to the ones from the previous iteration
  segment_routes <- rbind(segment_routes, new_routes)
}

我知道您可以将数据框直接传递给 st_intersection(),但我不知道如何指定我要添加计数,而且超过 2 条路线可以共享一条街道的同一部分所以单次调用拦截不会提供正确的输出。

我在这里使用 sf 和数据框,但任何使用 sp and/or data.table 或其他包的解决方案对我来说都是完美的。

如有任何帮助,我们将不胜感激。

编辑:这是我的会话信息

R version 3.4.3 (2017-11-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.3

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] ggplot2_2.2.1.9000 sf_0.6-0          

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.15      class_7.3-14      withr_2.1.1.9000  plyr_1.8.4       
 [5] grid_3.4.3        gtable_0.2.0      DBI_0.7           magrittr_1.5     
 [9] e1071_1.6-8       units_0.5-1       scales_0.5.0.9000 pillar_1.2.1     
[13] rlang_0.2.0       lazyeval_0.2.1    tools_3.4.3       udunits2_0.13    
[17] munsell_0.4.3     yaml_2.1.17       compiler_3.4.3    colorspace_1.3-2 
[21] classInt_0.1-24   tibble_1.4.2     

方法论

假设你的所有路线都是 LINESTRINGS,并且 LINESTRING 只是一系列坐标,我们可以将每个顺序坐标对视为 'from' 并且一个'to'。

如果我们使用 data.table 来存储坐标(而不是 sf),解决方案将变成一个简单的分组和计数操作,并且应该可以很好地扩展到更大的数据集。

例子

这是您在 link

中提供的数据的示例

步骤 1 - 转换为 data.table

library(sf)
library(data.table)
library(googleway) ## for plotting. can also use ggplot2, ggmap, leaflet, mapview...

sf <- readRDS("~/Downloads/directions.rds")
sf$row_id <- 1:nrow(sf) ## for joining 

dt_routes <- as.data.table(st_coordinates(sf))

## put on the rest of the trip data
## this assumes the 'L1' value from `st_coordinates` matches the 
## `id` value from the sf_routes object
## (if not, you will need a sequential 1:nrow 'id' value to match the 
## 'L1' value)
dt_sf <- sf
st_geometry(dt_sf) <- NULL

dt_routes <- dt_routes[
  dt_sf
    , on = c(L1 = "row_id")
    , nomatch = 0
  ]

步骤 2 - 创建 'from' 和 'to'

我们可以移动 X 和 Y 列,得到 'from' 和 'to' 列

dt_routes[
    , `:=`(X_to = shift(X, type = "lead"),
           Y_to = shift(Y, type = "lead"))
    , by = L1
  ]

步骤 3 - 分组和计数

现在我们可以计算每个坐标对的行程数

dt_trips <- dt_routes[
    !is.na(X_to)
    , .(n_trips = sum(count))
    , by = .(X, Y, X_to, Y_to)
  ]

步骤 4 - 转换回 sf

经过一些重新排列后,我们现在可以将每个 from/to 对转换为 LINESTRINGS,每个都有自己的权重(即 num_trips

dt_trips[, line_id := .I]

dt_from <- dt_trips[, .(X, Y, n_trips, line_id)]
dt_to <- dt_trips[, .(X = X_to, Y = Y_to, n_trips, line_id)]

dt_from[, line_sequence := 1]
dt_to[, line_sequence := 2]

dt_trips <- rbindlist(list(
  dt_from, dt_to
))

setorder(dt_trips, line_id, line_sequence)

## convert back to `sf` object
dt_trips <- dt_trips[, {
  geometry <- sf::st_linestring(x = matrix(c(X, Y), ncol = 2))
  geometry <- sf::st_sfc(geometry)
  geometry <- sf::st_sf(geometry)
}, by = .(line_id, n_trips)]

sf_trips <- sf::st_as_sf(dt_trips)

步骤 5 - 绘图

## applying a log-transform so the contrast shows up
sf_trips$n_trips <- log(sf_trips$n_trips)

library(googleway)
set_key("GOOGLE_MAP_KEY")

google_map(data = sf_trips) %>%
  add_polylines(
    stroke_colour = "n_trips"
    , stroke_opacity =1
    , stroke_weight = 3.5
    #, legend = T
    , info_window = "n_trips"
    , palette = viridisLite::viridis
  )