Flag geocoding errors in R using the ggmap package

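I have a dataset with two columns, on_road and at_road, which are combined into a string called geocode_string. I want to use this string to geocode these intersections with a Google API key. For example, with on_road = Silverdale and at_road = W 28th St, the combined string is geocode_string = Silverdale and W 28th St, Cleveland, OH.

However, when I use the geocode function from ggmap, I get this message: "SILVERDALE and W ..." not uniquely geocoded, using "silverdale ave, cleveland, oh 44109, usa".

Here R seems to simply default to one location, in this case Silverdale Ave. I would like R not to do that, perhaps by leaving blank any location for which no unique geocode can be found. I could then go through and find the coordinates for those cases by hand. I just want some way to flag these observations.

I would also like to point out that the second row of the dataset is S MARGINAL RD and W 93RD ST , CLEVELAND , OH, an intersection that does not exist in Cleveland. When I paste that string into Google Maps, it appears to search for a partial match and gives me the coordinates of S Marginal Rd. Any idea why a nonexistent intersection produces coordinates in this case, unlike the Silverdale case described above? Is there any way to prevent this?

Any help would be greatly appreciated!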

geocode(df$geocode_string)

structure(list(on_road = c("EDDY RD", "S MARGINAL RD", "MLK", 
"MLK", "IMPERIAL AVE", "HARVARD", "E 55TH", "W 41ST", "SILVERDALE", 
"ONTARIO", "MLK", "CEDAR", "DENNISON AVE", "QUIGLEY RD", "AEROSPACE PKWY", 
"CEDAR", "MLK DR", "LEE RD", "E 93RD", "W QUIGOLY", "W 14TH", 
"W 25TH", "W MALL DR", "E 185TH", "FARRINGTON", "APPLE AVE", 
"FAIRHILL RD", "ST CLAIR", "E 93RD", "FAIRHILL", "E 123RD", "DETROIT RD", 
"CEDAR HILL", "MARTIN LUTHER KING BLVD", "E 109TH", "W 105TH", 
"W WOODLAND AVE", "LAKEWOOD HTS BLVD", "E 56TH", "MARTIN LUTHER KING BLVD", 
"OVINGTON", "MADISON AVE", "QUIGLEY", "DILLE RD", "QUINCY", "MLK", 
"CORONADO AVE", "DETROIT", "MT SINAI DR", "LAKESIDE AVE"), at_road = c("PAXTON RD", 
"W 93RD ST", "ANSEL", "PARKVIEW", "LUKE AVE", "E 163RD", "SCOVAL", 
"BAILY", "W 28TH ST", "E 6TH", "SUPERIOR", "AMBLESIDE", "W 53RD ST", 
"STEELYARD DR", "E SHAFFORD RD", "AMBLESIDE", "ANSEL RD", "S JUDSON AVE", 
"SOPHIA", "STEEL DR", "QUIGOLY", "DENISON", "PUBLIC SQ", "ST CLAIR", 
"E 127TH", "W 41ST PL", "CEDAR RD", "E 178TH", "LAMONTIER", "AMBLESIDE", 
"GRIFFIN AVE", "W 102ND", "MURRAY HILL", "MT AUBURN", "ST CLAIR", 
"S FRONTAGE", "KEMPER", "ALGERS", "BROADWAY AVE", "CORLETT AVE", 
"UNION", "W 86TH ST", "STEELYARD DR", "ST CLAIR AVE", "E 38TH", 
"BENHAM", "E 126TH", "W 47TH", "MLK JR BLVD", "FRANZ PASTORINA"
), geocode_string = c("EDDY RD and PAXTON RD , CLEVELAND , OH", 
"S MARGINAL RD and W 93RD ST , CLEVELAND , OH", "MLK and ANSEL , CLEVELAND , OH", 
"MLK and PARKVIEW , CLEVELAND , OH", "IMPERIAL AVE and LUKE AVE , CLEVELAND , OH", 
"HARVARD and E 163RD , CLEVELAND , OH", "E 55TH and SCOVAL , CLEVELAND , OH", 
"W 41ST and BAILY , CLEVELAND , OH", "SILVERDALE and W 28TH ST , CLEVELAND , OH", 
"ONTARIO and E 6TH , CLEVELAND , OH", "MLK and SUPERIOR , CLEVELAND , OH", 
"CEDAR and AMBLESIDE , CLEVELAND , OH", "DENNISON AVE and W 53RD ST , CLEVELAND , OH", 
"QUIGLEY RD and STEELYARD DR , CLEVELAND , OH", "AEROSPACE PKWY and E SHAFFORD RD , CLEVELAND , OH", 
"CEDAR and AMBLESIDE , CLEVELAND , OH", "MLK DR and ANSEL RD , CLEVELAND , OH", 
"LEE RD and S JUDSON AVE , CLEVELAND , OH", "E 93RD and SOPHIA , CLEVELAND , OH", 
"W QUIGOLY and STEEL DR , CLEVELAND , OH", "W 14TH and QUIGOLY , CLEVELAND , OH", 
"W 25TH and DENISON , CLEVELAND , OH", "W MALL DR and PUBLIC SQ , CLEVELAND , OH", 
"E 185TH and ST CLAIR , CLEVELAND , OH", "FARRINGTON and E 127TH , CLEVELAND , OH", 
"APPLE AVE and W 41ST PL , CLEVELAND , OH", "FAIRHILL RD and CEDAR RD , CLEVELAND , OH", 
"ST CLAIR and E 178TH , CLEVELAND , OH", "E 93RD and LAMONTIER , CLEVELAND , OH", 
"FAIRHILL and AMBLESIDE , CLEVELAND , OH", "E 123RD and GRIFFIN AVE , CLEVELAND , OH", 
"DETROIT RD and W 102ND , CLEVELAND , OH", "CEDAR HILL and MURRAY HILL , CLEVELAND , OH", 
"MARTIN LUTHER KING BLVD and MT AUBURN , CLEVELAND , OH", "E 109TH and ST CLAIR , CLEVELAND , OH", 
"W 105TH and S FRONTAGE , CLEVELAND , OH", "W WOODLAND AVE and KEMPER , CLEVELAND , OH", 
"LAKEWOOD HTS BLVD and ALGERS , CLEVELAND , OH", "E 56TH and BROADWAY AVE , CLEVELAND , OH", 
"MARTIN LUTHER KING BLVD and CORLETT AVE , CLEVELAND , OH", "OVINGTON and UNION , CLEVELAND , OH", 
"MADISON AVE and W 86TH ST , CLEVELAND , OH", "QUIGLEY and STEELYARD DR , CLEVELAND , OH", 
"DILLE RD and ST CLAIR AVE , CLEVELAND , OH", "QUINCY and E 38TH , CLEVELAND , OH", 
"MLK and BENHAM , CLEVELAND , OH", "CORONADO AVE and E 126TH , CLEVELAND , OH", 
"DETROIT and W 47TH , CLEVELAND , OH", "MT SINAI DR and MLK JR BLVD , CLEVELAND , OH", 
"LAKESIDE AVE and FRANZ PASTORINA , CLEVELAND , OH")), row.names = c(NA, 
-50L), class = c("tbl_df", "tbl", "data.frame"))

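I ran into a similar problem. The best solution I could come up with was to modify the geocode function, the source of which is available on GitHub.

I added two extra columns. The status column reports the number of matches found for each address, so you can easily spot where "not uniquely geocoded, using" occurred. The address2 column reports the second address found (when status > 1).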
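Once the status column exists, flagging the ambiguous rows takes only a few lines. The following is a minimal sketch, assuming a data frame shaped like the output of the modified function (columns lon, lat, status, address, address2). The helper name flag_ambiguous and all values in the example data frame are hypothetical and made up for illustration.

```r
# Hypothetical output of the modified geocode(): one row per input string.
# Every value below is invented for illustration, not real geocoding output.
geocoded <- data.frame(
  lon      = c(-81.70, -81.60, -81.75),
  lat      = c(41.45, 41.50, 41.48),
  status   = c(1L, 2L, 1L),            # number of matches per address
  address  = c("placeholder address 1", "placeholder address 2",
               "placeholder address 3"),
  address2 = c(NA, "placeholder second match", NA),
  stringsAsFactors = FALSE
)

# Mark rows with more than one match and blank out their coordinates,
# so they can be reviewed and filled in by hand later.
flag_ambiguous <- function(gc_df) {
  gc_df$ambiguous <- !is.na(gc_df$status) & gc_df$status > 1L
  gc_df$lon[gc_df$ambiguous] <- NA_real_
  gc_df$lat[gc_df$ambiguous] <- NA_real_
  gc_df
}

flagged <- flag_ambiguous(geocoded)
```

The ambiguous column then makes it easy to pull out the rows that need manual lookup, for example with flagged[flagged$ambiguous, ].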

I added the two columns by including the parts marked 'new' below:
  ## format geocoded data
  
  gcdf <- with(gc$results[[1]], {
    tibble(
      "lon" = NULLtoNA(geometry$location$lng),
      "lat" = NULLtoNA(geometry$location$lat),
      "type" = tolower(NULLtoNA(types[1])),
      "loctype" = tolower(NULLtoNA(geometry$location_type)),
      "address" = location, # dsk doesn't give the address
      "north" = NULLtoNA(geometry$viewport$northeast$lat),
      "south" = NULLtoNA(geometry$viewport$southwest$lat),
      "east" = NULLtoNA(geometry$viewport$northeast$lng),
      "west" = NULLtoNA(geometry$viewport$southwest$lng),
      'status' = NULLtoNA(length(gc$results)) # new!
    )
  })
  
  if (length(gc$results) > 1L) { # new!
    gcdf$address2 <- tolower(NULLtoNA(gc$results[[2]]$formatted_address))
  } else {
    gcdf$address2 <- NA_character_ # a real missing value, not the string "NA"
  }


  # add address
  if (source == "google") gcdf$address <- tolower(NULLtoNA(gc$results[[1]]$formatted_address))
  if (output == "latlon") return(gcdf[,c("lon","lat", "status", "address", "address2")]) # new!

Finally, I ran the new function in R and added the following code to replace the version in the package (see this question for details).

environment(geocode) <- asNamespace('ggmap')
assignInNamespace("geocode", geocode, ns = "ggmap")
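An alternative that avoids patching the package is to work from the raw response. This is only a sketch under assumptions: it assumes geocode(x, output = "all") returns the parsed Google response as a nested list with a results element, and that each result may carry the partial_match field described in the Google Geocoding API documentation. The helper summarise_geocode and the mock response are hypothetical, and the coordinates are invented.

```r
# Summarise one parsed geocoding response (a nested list).
# Returns the first match plus two flags: how many matches came back,
# and whether the first match was only a partial match.
summarise_geocode <- function(gc) {
  n <- length(gc$results)
  if (n == 0L) {
    return(data.frame(lon = NA_real_, lat = NA_real_, n_matches = 0L,
                      partial_match = NA, stringsAsFactors = FALSE))
  }
  first <- gc$results[[1]]
  data.frame(
    lon           = first$geometry$location$lng,
    lat           = first$geometry$location$lat,
    n_matches     = n,
    partial_match = isTRUE(first$partial_match), # absent field counts as FALSE
    stringsAsFactors = FALSE
  )
}

# Mock response with two candidate matches; the values are made up.
mock <- list(
  status  = "OK",
  results = list(
    list(geometry = list(location = list(lat = 41.0, lng = -81.0)),
         partial_match = TRUE),
    list(geometry = list(location = list(lat = 41.1, lng = -81.1)))
  )
)

out <- summarise_geocode(mock)
```

Rows where n_matches is greater than 1 or partial_match is TRUE are the ones to check by hand. The partial_match flag may also help catch cases like the nonexistent S Marginal Rd intersection, where a single approximate match comes back without any warning.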