R code to iterate through dataframe rows for Google Maps distance queries

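I'm looking for some help writing R code that iterates through the rows of a data frame, passes the values in each row to a function, and prints the output to an Excel file, a txt file, or just the console.

The goal is to automate a batch of distance/time queries (a few hundred) to Google Maps, using a function I found on this website: http://www.nfactorialanalytics.com/r-vignette-for-the-week-finding-time-distance-between-two-places/

The function from that website is as follows: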

library(XML)
library(RCurl)
distance2Points <- function(origin,destination){
 results <- list();
 xml.url <- paste0('http://maps.googleapis.com/maps/api/distancematrix/xml?origins=',origin,'&destinations=',destination,'&mode=driving&sensor=false')
 xmlfile <- xmlParse(getURL(xml.url))
 dist <- xmlValue(xmlChildren(xpathApply(xmlfile,"//distance")[[1]])$value)
 time <- xmlValue(xmlChildren(xpathApply(xmlfile,"//duration")[[1]])$value)
 distance <- as.numeric(sub(" km","",dist))
 time <- as.numeric(time)/60
 distance <- distance/1000
 results[['time']] <- time
 results[['dist']] <- distance
 return(results)
}

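The data frame will contain two columns: origin postal code and destination postal code (Canada, eh?). I'm a beginner R programmer, so I know how to use read.table to load a txt file into a data frame. I'm just not sure how to iterate through the data frame, passing the values from each row to the distance2Points function and executing it each time. I think this can be done with either a for loop or one of the apply calls?

Thanks for your help!

EDIT:

To keep things simple, let's assume I want to turn these two vectors into a data frame: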

> a <- c("L5B4P2","L5B4P2")
> b <- c("M5E1E5", "A2N1T3")
> postcodetest <- data.frame(a,b)
> postcodetest
       a      b
1 L5B4P2 M5E1E5
2 L5B4P2 A2N1T3

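How should I iterate through these two rows with the distance2Points function to return the distance and time?

Here's one way to do it, using lapply to generate a list with the results for each row in your data, and Reduce(rbind, [yourlist]) to concatenate that list into a data frame whose rows correspond to the rows of your original one. To make this work, we also have to tweak the code in the original function so that it returns a one-row data frame, so I've done that here.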

distance2Points <- function(origin,destination){

  require(XML)
  require(RCurl)

  xml.url <- paste0('http://maps.googleapis.com/maps/api/distancematrix/xml?origins=',origin,'&destinations=',destination,'&mode=driving&sensor=false')
  xmlfile <- xmlParse(getURL(xml.url))
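  # the API reports <value> under <distance> in metres and under <duration> in seconds
  dist <- xmlValue(xmlChildren(xpathApply(xmlfile,"//distance")[[1]])$value)
  time <- xmlValue(xmlChildren(xpathApply(xmlfile,"//duration")[[1]])$value)
  distance <- as.numeric(sub(" km","",dist))
  time <- as.numeric(time)/60      # seconds -> minutes
  distance <- distance/1000        # metres -> kilometres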
  # this gives you a one-row data frame instead of a list, b/c it's easy to rbind
  results <- data.frame(time = time, distance = distance)
  return(results)
}

# now apply that function rowwise to your data, using lapply, and roll the results
# into a single data frame using Reduce(rbind)
results <- Reduce(rbind, lapply(seq(nrow(postcodetest)), function(i)
  distance2Points(postcodetest$a[i], postcodetest$b[i])))

Results when applied to the sample data:

> results
        time distance
1   27.06667   27.062
2 1797.80000 2369.311
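
To try the row-wise pattern without sending any requests to Google, here is a minimal offline sketch. fake_distance2Points is a hypothetical stand-in (it is not part of the function above) that returns placeholder numbers in the same one-row shape, so only the lapply / Reduce(rbind) mechanics are being exercised. The final cbind step is an optional extra that keeps each result next to the postal codes it came from.

```r
# Hypothetical stand-in for distance2Points(): returns a one-row data frame
# with placeholder values and never touches the network.
fake_distance2Points <- function(origin, destination) {
  data.frame(time = nchar(origin) * 10, distance = nchar(destination) * 2)
}

postcodetest <- data.frame(a = c("L5B4P2", "L5B4P2"),
                           b = c("M5E1E5", "A2N1T3"),
                           stringsAsFactors = FALSE)

# One one-row data frame per input row, collected in a list
row_results <- lapply(seq_len(nrow(postcodetest)), function(i)
  fake_distance2Points(postcodetest$a[i], postcodetest$b[i]))

# Stack the list into a single data frame, one row per original row
results <- Reduce(rbind, row_results)

# Optional: put the results beside the inputs they belong to
combined <- cbind(postcodetest, results)
print(combined)
```

Once this behaves as expected, swapping fake_distance2Points for the real function should be the only change needed.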

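If you'd like to do this without creating a new object, you could also write separate functions for computing time and distance (or a single function with those outputs as options) and then use sapply, or just mutate, to create new columns in your original data frame. Here's what that looks like using sapply:
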
distance2Points <- function(origin, destination, output){

  require(XML)
  require(RCurl)

  xml.url <- paste0('http://maps.googleapis.com/maps/api/distancematrix/xml?origins=',
                    origin, '&destinations=', destination, '&mode=driving&sensor=false')

  xmlfile <- xmlParse(getURL(xml.url))

  if(output == "distance") {

    y <- xmlValue(xmlChildren(xpathApply(xmlfile,"//distance")[[1]])$value)
    y <- as.numeric(sub(" km", "", y))/1000

  } else if(output == "time") {

    y <- xmlValue(xmlChildren(xpathApply(xmlfile,"//duration")[[1]])$value)
    y <- as.numeric(y)/60

  } else {

    y <- NA    

  }

  return(y)

}

postcodetest$distance <- sapply(seq(nrow(postcodetest)), function(i)
  distance2Points(postcodetest$a[i], postcodetest$b[i], "distance"))

postcodetest$time <- sapply(seq(nrow(postcodetest)), function(i)
  distance2Points(postcodetest$a[i], postcodetest$b[i], "time"))
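
One thing to keep in mind with the two sapply calls is that every row is queried twice, once for distance and once for time. A possible variation, sketched here under assumptions, is to have the function return both values as a named vector and walk the two columns in parallel with mapply. fake_query is a hypothetical stand-in with made-up numbers, and the n_calls counter exists only to show that each row triggers a single call.

```r
n_calls <- 0  # counts how many "queries" were made (illustration only)

# Hypothetical stand-in: returns both values from one call, no network access.
fake_query <- function(origin, destination) {
  n_calls <<- n_calls + 1
  c(distance = 27.062, time = 27.067)  # made-up placeholder values
}

postcodetest <- data.frame(a = c("L5B4P2", "L5B4P2"),
                           b = c("M5E1E5", "A2N1T3"),
                           stringsAsFactors = FALSE)

# mapply() pairs up a[i] and b[i]; because every call returns a length-2
# named vector, the simplified result is a matrix with rows "distance"
# and "time" and one column per input row.
both <- mapply(fake_query, postcodetest$a, postcodetest$b, USE.NAMES = FALSE)

postcodetest$distance <- both["distance", ]
postcodetest$time     <- both["time", ]

print(postcodetest)
print(n_calls)
```

With a few hundred rows, halving the number of requests may matter for any rate limits on the service.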

And here's how to do the same thing with mutate in a dplyr pipeline:

library(dplyr)

postcodetest <- postcodetest %>%
  mutate(distance = sapply(seq(nrow(postcodetest)), function(i)
           distance2Points(a[i], b[i], "distance")),
         time = sapply(seq(nrow(postcodetest)), function(i)
           distance2Points(a[i], b[i], "time")))
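
The question also asked about getting the output into a file. A minimal sketch, assuming a CSV file is acceptable (Excel can open it), uses base R's write.csv. The data frame is rebuilt here with the values from the example results above so the snippet runs on its own, and the file name is a hypothetical choice written to a temporary directory.

```r
# Results table rebuilt by hand so the snippet is self-contained;
# the numbers are copied from the example output, not freshly queried.
postcodetest <- data.frame(a = c("L5B4P2", "L5B4P2"),
                           b = c("M5E1E5", "A2N1T3"),
                           distance = c(27.062, 2369.311),
                           time = c(27.06667, 1797.8),
                           stringsAsFactors = FALSE)

# Hypothetical output path; replace with wherever the file should go
out_file <- file.path(tempdir(), "postcode_distances.csv")

# row.names = FALSE avoids writing an extra column of row numbers
write.csv(postcodetest, out_file, row.names = FALSE)

# Read the file back to confirm that it round-trips
check <- read.csv(out_file, stringsAsFactors = FALSE)
print(check)
```

For a tab-delimited txt file instead, write.table with sep = "\t" follows the same pattern.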