What is the most efficient way to query multiple values via HERE API REST query?

Question:

How do I efficiently handle multiple location requests to the HERE API?

I'm generally unfamiliar with GET requests and REST, but I need to fetch location data and I'm experimenting with the HERE API. I'm doing this in R, but that fact is incidental to my question.

This works:

library(httr)
library(jsonlite)

HERE_API_KEY <- #REMOVED#
url <- "https://geocode.search.hereapi.com/v1/"
zip <- 18615
country <- "United+States"

theRequest <- paste0(url,"geocode?qq=postalCode=",zip,";country=",country,"&apiKey=",HERE_API_KEY)

theResponse <- GET(theRequest)

I get a Status 200 message and data content -- no problems.

What I want:

The example above is just one location, but I have a list with thousands of locations that I need to look up, ultimately to determine routed distances between pairs of points in the location dataset.

I could create a loop and submit a request for each location one at a time, as demonstrated above, but since I have a pile of them, I'm wondering whether there's a preferred way to submit a list of locations in a single call (or split them into batches?) that is friendlier to the HERE API and fetches the data efficiently. As a stab in the dark, I tried this test with 3 locations:

theRequest <- "https://geocode.search.hereapi.com/v1/geocode?qq=postalCode=18615;country=United+States&qq=postalCode=L4T1G3;country=Canada&qq=postalCode=62521;country=United+States&apiKey=#REMOVED#"

But that didn't work. Maybe it isn't possible and I just don't understand REST, but I'd like to handle multiple requests as efficiently as possible -- both for my own sake and for the HERE API service. Thanks in advance.

If you want to use the HERE Geocoding and Search API, looping through your data to send individual GET requests for each address is a perfectly valid approach; just make sure that you don't exceed the maximum allowed number of requests per second (RPS) for the plan you have (5 RPS for the Geocoding and Search API in the Freemium plan, for example). Otherwise your queries will fail with error code 429 "Too Many Requests" and you will have to resend them.
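The looping approach with a simple throttle can be sketched as below. This is only an illustration, assuming a data frame with `postalCode` and `country` columns like the one in the question; `build_geocode_request()` and `geocode_all()` are names I made up, not part of httr or the HERE API.

```r
base_url <- "https://geocode.search.hereapi.com/v1/"

# Build one geocode request URL, using the same pattern as the question
build_geocode_request <- function(zip, country, api_key) {
  paste0(base_url, "geocode?qq=postalCode=", zip, ";country=", country,
         "&apiKey=", api_key)
}

# Loop over the locations, sleeping between calls to stay under the RPS cap
geocode_all <- function(df, api_key, rps = 5) {
  results <- vector("list", nrow(df))
  for (i in seq_len(nrow(df))) {
    req <- build_geocode_request(df$postalCode[i], df$country[i], api_key)
    results[[i]] <- httr::GET(req)  # one request per row
    Sys.sleep(1 / rps)              # simple throttle: at most `rps` calls/sec
  }
  results
}
```

A fixed `Sys.sleep(1 / rps)` is the crudest possible rate limiter; for production use you would also want to check each response for a 429 status and retry with a back-off.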

Alternatively, you can use the HERE Batch Geocoder API, which is designed to process large datasets (up to 1 million records for geocoding or reverse geocoding) in a single API call. Using this service involves 3 steps:

  1. Send a POST request containing all the data you want to geocode or reverse geocode.
  2. Call the status endpoint to monitor the status of the job you submitted. You'll probably want to make this call periodically until the response indicates that your job is completed and your output is ready for download.
  3. Download your results.

Here's an example of how to use this service; note that this API expects POST requests rather than GET.

Answer with example:

astro.comma's answer pointed me to where I needed to go for the HERE batch API -- strictly speaking, that is the answer and why it's marked as such. For those who pass through here later, this is the test script I used to figure out how to implement the request in R, based on the help I got from astro.comma.

Sample data:

Console:

df_locations[1:5,]  # Show a sample of the data in the data frame

># A tibble: 5 x 3
>  recID country postalCode
>  <int> <fct>   <chr>     
>1     1 CAN     L4T1G3    
>2     2 USA     62521     
>3     3 CAN     H9P1K2    
>4     4 CAN     L6S4K6    
>5     5 USA     52632     

dput(df_locations[1:5,])  # For ease of reproducibility, here's dput():


structure(list(recID = 1:5, country = structure(c(1L, 2L, 1L, 
1L, 2L), .Label = c("CAN", "USA", "MEX"), class = "factor"), 
    postalCode = c("L4T1G3", "62521", "H9P1K2", "L6S4K6", "52632"
    )), row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"
))

Script:

library(httr)
library(tidyverse)
    
HERE_API_KEY <- "YOU-CANT-HAVE-THIS-BECAUSE-ITS-MINE"

url <- "https://batch.geocoder.ls.hereapi.com/6.2/jobs"

# Write df_locations to pipe-delimited text file to prep for POST
write.table(
  df_locations,
  file = "locations.txt",
  quote = FALSE,
  sep = "|",
  row.names = FALSE
  )
    
# Assemble the POST request url to start the job
theRequest <-
  paste0(
        url,
        "?&apiKey=",
        HERE_API_KEY,
        "&action=run&header=true",
        "&indelim=%7C&outdelim=%7C",
        "&outcols=recId%2CseqNumber%2CseqLength%2CdisplayLatitude",
        "%2CdisplayLongitude%2Ccity%2CpostalCode%2Ccountry",
        "&outputCombined=true"
  )

# Now submit the POST request along with the location file
theResponse <-
    POST(url = theRequest, body = upload_file("locations.txt"))
    

Console:

>theResponse
Response [https://batch.geocoder.ls.hereapi.com/6.2/jobs?&apiKey=YOU-CANT-HAVE-THIS-BECAUSE-ITS-MINE&action=run&header=true&indelim=%7C&outdelim=%7C&outcols=recId%2CseqNumber%2CseqLength%2CdisplayLatitude%2CdisplayLongitude%2Ccity%2CpostalCode%2Ccountry&outputCombined=true]
Date: 2021-12-27 00:45
Status: 200
Content-Type: application/json;charset=utf-8
Size: 209 B

Script:

# Extract the Request ID so we can check for completion status of the job, and 
# use it to identify / download the zip file when complete.

reqID <- content(theResponse)$Response$MetaInfo$RequestId
    

Console:

>reqID
[1] "XS9wSVt3y0Dch1Q48gX1xohewUKIw595"  # or looks like this -- I changed it here.

Script:

# After letting some time pass (about a minute for my test file), I check
# status of the job with a GET request:

JOB_status <-
    GET(paste0(url, "/", reqID, "?action=status&apiKey=", HERE_API_KEY))
    

Console:

>content(JOB_status)
$Response
$Response$MetaInfo
$Response$MetaInfo$RequestId
[1] "XS9wSVt3y0Dch1Q48gX1xohewUKIw595"


$Response$Status
[1] "completed"         #  There are other statuses (statii?), but this one we care about.

$Response$JobStarted
[1] "2021-12-27T00:46:36.000+0000"

$Response$JobFinished
[1] "2021-12-27T00:46:49.000+0000"

$Response$TotalCount
[1] 2080                # Ignore this -- I only provided you with first 5 rows

$Response$ValidCount
[1] 2080

$Response$InvalidCount
[1] 0

$Response$ProcessedCount
[1] 2080

$Response$PendingCount
[1] 0

$Response$SuccessCount
[1] 2076

$Response$ErrorCount
[1] 4
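Rather than checking the status by hand after "letting some time pass," the status call can be wrapped in a polling loop. A minimal sketch, with the status-fetching step injected as a function so the loop itself can be run without network access; `wait_for_job` is a name I made up, and the terminal statuses other than "completed" are assumptions (check the Batch Geocoder docs for the full list):

```r
# Poll until the job reaches a terminal status, waiting `interval` seconds
# between checks and giving up after `max_tries` attempts.
wait_for_job <- function(fetch_status, interval = 10, max_tries = 60) {
  for (i in seq_len(max_tries)) {
    status <- fetch_status()
    if (status %in% c("completed", "failed", "cancelled")) return(status)
    Sys.sleep(interval)
  }
  stop("job did not reach a terminal status in time")
}

# In real use, fetch_status would wrap the GET request from the script, e.g.:
# fetch_status <- function() {
#   content(GET(paste0(url, "/", reqID, "?action=status&apiKey=", HERE_API_KEY)))$Response$Status
# }
# job_state <- wait_for_job(fetch_status, interval = 15)
```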

Script:

# I stayed with GET request via httr, but no reason you can't switch to some other
# method for download like cURL

 COMPLETED_JOB <-
    GET(paste0(url, "/", reqID, "/result?apiKey=", HERE_API_KEY))
    
job_content <- content(x = COMPLETED_JOB, as = "raw")  # This extracts the raw binary data, which is the zipped content -- it has to be unzipped to be useful.

writeBin(job_content, con = "Processed_locations.zip")  # Writes the binary data to file.

unzip(zipfile = "Processed_locations.zip")  # Extracts the result text file from the zip archive.

Final result file:

recId|SeqNumber|seqLength|recId|seqNumber|seqLength|displayLatitude|displayLongitude|city|postalCode|country
1|1|1|1|1|1|43.70924|-79.658|Mississauga|L4T 1G3|CAN
2|1|1|2|1|1|39.83972|-88.92881|Decatur|62521|USA
3|1|1|3|1|1|45.47659|-73.78061|Dorval|H9P 1K2|CAN
4|1|1|4|1|1|43.75666|-79.71021|Brampton|L6S 4K6|CAN
5|1|1|5|1|1|40.4013|-91.3848|Keokuk|52632|USA
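The extracted file can be read straight back into a data frame with `read.table()` and `sep = "|"`. For illustration, the sketch below writes the first two sample rows shown above to a temp file first; in real use you would point `read.table()` at the file that `unzip()` extracted (the file name varies per job -- `unzip(..., list = TRUE)` shows it).

```r
# Recreate a small slice of the result file, then parse it.
f <- tempfile(fileext = ".txt")
writeLines(c(
  "recId|SeqNumber|seqLength|recId|seqNumber|seqLength|displayLatitude|displayLongitude|city|postalCode|country",
  "1|1|1|1|1|1|43.70924|-79.658|Mississauga|L4T 1G3|CAN",
  "2|1|1|2|1|1|39.83972|-88.92881|Decatur|62521|USA"
), f)

results <- read.table(f, sep = "|", header = TRUE, stringsAsFactors = FALSE)
# Note: the output header repeats recId/seqNumber/seqLength (input columns are
# echoed before the output columns), so read.table() de-duplicates the names
# to recId and recId.1, etc.
```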