Geocoding with a for loop: "only the first used", and putting the results into the data frame (in R)
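I've reached the point where I need help, because I've tried everything to fix the problem with my FOR loop.
I want to geocode addresses with an API. I wrote what I think is a very clear function and used additional data frames to store the result of each step and check whether something went wrong, but now I can't find anything more...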
addresses: my data frame; the address column is the input, and the results will be written into the other columns
"address_ID","address","accuracy","lon_geop","lat_geop","address_geop","geopID","success"
1,"4 Kiricheneck 9990"
2,"10 Kiricheneck 9990"
3,"26 Kiricheneck 9990"
4,"27 Kiricheneck 9990"
5,"6 Avenue D'oberkorn 4640"
Code:
plcUrl <- "https://apiv3.geoportail.lu/geocode/search?queryString="
getGeoDetails <- function(address)
{
  query <- paste(addresses$address)
  strurl <- as.character(paste(plcUrl,query))
  rd <- fromJSON(URLencode(strurl))
  df <- data.frame(matrix(unlist(rd), nrow = 22, byrow = T),stringsAsFactors = FALSE)
  colnames(df)[1] <- "results_geop"
  answer <- data.frame(lat = NA, lon = NA, accuracy = NA, address_geop = NA, success = NA, geopID = NA)
  answer$status <- df$results_geop[22]
  #return NAs if we didn't get a match
  if (df$results_geop[22] != "TRUE")
  {
    return(answer)
  }
  #else, extract what we need from the GeoPortail server reply into a dataframe
  answer$lat <- df$results_geop[9]
  answer$lon <- df$results_geop[8]
  answer$accuracy <- df$results_geop[21]
  answer$geopID <- df$results_geop[19]
  answer$address_geop <- df$results_geop[6]
  answer$success <- df$results_geop[22]
  return(answer)
}
#initialise a dataframe to hold the results
geocoded <- data.frame()
startindex <- 1
row_addresses <- as.numeric(rownames(addresses))
# Start the geocoding process - address by address
for (j in startindex:row_addresses)
{
  #query the GeoPortail geocoder
  result = getGeoDetails(addresses[j])
  print(result$status)
  result$index <- j
  #append the answer to the results file
  geocoded <- rbind(geocoded, result)
  #now we add all the results to the main data
  addresses$lat_geop[j] <- geocoded$lat[j]
  addresses$lon_geop[j] <- geocoded$lon[j]
  addresses$accuracy[j] <- geocoded$accuracy[j]
  addresses$address_geop[j] <- geocoded$address_geop[j]
  addresses$geopID[j] <- geocoded$geopID[j]
  addresses$success[j] <- geocoded$success[j]
  return(j)
}
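In the end I get:
Warning message:
In startindex:row_addresses :
  numerical expression has 5 elements: only the first used
and only the first row of the addresses data frame has a proper result; all the others are empty.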
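What I've tried:
- indexing: for(i in 1:x)
- building empty data frames for the results and for the loop (index i): d[i, ] = c(x, y, z)
- the break command
- the next command
Nothing has helped so far... my other for loops do the job, so it's very frustrating.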
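The problem is caused by j in startindex:row_addresses, because row_addresses is a vector rather than a single number. The for loop is meant to run from startindex to row_addresses, but since row_addresses is not a single number, only the first element of the vector is used. Because startindex and row_addresses[1] are both 1, the loop runs only once.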
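To see the mechanism in isolation, here is a minimal base-R reproduction. It assumes a five-row data frame as in the question, so the row-name vector is typed in by hand, and it captures the warning with withCallingHandlers() instead of letting it print:

```r
startindex <- 1
# What as.numeric(rownames(addresses)) gives for a 5-row data frame (typed in by hand)
row_addresses <- c(1, 2, 3, 4, 5)

# Evaluate the loop range, recording (and silencing) the warning that ':' raises
msg <- NULL
idx <- withCallingHandlers(
  startindex:row_addresses,
  warning = function(w) {
    msg <<- conditionMessage(w)
    invokeRestart("muffleWarning")
  }
)

print(idx)  # [1] 1  -> only row_addresses[1] was used as the upper bound
print(msg)  # the warning text, ending in "only the first used"

# The upper bound the loop needs is a single number: the row count
print(startindex:length(row_addresses))  # [1] 1 2 3 4 5
```

The last line shows the kind of bound the loop actually needs: one number, not the whole vector of row names.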
See the following code for an example, and a fix you can use:
data_test <- data.frame(A = 1:10, B = 21:30)
startindex <- 1
row_addresses <- as.numeric(rownames(data_test))
row_addresses
> row_addresses
[1] 1 2 3 4 5 6 7 8 9 10
# Problem: only the first element of row_addresses is used in the for loop
# So the loop runs from 1 to 1 - it stops after that
for(i in startindex:row_addresses)
{
print(i)
}
> for(i in startindex:row_addresses)
+ {
+ print(i)
+
+ }
[1] 1
Warning message:
In startindex:row_addresses :
numerical expression has 10 elements: only the first used
for(i in startindex:NROW(data_test))
{
print(i)
}
> for(i in startindex:NROW(data_test))
+ {
+ print(i)
+
+ }
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10
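Applied to the loop in the question, a corrected version might look like the sketch below. Besides the loop bound, it addresses three other things in the original code: getGeoDetails() builds its query from the global addresses$address instead of its own address argument, paste(plcUrl, query) inserts a space between the URL and the query (paste0() does not), and addresses[j] selects the j-th column rather than the j-th address. To keep it runnable offline, fake_geocode() is a hypothetical stand-in that returns fixed dummy coordinates instead of calling the API, and only three of the addresses are used:

```r
plcUrl <- "https://apiv3.geoportail.lu/geocode/search?queryString="

# Hypothetical offline stand-in for getGeoDetails(): it builds the request URL from
# its own 'address' argument (paste0() adds no space, URLencode() escapes spaces),
# but returns fixed dummy coordinates instead of calling the API
fake_geocode <- function(address) {
  url <- paste0(plcUrl, URLencode(address, reserved = TRUE))
  data.frame(lat = 50.1, lon = 6.08, url = url, success = TRUE,
             stringsAsFactors = FALSE)
}

addresses <- data.frame(
  address_ID = 1:3,
  address = c("4 Kiricheneck 9990", "10 Kiricheneck 9990", "26 Kiricheneck 9990"),
  stringsAsFactors = FALSE
)

# Create the result columns up front so each iteration fills in one row
addresses$lat_geop <- NA_real_
addresses$lon_geop <- NA_real_
addresses$url      <- NA_character_
addresses$success  <- NA

# seq_len(nrow(x)) yields 1, 2, ..., n, and an empty sequence when x has 0 rows
# (1:nrow(x) would yield c(1, 0) in that case)
for (j in seq_len(nrow(addresses))) {
  # addresses$address[j] is the j-th address; addresses[j] would be the j-th column
  result <- fake_geocode(addresses$address[j])
  # Copy straight from this iteration's result instead of indexing an rbind()-ed table
  addresses$lat_geop[j] <- result$lat
  addresses$lon_geop[j] <- result$lon
  addresses$url[j]      <- result$url
  addresses$success[j]  <- result$success
  # No return() here: outside a function it is an error, not a way to continue the loop
}

print(addresses[, c("address_ID", "lat_geop", "url")])
```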
Let's make some usable data:
data.frame(
address_ID = 1:5,
address = c(
"4 Kiricheneck 9990", "10 Kiricheneck 9990",
"26 Kiricheneck 9990", "27 Kiricheneck 9990",
"6 Avenue D'oberkorn 4640"
),
stringsAsFactors = FALSE
) -> xdf
Now, let's make a proper API wrapper for that endpoint:
geoportail_geocode <- function(query) {
  suppressPackageStartupMessages({ # this makes it self-contained and quiet
    library(httr, warn.conflicts = FALSE, quietly = TRUE, verbose = FALSE)
    library(jsonlite, warn.conflicts = FALSE, quietly = TRUE, verbose = FALSE)
  })
  `%||%` <- function(x, y) { if (length(x)) x else y } # this makes the code below less 'if-else'y
  httr::GET(
    url = "https://apiv3.geoportail.lu/geocode/search",
    httr::user_agent("geoportail_geocode R function used by me@example.com"), # you should add your email to this string
    query = list(
      queryString = query[1]
    )
  ) -> res
  httr::stop_for_status(res) # halts on API/network errors; you may not want this but it's standard practice in API packages
  out <- httr::content(res, as = "text", encoding = "UTF-8")
  out <- jsonlite::fromJSON(out)
  if (length(out$success) && out$success) { # if the return looks valid
    # MAKES A MAJOR ASSUMPTION A Point IS BEING RETURNED
    # YOU SHOULD DO A *TON* MORE VALIDATION AND ERROR CHECKING
    ret <- out$results[,c("ratio", "name", "easting", "address", "northing", "matching street", "accuracy")]
    ret <- cbind.data.frame(ret, out$results$AddressDetails)
    ret$type <- out$results$geomlonlat$type %||% NA_character_
    ret$lng <- out$results$geomlonlat$coordinates[[1]][1] %||% NA_real_
    ret$lat <- out$results$geomlonlat$coordinates[[1]][2] %||% NA_real_
    ret$geom <- out$results$geom$type %||% NA_character_
    ret$geom_x <- out$results$geom$coordinates[[1]][1] %||% NA_real_
    ret$geom_y <- out$results$geom$coordinates[[1]][2] %||% NA_real_
    ret
  } else {
    warning("Error in geocoding")
    data.frame(stringsAsFactors = FALSE)
  }
}
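The %||% helper in the wrapper is what lets each ret$... line fall back to NA when a field is absent from the parsed reply. A small offline sketch of that behaviour, using a hand-made list that only imitates the shape of the reply (the values are illustrative, not real API output) and a hypothetical name for the helper so it does not mask base R's own %||%:

```r
# Same logic as the %||% helper inside geoportail_geocode(), under a hypothetical
# name so it does not mask base R's own %||%
`%or_len%` <- function(x, y) if (length(x)) x else y

# A toy list shaped like the parsed JSON (illustrative only, not a real API reply)
resp <- list(geomlonlat = list(type = "Point", coordinates = list(c(6.08, 50.1))))

geom_type <- resp$geomlonlat$type %or_len% NA_character_           # "Point"
lng       <- resp$geomlonlat$coordinates[[1]][1] %or_len% NA_real_  # 6.08
no_geom   <- resp$geom$type %or_len% NA_character_                 # NA: resp$geom is NULL

# Zero-length results also fall back, which a plain is.null() test would not do
empty <- character(0) %or_len% NA_character_                       # NA
```

Recent versions of base R (4.4.0 and later) also define a %||%, which tests is.null() rather than length(), so the two differ for zero-length vectors such as character(0).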
Let's try a single address:
str(geoportail_geocode(xdf$address[1]))
## 'data.frame': 1 obs. of 19 variables:
## $ ratio : num 1
## $ name : chr "4,Kiricheneck 9990 Weiswampach"
## $ easting : num 73344
## $ address : chr "4 Kiricheneck,9990 Weiswampach"
## $ northing : num 133788
## $ matching street : chr "Kiricheneck"
## $ accuracy : int 8
## $ zip : chr "9990"
## $ locality : chr "Weiswampach"
## $ id_caclr_street : chr "8188"
## $ street : chr "Kiricheneck"
## $ postnumber : chr "4"
## $ id_caclr_building: chr "181679"
## $ type : chr "Point"
## $ lng : num 6.08
## $ lat : num 50.1
## $ geom : chr "Point"
## $ geom_x : num 73344
## $ geom_y : num 133788
And do the whole thing with the tidyverse, avoiding for loops like the plague b/c this isn't Java or icky Python:
str(dplyr::bind_cols(
xdf,
purrr::map_df(xdf$address, geoportail_geocode)
))
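One thing to watch with this pattern: map_df() row-binds whatever each call returns, and geoportail_geocode() returns a zero-row data frame when a lookup fails, so a failed address contributes no row at all. The geocoded table then has fewer rows than xdf, and bind_cols() can no longer pair the rows up. A base-R sketch that keeps exactly one row per address, using a hypothetical offline fake_geocode() that fails for one input:

```r
# Hypothetical offline stand-in: the "26 ..." lookup fails and returns a zero-row
# data frame, mirroring what geoportail_geocode() returns when success is FALSE
fake_geocode <- function(query) {
  if (startsWith(query, "26 ")) return(data.frame(stringsAsFactors = FALSE))
  data.frame(lng = 6.07, lat = 50.1, stringsAsFactors = FALSE)
}

xdf <- data.frame(
  address_ID = 1:4,
  address = c("4 Kiricheneck 9990", "10 Kiricheneck 9990",
              "26 Kiricheneck 9990", "27 Kiricheneck 9990"),
  stringsAsFactors = FALSE
)

# Placeholder row used whenever a lookup returns nothing
na_row <- data.frame(lng = NA_real_, lat = NA_real_)

rows <- lapply(xdf$address, function(a) {
  res <- fake_geocode(a)
  # Keep exactly one row per address: the first match, or the NA placeholder
  if (nrow(res) == 0) na_row else res[1, , drop = FALSE]
})

geo <- do.call(rbind, rows)
rownames(geo) <- NULL  # drop the row names inherited from rbind()
geocoded <- cbind(xdf, geo)
print(geocoded)
```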
## 'data.frame': 5 obs. of 21 variables:
## $ address_ID : int 1 2 3 4 5
## $ address : chr "4 Kiricheneck 9990" "10 Kiricheneck 9990" "26 Kiricheneck 9990" "27 Kiricheneck 9990" ...
## $ ratio : num 1 1 1 1 1
## $ name : chr "4,Kiricheneck 9990 Weiswampach" "10,Kiricheneck 9990 Weiswampach" "26,Kiricheneck 9990 Weiswampach" "27,Kiricheneck 9990 Weiswampach" ...
## $ easting : num 73344 73280 73203 73241 60462
## $ address1 : chr "4 Kiricheneck,9990 Weiswampach" "10 Kiricheneck,9990 Weiswampach" "26 Kiricheneck,9990 Weiswampach" "27 Kiricheneck,9990 Weiswampach" ...
## $ northing : num 133788 133732 133622 133591 65234
## $ matching street : chr "Kiricheneck" "Kiricheneck" "Kiricheneck" "Kiricheneck" ...
## $ accuracy : int 8 8 8 8 8
## $ zip : chr "9990" "9990" "9990" "9990" ...
## $ locality : chr "Weiswampach" "Weiswampach" "Weiswampach" "Weiswampach" ...
## $ id_caclr_street : chr "8188" "8188" "8188" "8188" ...
## $ street : chr "Kiricheneck" "Kiricheneck" "Kiricheneck" "Kiricheneck" ...
## $ postnumber : chr "4" "10" "26" "27" ...
## $ id_caclr_building: chr "181679" "181752" "181672" "181668" ...
## $ type : chr "Point" "Point" "Point" "Point" ...
## $ lng : num 6.08 6.07 6.07 6.07 5.9
## $ lat : num 50.1 50.1 50.1 50.1 49.5
## $ geom : chr "Point" "Point" "Point" "Point" ...
## $ geom_x : num 73344 73280 73203 73241 60462
## $ geom_y : num 133788 133732 133622 133591 65234
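As noted in the function code, the stop_for_status() call will halt the function, so you may want to use warn_for_status() instead, check the response's status code, and return an empty data.frame(stringsAsFactors=FALSE).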
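The answer's suggestion is to handle the status inside the wrapper. A different approach, swapped in here and not what the answer describes, is to leave stop_for_status() in place and catch the error at the call site with tryCatch(), turning it into a warning plus an empty data frame. In this sketch safe_geocode() and flaky_geocoder() are hypothetical; the latter fails by calling stop(), the way stop_for_status() does on an error response:

```r
# Wrap any geocoding function so an error becomes a warning plus an empty
# data frame, instead of aborting the whole batch
safe_geocode <- function(query, geocoder) {
  tryCatch(
    geocoder(query),
    error = function(e) {
      warning(sprintf("Geocoding failed for '%s': %s", query, conditionMessage(e)),
              call. = FALSE)
      data.frame(stringsAsFactors = FALSE)
    }
  )
}

# Hypothetical stand-in that signals an error via stop() for one input,
# as stop_for_status() would for an HTTP error status
flaky_geocoder <- function(query) {
  if (query == "bad address") stop("Internal Server Error (HTTP 500).")
  data.frame(query = query, lat = 50.1, stringsAsFactors = FALSE)
}

ok  <- safe_geocode("4 Kiricheneck 9990", flaky_geocoder)
bad <- suppressWarnings(safe_geocode("bad address", flaky_geocoder))
nrow(ok)   # 1
nrow(bad)  # 0
```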