Error using lapply on a tibble: can't convert from double to logical
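Edit: this looks like a known issue with the "cascade" method. When the first method returns NA values, they can't be converted to double when a subsequent method returns lat/lons.
Data: I have a list of addresses that I need to geocode. I'm using lapply() to split-apply-combine, which works but is very slow. My attempt to split further and then apply-combine returns an error (shown below) that has me confused.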
# example data
library(dplyr)
library(tidygeocoder)
url <- "https://www.briandunning.com/sample-data/us-500.zip"
# mode = "wb" keeps the zip file intact on Windows
download.file(url = url, destfile = basename(url), mode = "wb")
adds <- readr::read_csv(basename(url)) %>%
  select(address, city, county, state, zip) %>%
  mutate(date = seq.Date(as.Date("2015-01-01"), to = Sys.Date(), length.out = 500)) %>%
  mutate(year = lubridate::year(date)) %>%
  # to keep it small
  sample_n(20)
This works: it splits the addresses by year, applies the tidygeocoder function to return lat/lons, and then recombines the results:
adds_by_year <- adds %>% split(.$year)
geo_list <- lapply(adds_by_year, function(x) {
  geo <- geocode(.tbl = x,
                 street = address,
                 city = city,
                 county = county,
                 state = state,
                 postalcode = zip,
                 # cascade tries census first, then retries unmatched addresses with osm
                 # takes longer but may be more accurate
                 method = "cascade", timeout = 500) %>%
    filter(!is.na(lat))
  return(geo)
})
out <- bind_rows(geo_list)
The following does not:
adds <- adds %>%
  mutate(yrmn = zoo::as.yearmon(date))
adds_by_yrm <- adds %>% split(.$yrmn)
geo_list <- lapply(adds_by_yrm, function(x) {
  geo <- geocode(.tbl = x,
                 street = address,
                 city = city,
                 county = county,
                 state = state,
                 postalcode = zip,
                 # cascade tries census first, then retries unmatched addresses with osm
                 # takes longer but may be more accurate
                 method = "cascade", timeout = 500) %>%
    filter(!is.na(lat))
  return(geo)
})
out <- bind_rows(geo_list)
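One plausible reason the year-month split fails while the year split works is that finer groups hold fewer addresses. This is an assumption and is not confirmed in the thread. With only one address in a group, a single miss by the first cascade method leaves that group's whole `lat` column `NA`. The sketch below counts group sizes for 20 evenly spaced dates, similar to the sampled example data. It uses base `format()` as a stand-in for `zoo::as.yearmon()`.

```r
# 20 evenly spaced dates, similar to the sampled example data
dates <- seq.Date(as.Date("2015-01-01"), as.Date("2021-06-01"), length.out = 20)

# Group by calendar year vs. by year-month (base R stand-in for zoo::as.yearmon)
by_year  <- split(dates, format(dates, "%Y"))
by_month <- split(dates, format(dates, "%Y-%m"))

# Year groups hold two or three dates each; year-month groups hold one each
lengths(by_year)
lengths(by_month)
```

With real data the exact sizes depend on the sample, but the year-month groups will generally be much smaller.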
It returns this error:
Error: Assigned data `retry_results` must be compatible with existing data.
ℹ Error occurred for column `lat`.
x Can't convert from <double> to <logical> due to loss of precision.
* Locations: 1.
Run `rlang::last_error()` to see where the error occurred.
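The message comes from vctrs (frames 13 to 15 of the backtrace). Assigning new values into rows of an existing tibble column casts them to that column's type, and a column built only from `NA` is logical. Here is a minimal sketch of the same cast outside tidygeocoder, with values chosen for illustration:

```r
library(vctrs)

# A vector made only of NA defaults to logical, like an all-NA `lat` column
lat <- c(NA, NA)
typeof(lat)

# 0 and 1 convert to logical without loss, so vctrs allows this cast
vec_cast(c(0, 1), logical())

# A real coordinate cannot become TRUE/FALSE without losing information
msg <- tryCatch(
  vec_cast(36.2, logical()),
  error = function(e) conditionMessage(e)
)
msg

# The other direction (logical NA -> double) always succeeds
vec_cast(lat, double())
```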
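I did some searching and found this, but the suggested solution (wrapping x in as.data.frame()) led to the same error.
Any insight is appreciated. I've looked into using purrr, but I'm not sure I fully understand it.
Here is the full backtrace; I'm not yet familiar enough with it to parse it fully: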
Backtrace:
█
1. ├─base::lapply(...)
2. │ └─global::FUN(X[[i]], ...)
3. │ └─tidygeocoder::geocode(...)
4. │ ├─base::do.call(geo, geo_args)
5. │ └─(function (address = NULL, street = NULL, city = NULL, county = NULL, ...
6. │ ├─base::do.call(geo_cascade, all_args[!names(all_args) %in% c("method")])
7. │ └─(function (..., cascade_order = c("census", "osm")) ...
8. │ ├─base::`[<-`(...)
9. │ └─tibble:::`[<-.tbl_df`(...)
10. │ └─tibble:::tbl_subassign(x, i, j, value, i_arg, j_arg, substitute(value))
11. │ └─tibble:::tbl_subassign_row(x, i, value, value_arg)
12. │ ├─base::withCallingHandlers(...)
13. │ └─vctrs::`vec_slice<-`(`*tmp*`, i, value = value[[j]])
14. │ └─(function () ...
15. │ └─vctrs:::vec_cast.logical.double(...)
16. │ └─vctrs::maybe_lossy_cast(out, x, to, lossy, x_arg = x_arg, to_arg = to_arg)
17. │ ├─base::withRestarts(...)
18. │ │ └─base:::withOneRestart(expr, restarts[[1L]])
19. │ │ └─base:::doWithOneRestart(return(expr), restart)
20. │ └─vctrs:::stop_lossy_cast(...)
21. │ └─vctrs:::stop_vctrs(...)
22. │ └─rlang::abort(message, class = c(class, "vctrs_error"), ...)
23. │ └─rlang:::signal_abort(cnd)
24. │ └─base::signalCondition(cnd)
25. └─(function (cnd) ...
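Since purrr came up, here is a hedged sketch of the same split-apply-combine using `purrr::map()` with `purrr::possibly()`. A group that errors then returns `NULL` instead of aborting the whole run. `fake_geocode()` and the three-row `adds` tibble are hypothetical stand-ins so the example runs offline; the stand-in fails for one group on purpose.

```r
library(dplyr)
library(purrr)

# Hypothetical stand-in for tidygeocoder::geocode(); errors on one group
fake_geocode <- function(x) {
  if (any(x$address == "bad address")) {
    stop("geocoding failed for this group (simulated)")
  }
  mutate(x, lat = 36.2, long = -86.8)
}

adds <- tibble(
  address = c("134 Lewis Rd", "bad address", "9 Front St"),
  yrmn    = c("2016-11", "2017-02", "2021-05")
)

# possibly() wraps the function: on error it returns `otherwise` (NULL here)
safe_geocode <- possibly(fake_geocode, otherwise = NULL)

geo_list <- adds %>%
  split(.$yrmn) %>%
  map(safe_geocode)

# Failed groups are NULL; compact() drops them before combining
out <- geo_list %>%
  compact() %>%
  bind_rows()
```

This hides errors rather than fixing them, so it is best used to find which groups fail (for example with `verbose = TRUE` on the real call).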
Using dplyr 1.0.6:
dplyr::bind_rows(geo_list)
# A tibble: 8 x 11
address city county state zip date year yrmn lat long geo_method
<chr> <chr> <chr> <chr> <chr> <date> <dbl> <yearmon> <dbl> <dbl> <chr>
1 134 Lewis Rd Nashville Davidson TN 37211 2016-11-06 2016 Nov 2016 36.2 -86.8 osm
2 6651 Municipal Rd Houma Terrebonne LA 70360 2017-02-03 2017 Feb 2017 29.6 -90.7 osm
3 189 Village Park Rd Crestview Okaloosa FL 32536 2017-08-25 2017 Aug 2017 30.8 -86.6 osm
4 9122 Carpenter Ave New Haven New Haven CT 06511 2018-01-14 2018 Jan 2018 41.5 -72.8 osm
5 5221 Bear Valley Rd Nashville Davidson TN 37211 2018-09-17 2018 Sep 2018 36.1 -86.8 osm
6 28 S 7th St #2824 Englewood Bergen NJ 07631 2020-03-31 2020 Mar 2020 40.9 -74.0 census
7 5 E Truman Rd Abilene Taylor TX 79602 2021-02-25 2021 Feb 2021 32.5 -99.7 osm
8 9 Front St Washington District of Columbia DC 20001 2021-05-16 2021 May 2021 38.9 -77.0 osm
I noticed that some of the list elements have 0 rows. Perhaps we can remove those 0-row elements and then use bind_rows:
library(purrr)
library(dplyr)
geo_list %>%
  keep(~ NROW(.x) > 0) %>%
  bind_rows()
# A tibble: 8 x 11
address city county state zip date year yrmn lat long geo_method
<chr> <chr> <chr> <chr> <chr> <date> <dbl> <yearmon> <dbl> <dbl> <chr>
1 134 Lewis Rd Nashville Davidson TN 37211 2016-11-06 2016 Nov 2016 36.2 -86.8 osm
2 6651 Municipal Rd Houma Terrebonne LA 70360 2017-02-03 2017 Feb 2017 29.6 -90.7 osm
3 189 Village Park Rd Crestview Okaloosa FL 32536 2017-08-25 2017 Aug 2017 30.8 -86.6 osm
4 9122 Carpenter Ave New Haven New Haven CT 06511 2018-01-14 2018 Jan 2018 41.5 -72.8 osm
5 5221 Bear Valley Rd Nashville Davidson TN 37211 2018-09-17 2018 Sep 2018 36.1 -86.8 osm
6 28 S 7th St #2824 Englewood Bergen NJ 07631 2020-03-31 2020 Mar 2020 40.9 -74.0 census
7 5 E Truman Rd Abilene Taylor TX 79602 2021-02-25 2021 Feb 2021 32.5 -99.7 osm
8 9 Front St Washington District of Columbia DC 20001 2021-05-16 2021 May 2021 38.9 -77.0 osm
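The zero-row filtering can be checked offline. In this sketch the per-group results are made-up stand-ins. The middle element is what `filter(!is.na(lat))` leaves when no address in a group was matched: a zero-row tibble whose `lat` column is still logical.

```r
library(dplyr)
library(purrr)

# Hypothetical per-group results; lat is logical in the all-NA group
geo_list <- list(
  `Nov 2016` = tibble(address = "134 Lewis Rd", lat = 36.2),
  `Dec 2016` = tibble(address = "unmatched", lat = NA) %>% filter(!is.na(lat)),
  `Feb 2017` = tibble(address = "6651 Municipal Rd", lat = 29.6)
)

# Keep only elements that still have rows, then combine
out <- geo_list %>%
  keep(~ NROW(.x) > 0) %>%
  bind_rows()
```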
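Solved:
- Updated dplyr (thanks akrun)
- Updated tidygeocoder: it turns out the issue was combining numeric results with NA results, which is handled in a newer version that I didn't have yet.
Posting my code here because the geocode() function has several useful debugging flags: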
adds_by_yrm <- adds %>% split(.$yrmn)
geo_list <- lapply(adds_by_yrm, function(x) {
  geo <- geocode(.tbl = as.data.frame(x),
                 street = address,
                 city = city,
                 county = county,
                 state = state,
                 postalcode = zip,
                 # cascade tries census first, then retries unmatched addresses with osm
                 # takes longer but may be more accurate
                 method = "cascade",
                 cascade_order = c("census", "osm"),
                 timeout = 500,
                 # return only unique addresses
                 unique_only = TRUE,
                 # print detailed progress messages for debugging
                 verbose = TRUE) %>%
    filter(!is.na(lat))
  return(geo)
})
out <- geo_list %>%
  purrr::keep(~ NROW(.x) > 0) %>%
  bind_rows()
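The fix rests on how current dplyr combines types. `bind_rows()` looks for a common type, and a logical `NA` column combined with a double column becomes double. Below is a minimal check, assuming dplyr 1.0 or later. The thread does not say which tidygeocoder version changed its own handling.

```r
library(dplyr)

# One group where nothing matched (lat is logical NA), one where it did
unmatched <- tibble(address = "unmatched", lat = NA)
matched   <- tibble(address = "134 Lewis Rd", lat = 36.2)

combined <- bind_rows(unmatched, matched)

# The logical NA column is promoted to double instead of raising an error
typeof(combined$lat)
combined$lat
```

In the failing run, the cascade step did something different (frames 8 to 15 of the backtrace). It assigned retry results into rows of an existing column, which casts to that column's type instead of finding a common one. That is why the same pair of values failed there.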