Downloading Geonames

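I am interested in downloading the lake Geonames for Canada. The maximum number of rows that can be downloaded per day is 1000. When I run the code below, a few records are missed and some overlap. Is there a way to get the total number of lake Geonames records available, and to download each record only once, without any overlap?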

library(geonames)

GN_lake <- GNsearch(featureCode = 'LK', country = 'CA', startRow = 1, maxRows = 1000)

GN_lake <- GNsearch(featureCode = 'LK', country = 'CA', startRow = 1000, maxRows = 1000)
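
On the overlap itself: as far as I understand the GeoNames search documentation, startRow is a zero-based offset. If that holds, calling with startRow = 1 and then startRow = 1000 (both with maxRows = 1000) skips the first record and fetches one record twice. Below is a minimal sketch of computing non-overlapping offsets. `page_starts` is a hypothetical helper, not part of the geonames package, and the free web service also limits how large startRow may be, so paging alone may not reach every lake.

```r
# Hypothetical helper: zero-based start offsets for non-overlapping pages.
# Assumes startRow is a zero-based offset, as the GeoNames docs appear to say.
page_starts <- function(total, page_size = 1000L) {
  stopifnot(total >= 0, page_size > 0)
  if (total == 0) return(integer(0))
  seq.int(from = 0L, to = total - 1L, by = page_size)
}

starts <- page_starts(2500L)  # offsets for 2500 records in pages of 1000

# Sketch of how the offsets would be used (needs a GeoNames username and network):
# pages <- lapply(starts, function(s)
#   geonames::GNsearch(featureCode = "LK", country = "CA",
#                      startRow = s, maxRows = 1000))
```

Each page then covers rows start to start + page_size - 1, so no row is requested twice and none is skipped.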

Why not just use the CA database locally?

library(httr)
library(tidyverse)

# Get CA database
httr::GET(
  url = "http://download.geonames.org/export/dump/CA.zip",
  httr::write_disk("CA.zip"),
  httr::progress()
) -> res

# unzip it
unzip("CA.zip")

read.csv( # readr::read_tsv doesn't like this file at least when I read it
  file = "CA.txt",
  header = FALSE,
  sep = "\t",
  col.names = c(
    "geonameid", "name", "asciiname", "alternatenames", "latitude",
    "longitude", "feature_class", "feature_code", "country", "cc2",
    "admin1_code1", "admin2_code", "admin3_code", "admin4_code",
    "population", "elevation", "dem", "timezone", "modification_date"
  ),
  stringsAsFactors = FALSE
) %>% as_tibble() -> ca_geo # tbl_df() is deprecated in current dplyr

filter(ca_geo, feature_code == "LK")
## # A tibble: 104,663 x 19
##    geonameid name          asciiname     alternatenames latitude longitude
##        <int> <chr>         <chr>         <chr>             <dbl>     <dbl>
##  1   5881640 101 Mile Lake 101 Mile Lake ""                 51.7    -121. 
##  2   5881642 103 Mile Lake 103 Mile Lake ""                 51.7    -121. 
##  3   5881644 105 Mile Lake 105 Mile Lake ""                 51.7    -121. 
##  4   5881647 108 Mile Lake 108 Mile Lake ""                 51.7    -121. 
##  5   5881660 130 Mile Lake 130 Mile Lake ""                 51.9    -122. 
##  6   5881666 16 1/2 Mile … 16 1/2 Mile … ""                 52.7    -118. 
##  7   5881668 180 Lake      180 Lake      ""                 57.4    -130. 
##  8   5881673 {1}útsaw Lake {1}utsaw Lake ""                 62.7    -137. 
##  9   5881680 24 Mile Lake  24 Mile Lake  ""                 46.5     -82.0
## 10   5881683 28 Mile Lake  28 Mile Lake  ""                 54.8    -124. 
## # ... with 104,653 more rows, and 13 more variables: feature_class <chr>,
## #   feature_code <chr>, country <chr>, cc2 <chr>, admin1_code1 <int>,
## #   admin2_code <chr>, admin3_code <int>, admin4_code <chr>,
## #   population <int>, elevation <int>, dem <int>, timezone <chr>,
## #   modification_date <chr>
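
With the table held locally, the total count and the "no overlap" requirement come down to counting rows and checking that geonameid is unique. Here is a minimal sketch in base R. `toy_geo` is a made-up stand-in for ca_geo with hypothetical rows (not real GeoNames data) so the snippet runs on its own; with the real table you would substitute ca_geo.

```r
# Toy stand-in for ca_geo: hypothetical rows, including one deliberate duplicate.
toy_geo <- data.frame(
  geonameid    = c(1L, 2L, 2L, 3L),
  name         = c("Lake A", "Lake B", "Lake B", "Town C"),
  feature_code = c("LK", "LK", "LK", "PPL"),
  stringsAsFactors = FALSE
)

# Keep only lakes, then drop any repeated geonameid so each record appears once.
lakes <- toy_geo[toy_geo$feature_code == "LK", ]
lakes <- lakes[!duplicated(lakes$geonameid), ]

n_lakes <- nrow(lakes)                    # total number of distinct lake records
has_dupes <- anyDuplicated(lakes$geonameid) > 0
```

The same two steps applied to ca_geo would give the number of distinct lake records in the dump and confirm that none is repeated.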