Downloading Geonames
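I am interested in downloading lake geonames for Canada. The maximum number of rows that can be downloaded per day is 1000. When I run the code below, a few records are missed and some overlap. Is there a way to get the total number of lake geoname records available, and to download each record only once, without any overlap?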
library(geonames)
GN_lake <- GNsearch(featureCode = 'LK', country = 'CA', startRow = 1, maxRows = 1000)
GN_lake <- GNsearch(featureCode = 'LK', country = 'CA', startRow = 1000, maxRows = 1000)
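One likely cause of the missed and overlapping records: as far as I understand the GeoNames search documentation, `startRow` is a zero-based offset (default 0). Under that assumption, `startRow = 1` skips the first record, and `startRow = 1000` after a 1000-row page that began at 1 repeats one row. The sketch below only computes non-overlapping offsets; `page_offsets` is a hypothetical helper, not part of the geonames package, and the total of 2500 is a made-up example value.

```r
# Hypothetical helper (not from the geonames package): compute zero-based,
# non-overlapping page offsets, assuming startRow counts from 0.
page_offsets <- function(total, page_size = 1000) {
  stopifnot(total >= 0, page_size > 0)
  if (total == 0) return(integer(0))
  seq(0, total - 1, by = page_size)
}

# Example with a made-up total of 2500 records.
offsets <- page_offsets(2500, 1000)
print(offsets)
```

Each offset would then be passed as `startRow` together with `maxRows = page_size`. The documentation also mentions a cap on `startRow` for the free service, so paging through every Canadian lake this way may not be possible.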
Why not just work with the CA database locally?
library(httr)
library(tidyverse)

# Get the CA database dump
res <- httr::GET(
  url = "http://download.geonames.org/export/dump/CA.zip",
  httr::write_disk("CA.zip"),
  httr::progress()
)

# Unzip it
unzip("CA.zip")

# readr::read_tsv doesn't like this file, at least when I read it
ca_geo <- read.csv(
  file = "CA.txt",
  header = FALSE,
  sep = "\t",
  col.names = c(
    "geonameid", "name", "asciiname", "alternatenames", "latitude",
    "longitude", "feature_class", "feature_code", "country", "cc2",
    "admin1_code1", "admin2_code", "admin3_code", "admin4_code",
    "population", "elevation", "dem", "timezone", "modification_date"
  ),
  stringsAsFactors = FALSE
) %>% as_tibble() # tbl_df() is deprecated in current dplyr/tibble

# Keep only the lake records
filter(ca_geo, feature_code == "LK")
## # A tibble: 104,663 x 19
##    geonameid name          asciiname     alternatenames latitude longitude
##        <int> <chr>         <chr>         <chr>             <dbl>     <dbl>
##  1   5881640 101 Mile Lake 101 Mile Lake ""                 51.7    -121.
##  2   5881642 103 Mile Lake 103 Mile Lake ""                 51.7    -121.
##  3   5881644 105 Mile Lake 105 Mile Lake ""                 51.7    -121.
##  4   5881647 108 Mile Lake 108 Mile Lake ""                 51.7    -121.
##  5   5881660 130 Mile Lake 130 Mile Lake ""                 51.9    -122.
##  6   5881666 16 1/2 Mile … 16 1/2 Mile … ""                 52.7    -118.
##  7   5881668 180 Lake      180 Lake      ""                 57.4    -130.
##  8   5881673 {1}útsaw Lake {1}utsaw Lake ""                 62.7    -137.
##  9   5881680 24 Mile Lake  24 Mile Lake  ""                 46.5     -82.0
## 10   5881683 28 Mile Lake  28 Mile Lake  ""                 54.8    -124.
## # ... with 104,653 more rows, and 13 more variables: feature_class <chr>,
## #   feature_code <chr>, country <chr>, cc2 <chr>, admin1_code1 <int>,
## #   admin2_code <chr>, admin3_code <int>, admin4_code <chr>,
## #   population <int>, elevation <int>, dem <int>, timezone <chr>,
## #   modification_date <chr>
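With the table available locally, the total count and the overlap check from the question become simple operations on the data frame. Here is a minimal sketch of that idea; `toy_geo` is a made-up stand-in for `ca_geo` with invented rows and only the columns needed, so the block runs without the download.

```r
# Toy stand-in for ca_geo: invented rows, for illustration only.
toy_geo <- data.frame(
  geonameid    = c(1L, 2L, 3L, 4L),
  name         = c("A Lake", "B Lake", "C Town", "D Lake"),
  feature_code = c("LK", "LK", "PPL", "LK"),
  stringsAsFactors = FALSE
)

# Keep only the lake records.
lakes <- toy_geo[toy_geo$feature_code == "LK", ]

# Total number of lake records.
n_lakes <- nrow(lakes)

# Number of repeated geonameid values; 0 means no record appears twice.
n_dupes <- sum(duplicated(lakes$geonameid))

print(c(lakes = n_lakes, duplicates = n_dupes))
```

On the real data, the same two lines applied to the filtered `ca_geo` would give the total number of lakes and confirm whether any `geonameid` is repeated.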