How to read in KML file properly in R, or separate out lumped variables into columns
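I read in a KML file using the following:
clinics = st_read(dsn = "Data/clinics-kml.kml", "CLINICS")
However, all of my variables (other than the coordinates) are lumped into a single column named Description (see the link below).
What is the best way to separate out the variables? Or is there a way to import the KML file properly so that this problem does not arise? You may view the screenshot of the problem here.
I came up with an alternative: use QGIS to convert the KML to SHP, then read the SHP into R.
The problem (or maybe not) is that the Description column holds an HTML table as a string for each observation. That is fine if you want to render the HTML string as a nice table, for example when creating an interactive web map. But if you only want the data inside it, it can be a headache.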
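To make that concrete, here is a minimal sketch of what parsing a single Description string can look like. The HTML below is a made-up stand-in (the attribute names HCI_NAME and POSTAL_CD and their values are assumptions for illustration, not copied from the real file), and it only relies on rvest:

```r
library(rvest)

# Hypothetical Description value: an HTML table with one attribute per row.
description <- paste0(
  "<table>",
  "<tr><th>Attribute</th><th>Value</th></tr>",
  "<tr><td>HCI_NAME</td><td>Example Clinic</td></tr>",
  "<tr><td>POSTAL_CD</td><td>123456</td></tr>",
  "</table>"
)

# Parse the string as HTML and extract the first <table> as a data frame.
doc <- read_html(description)
long_tbl <- html_table(html_element(doc, "table"), header = TRUE)

# Reshape from long (one row per attribute) to wide (one column per attribute).
wide <- as.data.frame(
  as.list(setNames(as.character(long_tbl[[2]]), long_tbl[[1]])),
  stringsAsFactors = FALSE
)
```

Here `long_tbl` has one row per attribute, while `wide` is a single row with one column per attribute, which is the shape you want to attach to each observation.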
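So, just follow these steps to do the whole process in R:
- Download the KML file from the internet
- Unzip the downloaded file
- Read the KML file as a spatial object
- Get the attributes for each observation
- Bind the attributes to each observation as new columns
All of the code is commented, see below: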
library(tidyverse)
library(sf)
library(mapview)
library(rvest)
library(httr)

# 1) Download the KML file (it is delivered as a zip archive)
moh_chas_clinics <- GET("https://data.gov.sg/dataset/31e92629-980d-4672-af33-cec147c18102/download",
                        write_disk(here::here("moh_chas_clinics.zip"), overwrite = TRUE))

# 2) Unzip the downloaded zip file into the project root,
#    so that the here::here() path in step 3 can find it
unzip(here::here("moh_chas_clinics.zip"), exdir = here::here())

# 3) Read the KML file as a spatial (sf) object
moh_chas_clinics <- read_sf(here::here("chas-clinics-kml.kml"))

# Inspect the data
moh_chas_clinics %>%
  glimpse()

# See map
mapview(moh_chas_clinics)

# 4) Get the attributes for each observation (run either option a or option b)

# Option a) Using a simple lapply
attributes <- lapply(X = 1:nrow(moh_chas_clinics),
                     FUN = function(x) {
                       moh_chas_clinics %>%
                         slice(x) %>%
                         pull(Description) %>%
                         read_html() %>%
                         html_node("table") %>%
                         html_table(header = TRUE, trim = TRUE, dec = ".", fill = TRUE) %>%
                         as_tibble(.name_repair = ~ make.names(c("Attribute", "Value"))) %>%
                         pivot_wider(names_from = Attribute, values_from = Value)
                     })

# Option b) Using a parallel lapply (faster)
future::plan("multisession")
attributes <- future.apply::future_lapply(X = 1:nrow(moh_chas_clinics),
                                          FUN = function(x) {
                                            moh_chas_clinics %>%
                                              slice(x) %>%
                                              pull(Description) %>%
                                              read_html() %>%
                                              html_node("table") %>%
                                              html_table(header = TRUE, trim = TRUE, dec = ".", fill = TRUE) %>%
                                              as_tibble(.name_repair = ~ make.names(c("Attribute", "Value"))) %>%
                                              pivot_wider(names_from = Attribute, values_from = Value)
                                          })

# 5) Bind the attributes to each observation as new columns
moh_chas_clinics_attr <-
  moh_chas_clinics %>%
  bind_cols(bind_rows(attributes)) %>%
  select(-Description)

# Inspect the new data
moh_chas_clinics_attr %>%
  glimpse()

# New map
mapview(moh_chas_clinics_attr,
        zcol = "CLINIC_PROGRAMME_CODE",
        layer.name = "Clinic Programme Code")
As an example, the final map shows all the attributes of a point and is colored by "Clinic Programme Code":
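If the download link is not available, steps 4 and 5 can be tried on a toy object instead. The following is only a sketch under stated assumptions: the two points, the helper names `make_description` and `parse_description`, and the attribute values are all hypothetical, and it uses base R plus sf and rvest rather than the tidyverse pipeline. It also assumes every Description table lists the same attributes, since `rbind` needs matching columns.

```r
library(sf)
library(rvest)

# Hypothetical helper: builds an HTML table string like the one in Description.
make_description <- function(name, code) {
  paste0(
    "<table>",
    "<tr><th>Attribute</th><th>Value</th></tr>",
    "<tr><td>HCI_NAME</td><td>", name, "</td></tr>",
    "<tr><td>CLINIC_PROGRAMME_CODE</td><td>", code, "</td></tr>",
    "</table>"
  )
}

# Toy stand-in for the KML layer: two points with made-up attributes.
clinics <- st_sf(
  Name = c("kml_1", "kml_2"),
  Description = c(make_description("Clinic A", "CHAS"),
                  make_description("Clinic B", "PHPC")),
  geometry = st_sfc(st_point(c(103.80, 1.30)),
                    st_point(c(103.85, 1.35)),
                    crs = 4326)
)

# Hypothetical helper: turns one Description string into a one-row data frame.
parse_description <- function(html) {
  tbl <- html_table(html_element(read_html(html), "table"), header = TRUE)
  as.data.frame(as.list(setNames(as.character(tbl[[2]]), tbl[[1]])),
                stringsAsFactors = FALSE)
}

# Step 4: parse every Description and stack the one-row results.
parsed <- do.call(rbind, lapply(clinics$Description, parse_description))

# Step 5: drop Description, attach the parsed columns, and rebuild the sf object.
attrs_only <- st_drop_geometry(clinics)
attrs_only$Description <- NULL
clinics_attr <- st_sf(cbind(attrs_only, parsed), geometry = st_geometry(clinics))
```

Iterating over `clinics$Description` directly avoids slicing the sf object once per row, which should matter more as the number of observations grows.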