How to read multiple layers with st_read when there are duplicate layer names
I have a KML file, the unzipped version of this. It has thousands of layers with XML tags, and many of the layer names are duplicated.
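I'd like to load it into R with sf::st_read. The catch is that st_read reads one layer at a time and needs a layer name. If the names were unique, I'd happily iterate over the layer names returned by st_layers(), but they are not.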
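For reference, here is a minimal sketch of the loop described above. It runs against a small hypothetical two-folder KML written to a temporary file, not the BFRO data. With the KML driver that produced the listings in this post, both folders are listed under the same layer name, so a loop over st_layers()$name cannot tell them apart. Layer naming may differ with other GDAL drivers.

```r
library(sf)

# Hypothetical KML: two folders that share the name "July 2006".
kml_txt <- '<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
<Document>
  <Folder><name>July 2006</name>
    <Placemark><name>a</name><Point><coordinates>-86.46,34.95,0</coordinates></Point></Placemark>
  </Folder>
  <Folder><name>July 2006</name>
    <Placemark><name>b</name><Point><coordinates>-86.44,34.96,0</coordinates></Point></Placemark>
  </Folder>
</Document>
</kml>'
path <- tempfile(fileext = ".kml")
writeLines(kml_txt, path)

# One entry per layer (folder); the driver decides how layers are named.
lay <- st_layers(path)
print(lay$name)

# Report any names that occur more than once.
if (anyDuplicated(lay$name)) {
  message("Duplicated layer names: ",
          paste(unique(lay$name[duplicated(lay$name)]), collapse = ", "))
}

# The naive loop: one st_read() call per listed name. When a name is
# duplicated, both calls request the same name, so they cannot tell
# those folders apart.
per_layer <- lapply(lay$name, function(nm) st_read(path, layer = nm, quiet = TRUE))
```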
Is there another way to specify the layer I want, or is there a way to batch-rename all the layers with unique IDs?
Thanks.
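Adding some color based on the accepted answer below. Initially, I tried editing the <name> nodes with read_xml, but it didn't seem to find them.
I downloaded the KMZ file, loaded it into Google Earth, and then saved it as a KML file ("Reports.kml"). That was my first mistake. The resulting KML is tab-indented, which seemed to confuse read_xml. It is valid XML, and the st_ functions work on it, but read_xml does not recognize the tags correctly. It's better to run unzip on the KMZ file. Here's what happens with the version saved from Google Earth: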
layers <- st_layers("reports.kml")
tibble(name = layers$name, type = flatten_chr(layers$geomtype)) %>%
count(name, type, sort=TRUE)
# A tibble: 1,358 x 3
# name type n
# <chr> <chr> <int>
# 1 July 2006 25
# 2 October 2006 25
# 3 August 2008 20
# 4 July 2009 19
# 5 August 2005 18
# 6 August 2007 18
# 7 November 2006 18
# 8 October 2004 17
# 9 August 2000 16
#10 November 2012 16
# ... with 1,348 more rows
kml<-read_xml("reports.kml")
xml_find_all(kml, ".//Folder/name")
# {xml_nodeset (0)}
Nothing! But there is something in there:
xml_children(kml)
# {xml_nodeset (1)}
# [1] <Folder>\n <name>Reports</name>\n <open>1</open>\n <Folder>\n
# <name>Class A</name>\n ...
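One plausible explanation for the empty result, which this thread does not confirm: KML saved by Google Earth usually declares a default namespace (xmlns="http://www.opengis.net/kml/2.2"), and in XPath an unprefixed name such as Folder only matches elements that have no namespace. The sketch below uses a small hypothetical KML string, not the actual Reports.kml, and shows two xml2 workarounds: the d1 prefix that xml2 assigns to a default namespace, and xml_ns_strip().

```r
library(xml2)

# Hypothetical minimal KML with a default namespace, similar in shape to
# what Google Earth writes (not the real Reports.kml).
kml_txt <- '<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Folder><name>Reports</name>
      <Folder><name>July 2006</name></Folder>
    </Folder>
  </Document>
</kml>'

doc <- read_xml(kml_txt)

# Unprefixed XPath names match only un-namespaced elements, so this
# finds nothing when a default namespace is declared.
n_plain <- length(xml_find_all(doc, ".//Folder/name"))

# Workaround 1: use the prefix xml2 gives the default namespace ("d1").
ns <- xml_ns(doc)
n_prefixed <- length(xml_find_all(doc, ".//d1:Folder/d1:name", ns))

# Workaround 2: strip namespaces in place; the plain XPath then matches.
xml_ns_strip(doc)
n_stripped <- length(xml_find_all(doc, ".//Folder/name"))

print(c(plain = n_plain, prefixed = n_prefixed, stripped = n_stripped))
```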
Here's what happens with the unzipped KMZ:
download.file(url="http://www.bfro.net/app/AllReportsKMZ.aspx",
destfile = "AllBFROReports.kmz",
mode="wb")
unzip("AllBFROReports.kmz", junkpaths = TRUE) # creates "doc.kml"
layers <- st_layers("doc.kml")
tibble(name = layers$name, type = flatten_chr(layers$geomtype)) %>%
count(name, type, sort=TRUE)
# # A tibble: 1,376 x 3
# name type n
# <chr> <chr> <int>
# 1 July 2006 25
# 2 October 2006 25
# 3 August 2008 20
# 4 July 2009 19
# 5 August 2005 18
# 6 August 2007 18
# 7 November 2006 18
# 8 October 2004 17
# 9 August 2000 16
# 10 November 2012 16
# # ... with 1,366 more rows
st_layers gives the same result, but now the nodes are all found correctly!
kml <- read_xml("doc.kml")
xml_find_all(kml, ".//Folder/name")
{xml_nodeset (3874)}
[1] <name>June 2000</name>
[2] <name> 1995</name>
[3] <name>February 2004</name>
[4] <name>June 2004</name>
[5] <name>February 2004</name>
[6] <name>April 2008</name>
[7] <name>July 2009</name>
[8] <name>September 1981 and 1982</name>
[9] <name>July 1999</name>
[10] <name>November 1983</name>
[11] <name>October 2000</name>
[12] <name>August 1993</name>
[13] <name> 79, 80, 99</name>
[14] <name> 1978</name>
[15] <name>November 1980</name>
[16] <name>January 1997</name>
[17] <name> 1990</name>
[18] <name>December 1996</name>
[19] <name> 2000</name>
[20] <name> 2001</name>
...
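Once the <name> nodes are reachable, the duplicates can also be counted on the XML side without going through GDAL. A small sketch with xml2, using a hypothetical three-folder document as a stand-in for doc.kml:

```r
library(xml2)

# Hypothetical stand-in for doc.kml: two folders share a name, and one
# name has a leading space, like "<name> 1995</name>" in the listing above.
doc <- read_xml('<kml>
  <Document>
    <Folder><name>July 2006</name></Folder>
    <Folder><name>July 2006</name></Folder>
    <Folder><name> 1995</name></Folder>
  </Document>
</kml>')

# xml_text() keeps surrounding whitespace by default (trim = FALSE).
folder_names <- xml_text(xml_find_all(doc, ".//Folder/name"))

# How often each name occurs, most frequent first.
name_counts <- sort(table(folder_names), decreasing = TRUE)
print(name_counts)

# Names that occur more than once.
dup_names <- unique(folder_names[duplicated(folder_names)])
print(dup_names)
```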
Now the answer provided below works perfectly!
A little XML surgery does the trick.
First, illustrate the problem:
library(sf)
library(xml2)
library(tidyverse)
layers <- st_layers("AllBFROReports.kml")
tibble(name = layers$name, type = flatten_chr(layers$geomtype)) %>%
count(name, type, sort=TRUE)
## # A tibble: 1,376 x 3
## name type n
## <chr> <chr> <int>
## 1 July 2006 25
## 2 October 2006 25
## 3 August 2008 20
## 4 July 2009 19
## 5 August 2005 18
## 6 August 2007 18
## 7 November 2006 18
## 8 October 2004 17
## 9 August 2000 16
## 10 November 2012 16
## # ... with 1,366 more rows
Ugh. Someone truly vile made that file.
Now, read it in "raw":
kml <- read_xml("AllBFROReports.kml")
Prepend a sequential index number to each layer name:
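idx <- 0
xml_find_all(kml, ".//Folder/name") %>%
  walk(~{
    # counter shared across iterations, hence <<-
    idx <<- idx + 1
    # .x points into `kml`, so this edits the parsed document in place
    xml_text(.x) <- sprintf("%s-%s", idx, xml_text(.x))
  })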
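The global counter (idx <<-) works, but xml2 can also assign a whole vector of new texts to a node set in one call, which avoids the side effect. A sketch of that alternative on a small hypothetical document, assuming the same .//Folder/name XPath as above and an xml2 version that provides the xml_text<- replacement for node sets:

```r
library(xml2)

# Hypothetical stand-in for AllBFROReports.kml with a repeated folder name.
kml <- read_xml('<kml>
  <Document>
    <Folder><name>July 2006</name></Folder>
    <Folder><name>July 2006</name></Folder>
    <Folder><name>October 2006</name></Folder>
  </Document>
</kml>')

name_nodes <- xml_find_all(kml, ".//Folder/name")

# Prefix each name with its position in document order. The nodes point
# into `kml`, so assigning their text edits the parsed document in place.
xml_text(name_nodes) <- sprintf("%d-%s", seq_along(name_nodes), xml_text(name_nodes))

print(xml_text(xml_find_all(kml, ".//Folder/name")))
```

The result can then be written out with write_xml() exactly as in the next step.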
Create a new file:
write_xml(kml, "AllBFROReports-unique.kml")
Proof that it worked:
layers2 <- st_layers("AllBFROReports-unique.kml")
tibble(name = layers2$name, type = flatten_chr(layers2$geomtype)) %>%
count(name, type, sort=TRUE)
## # A tibble: 3,874 x 3
## name type n
## <chr> <chr> <int>
## 1 1-June 2000 1
## 2 10-November 1983 1
## 3 100-September 1992 1
## 4 1000-October 1987 1
## 5 1001-October 1987 1
## 6 1002-October 1979 1
## 7 1003-June 1993 3D Point 1
## 8 1004- 1982 3D Point 1
## 9 1005- 1982 3D Point 1
## 10 1006-August 1977 3D Point 1
## # ... with 3,864 more rows
Read in one layer using its new indexed name:
st_read("AllBFROReports-unique.kml", layer = "10-November 1983")
## Reading layer `10-November 1983' from data source `/Users/bob/Desktop/AllBFROReports-unique.kml' using driver `KML'
## Simple feature collection with 2 features and 2 fields
## geometry type: GEOMETRY
## dimension: XYZ
## bbox: xmin: -86.4677 ymin: 34.9484 xmax: -86.4441 ymax: 34.9637
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
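With unique names in place, the loop from the original question becomes possible. A minimal sketch, assuming a small hypothetical KML whose folders were already renamed with the index prefix (standing in for AllBFROReports-unique.kml). It asks st_layers() to count features so that folders without their own placemarks can be skipped, and keeps the results in a list named by layer, since the layers need not share the same fields.

```r
library(sf)

# Hypothetical KML whose folder names already carry a unique index prefix.
kml_txt <- '<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
<Document>
  <Folder><name>1-July 2006</name>
    <Placemark><name>a</name><Point><coordinates>-86.46,34.95,0</coordinates></Point></Placemark>
  </Folder>
  <Folder><name>2-July 2006</name>
    <Placemark><name>b</name><Point><coordinates>-86.44,34.96,0</coordinates></Point></Placemark>
  </Folder>
</Document>
</kml>'
path <- tempfile(fileext = ".kml")
writeLines(kml_txt, path)

# do_count = TRUE makes sf count features even if the driver reports no count.
lay <- st_layers(path, do_count = TRUE)

# Keep only layers that hold features; which() also drops NA counts.
wanted <- lay$name[which(lay$features > 0)]

# Read each layer by its (now unique) name into a named list.
all_layers <- lapply(wanted, function(nm) st_read(path, layer = nm, quiet = TRUE))
names(all_layers) <- wanted

print(vapply(all_layers, nrow, integer(1)))
```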