Extract attributes with the same name for all nodes in an XML file using R

Question

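I'm trying to extract all attributes (with the same name) from an XML file. I'm currently using the xml2 package and was hoping to succeed with the xml_attr or xml_attrs functions.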

library(xml2)

# basic xml file
x <- read_xml("<a>
  <b><c>123</c></b>
  <b><c>456</c></b>
</a>")

# add a few attributes that all share the same name, "FakeID"
xml_set_attr(xml_child(x, 'b[1]'), 'FakeID', '11111')
xml_set_attr(xml_child(x, 'b[2]'), 'FakeID', '22222')
xml_set_attr(xml_child(xml_child(x, 'b[2]'), 'c'), 'FakeID', '33333')

# this returns the attribute ("11111") only when called on a specific child node
xml_attr(xml_child(x, 'b[1]'), 'FakeID')
# this returns NA rather than any "FakeID" values, because the current (root) node
#   doesn't have that attribute itself
xml_attr(x, 'FakeID')
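A small sketch (not from the original post) of why the calls above fall short: xml_attr() does accept a whole node set and returns one value per node, but xml_children() only reaches the direct children of the root. The variable names kids and direct are illustrative; the document is the question's example with the attributes written inline instead of added via xml_set_attr().

```r
library(xml2)

# The question's document, with the FakeID attributes written inline
x <- read_xml('<a>
  <b FakeID="11111"><c>123</c></b>
  <b FakeID="22222"><c FakeID="33333">456</c></b>
</a>')

# xml_attr() is vectorised: given a node set it returns one value per node
kids <- xml_children(x)               # only the two <b> elements under <a>
direct <- xml_attr(kids, "FakeID")
direct
# [1] "11111" "22222"

# The nested <c FakeID="33333"> is a grandchild of <a>, so it is missed here;
# reaching it needs a recursive search such as an XPath or CSS query.
```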

What I ultimately want is a vector giving the value of "FakeID" for every node in the XML that has that attribute: c('11111', '22222', '33333')

Answer 1

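I use the rvest package because it re-exports xml2 functions and also re-exports the %>% operator. I then turned your XML into a string to make its contents explicit, and added a second attribute (RealID) to your first node.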

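In xml_nodes(), I select all nodes with the * CSS selector and specify with [FakeID] that I only want nodes that have a FakeID attribute. xml_attrs() then returns all attributes of each matched node, pluck("FakeID") takes the FakeID value from each, and unlist() flattens the result into a single character vector.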

library(rvest)

"<a>
   <b FakeID=\"11111\" RealID=\"abcde\">
     <c>123</c>
   </b>
   <b FakeID=\"22222\">
     <c FakeID=\"33333\">456</c>
   </b>
</a>" %>% 
  read_xml() %>% 
  xml_nodes("*[FakeID]") %>% 
  xml_attrs() %>% 
  pluck("FakeID") %>% 
  unlist()
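For comparison, here is a sketch that stays within xml2 and does not rely on rvest's re-exports. It is not the answerer's code: it swaps the CSS selector for the equivalent XPath expression //*[@FakeID] (any element, at any depth, that has a FakeID attribute) and uses xml_attr(), which pulls one named attribute per node, so the extra RealID attribute never needs to be filtered out. The names doc, with_id, ids, all_attrs and plucked are illustrative.

```r
library(xml2)

doc <- read_xml('<a>
  <b FakeID="11111" RealID="abcde"><c>123</c></b>
  <b FakeID="22222"><c FakeID="33333">456</c></b>
</a>')

# Every element anywhere in the document that carries a FakeID attribute
with_id <- xml_find_all(doc, "//*[@FakeID]")

# One FakeID value per matched node, in document order
ids <- xml_attr(with_id, "FakeID")
ids
# [1] "11111" "22222" "33333"

# xml_attrs() instead returns every attribute of each node as a list of
# named character vectors (RealID included), hence the extra extraction step;
# base R's vapply() can play the role of pluck() here
all_attrs <- xml_attrs(with_id)
plucked <- vapply(all_attrs, `[[`, character(1), "FakeID")
```

Using xml_attr() directly skips the list-then-flatten step entirely, which is why it is usually the shorter route when only one attribute name is wanted.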
