Empty dataframe returned from xml data using xmlToDataframe in R

I am working with XML data downloaded using the GET function from the httr package. The returned content type is application/xml. Here is an excerpt:

<?xml version="1.0" encoding="UTF-8"?>
<metadata xmlns="http://example.com/schema/dxf/2.0">
  <pager>
    <page>1</page>
    <pageCount>17</pageCount>
    <total>819</total>
    <pageSize>50</pageSize>
    <nextPage>https://xxx.org.ng/xxx/api/indicators?page=2&amp;format=xml</nextPage>
  </pager>
  <indicators>
    <indicator id="cfvVUWwkwje">
      <displayName> ART Total </displayName>
    </indicator>
    <indicator id="gytvOB3J7">
      <displayName> ART Microscopy - Total</displayName>
    </indicator>
    <indicator id="5fgtZdtvQRW">
      <displayName> ART Microscopy Biology - Total</displayName>
    </indicator>
    <indicator id="g6hYenEHnsu">
      <displayName> ART GeneXpert - Total </displayName>
    </indicator>
    <indicator id="hhjxxDlG87j">
      <displayName> ART Functional -Total</displayName>
    </indicator>
    <indicator id="SarCtUBpBru">
      <displayName> ART 21 - Total</displayName>
    </indicator>
    <indicator id="ftywhPKoMgp">
      <displayName> Buruli Ulcer Total</displayName>
    </indicator>
    <indicator id="gyyhtAzCQZ0">
      <displayName> xART 21 prophylaxis Functional -Total</displayName>
    </indicator>
    <indicator id="vftWafaROyq0">
      <displayName> xART 21 Non Functional - Total</displayName>
    </indicator>
  </indicators>
</metadata>

I use the following code to download the XML and try to convert it to a data frame:

library(httr)
library(XML)

# url1, path, username and password are defined elsewhere (not shown)
url_xml <- modify_url(url1, path = path)
xml_response <- GET(url_xml, authenticate(username, password))

http_type(xml_response)  # "application/xml"
# Read the body as text so that XML::xmlParse() can parse it
resp_content <- content(xml_response, as = "text", encoding = "UTF-8")

parsed_content <- xmlParse(resp_content, asText = TRUE)

# get the root
parsed_xml_root <- xmlRoot(parsed_content)
# parse out names and IDs
df_xml <- xmlToDataFrame(nodes = getNodeSet(parsed_xml_root, "//indicators/indicator/displayName"))
id <- xmlSApply(parsed_xml_root[["indicator"]], xmlGetAttr, "id")
all_values_df <- cbind(df_xml, id)

I want to get the id and display name of each indicator, but the resulting data frame is empty. Any suggestions?

I solved it with:

df_xml <- xmlToDataFrame(nodes = xmlChildren(xmlRoot(parsed_content)[["indicators"]]))
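A minimal sketch of the same child-navigation workaround, extended so that it also collects the id attribute. It assumes a trimmed inline copy of the excerpt (xml_text is a hypothetical stand-in for the downloaded response body). Navigating with [[ by child name does not involve XPath, which is why the line above works even though the document has a default namespace.

```r
library(XML)

# Hypothetical stand-in for the response body: two indicators from the excerpt,
# keeping the default namespace declared on <metadata>
xml_text <- '<?xml version="1.0" encoding="UTF-8"?>
<metadata xmlns="http://example.com/schema/dxf/2.0">
  <indicators>
    <indicator id="cfvVUWwkwje"><displayName> ART Total </displayName></indicator>
    <indicator id="ftywhPKoMgp"><displayName> Buruli Ulcer Total</displayName></indicator>
  </indicators>
</metadata>'

doc <- xmlParse(xml_text, asText = TRUE)

# Walk root -> <indicators> -> its <indicator> children (no XPath involved)
indicator_nodes <- xmlChildren(xmlRoot(doc)[["indicators"]])

# One row per <indicator>: its id attribute and its (trimmed) display name
indicators_df <- data.frame(
  id = vapply(indicator_nodes, xmlGetAttr, character(1), "id", USE.NAMES = FALSE),
  displayName = trimws(vapply(indicator_nodes, xmlValue, character(1), USE.NAMES = FALSE)),
  stringsAsFactors = FALSE
)
print(indicators_df)
```

Building the data frame directly from the nodes avoids the separate cbind() step, so the ids and names cannot get out of step.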

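Your XML file has a default namespace (xmlns="http://example.com/schema/dxf/2.0" on the metadata element). In XPath 1.0 an unprefixed name such as indicator only matches elements that are in no namespace, which is why your getNodeSet() call returns nothing. So you first need to retrieve the namespace, bind it to a prefix, and then use that prefix in the XPath expressions.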

library(XML)

# Get the namespace definitions declared in the document
nsDefs <- xmlNamespaceDefinitions(parsed_content, simplify = FALSE)
ns <- structure(sapply(nsDefs, function(x) x$uri), names = names(nsDefs))

# Bind the (unnamed) default namespace to the prefix "x" and use it in the node queries
df_xml <- xmlToDataFrame(getNodeSet(parsed_content, "/x:metadata//x:indicators//x:indicator//x:displayName", c(x = ns)))

# Find the indicator nodes, then read their id attribute
nodes <- getNodeSet(parsed_content, "/x:metadata//x:indicators//x:indicator", c(x = ns))
id <- xmlSApply(nodes, xmlGetAttr, "id")

# Put it all together
all_values_df <- cbind(df_xml, id)
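The namespace effect can be reproduced on a small inline document. This sketch assumes xml_text, a hypothetical two-indicator copy of the excerpt, and hard-codes the URI from its xmlns attribute. It shows that the unprefixed query finds no nodes, then uses xpathSApply() with an explicit namespaces vector as a more compact variant of the getNodeSet() calls above.

```r
library(XML)

# Hypothetical two-indicator copy of the excerpt, with its default namespace
xml_text <- '<metadata xmlns="http://example.com/schema/dxf/2.0">
  <indicators>
    <indicator id="cfvVUWwkwje"><displayName> ART Total </displayName></indicator>
    <indicator id="ftywhPKoMgp"><displayName> Buruli Ulcer Total</displayName></indicator>
  </indicators>
</metadata>'
doc <- xmlParse(xml_text, asText = TRUE)

# Unprefixed names only match elements in no namespace, so this finds no nodes
n_unprefixed <- length(getNodeSet(doc, "//indicators/indicator"))

# Bind the namespace URI to a prefix of our choosing ("x") and use it in the XPath
ns <- c(x = "http://example.com/schema/dxf/2.0")
ids <- xpathSApply(doc, "//x:indicator", xmlGetAttr, "id", namespaces = ns)
display_names <- trimws(xpathSApply(doc, "//x:indicator/x:displayName", xmlValue, namespaces = ns))

indicators_df <- data.frame(id = ids, displayName = display_names,
                            stringsAsFactors = FALSE, row.names = NULL)
print(indicators_df)
```

The prefix name x is arbitrary; it only has to match between the namespaces vector and the XPath string.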