Empty data frame returned from XML data using xmlToDataFrame in R
I am working with XML data downloaded with the GET function from the httr package. The returned content type is application/xml. An excerpt follows:
<?xml version="1.0" encoding="UTF-8"?>
<metadata xmlns="http://example.com/schema/dxf/2.0">
<pager>
<page>1</page>
<pageCount>17</pageCount>
<total>819</total>
<pageSize>50</pageSize>
<nextPage>https://xxx.org.ng/xxx/api/indicators?page=2&format=xml</nextPage>
</pager>
<indicators>
<indicator id="cfvVUWwkwje">
<displayName> ART Total </displayName>
</indicator>
<indicator id="gytvOB3J7">
<displayName> ART Microscopy - Total</displayName>
</indicator>
<indicator id="5fgtZdtvQRW">
<displayName> ART Microscopy Biology - Total</displayName>
</indicator>
<indicator id="g6hYenEHnsu">
<displayName> ART GeneXpert - Total </displayName>
</indicator>
<indicator id="hhjxxDlG87j">
<displayName> ART Functional -Total</displayName>
</indicator>
<indicator id="SarCtUBpBru">
<displayName> ART 21 - Total</displayName>
</indicator>
<indicator id="ftywhPKoMgp">
<displayName> Buruli Ulcer Total</displayName>
</indicator>
<indicator id="gyyhtAzCQZ0">
<displayName> xART 21 prophylaxis Functional -Total</displayName>
</indicator>
<indicator id="vftWafaROyq0">
<displayName> xART 21 Non Functional - Total</displayName>
</indicator>
</indicators>
</metadata>
I used the following code to download the XML and try to convert it to a data frame:
url_xml <- modify_url(url1, path = path)
xml_response <- GET(url_xml, authenticate(username, password))
http_type(xml_response)
resp_content <- content(xml_response)
parsed_content <- xmlParse(resp_content)
# get the root
parsed_xml_root <- xmlRoot(parsed_content)
# parse out names and IDs
df_xml <- xmlToDataFrame(nodes = getNodeSet(parsed_xml_root,"//indicators/indicator/displayName"))
id <- xmlSApply(parsed_xml_root[["indicator"]], xmlGetAttr, "id")
all_values_df <- cbind(df_xml, id)
I want to get the id and display name of each indicator, but the resulting data frame is empty. Any suggestions, please?
I solved it with:
df_xml <- xmlToDataFrame(nodes = xmlChildren(xmlRoot(parsed_content)[["indicators"]]))
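The line above returns only the display names. As a minimal sketch of how the ids could be collected the same way, assuming that `[[` looks up children by element name (which is why no namespace prefix is needed here): the XML string below is a trimmed, hypothetical copy of the response, and `doc`, `indicators`, and `display_name` are illustrative names that do not appear in the question.

```r
library(XML)

# Trimmed, hypothetical copy of the response from the question
xml_text <- paste0(
  '<metadata xmlns="http://example.com/schema/dxf/2.0">',
  '<indicators>',
  '<indicator id="cfvVUWwkwje"><displayName> ART Total </displayName></indicator>',
  '<indicator id="gytvOB3J7"><displayName> ART Microscopy - Total</displayName></indicator>',
  '</indicators>',
  '</metadata>'
)
doc <- xmlParse(xml_text, asText = TRUE)

# Navigate by element name instead of XPath
indicators <- xmlRoot(doc)[["indicators"]]

# Apply over the <indicator> children: attribute first, then text content
id <- xmlSApply(indicators, xmlGetAttr, "id")
display_name <- xmlSApply(indicators, xmlValue)

all_values_df <- data.frame(
  id = unname(id),
  displayName = trimws(unname(display_name)),
  stringsAsFactors = FALSE
)
```

This avoids XPath entirely, at the cost of depending on the exact nesting of the document.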
Your XML file has an associated namespace, so you first need to retrieve the namespace and then add it to the XPath expressions.
library(XML)
# get the namespace definitions declared on the document
nsDefs <- xmlNamespaceDefinitions(parsed_content, simplify = FALSE)
ns <- structure(sapply(nsDefs, function(x) x$uri), names = names(nsDefs))
# bind the default namespace to the prefix "x" and use that prefix in the node queries
df_xml <- xmlToDataFrame(getNodeSet(parsed_content, "/x:metadata//x:indicators//x:indicator//x:displayName", c(x = ns)))
# find the indicator nodes, then get the id attribute
nodes <- getNodeSet(parsed_content, "/x:metadata//x:indicators//x:indicator", c(x = ns))
id <- xmlSApply(nodes, xmlGetAttr, "id")
# put it all together
all_values_df <- cbind(df_xml, id)
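To see the effect without the API, here is a small self-contained sketch. It assumes a trimmed, hypothetical copy of the response as an inline string and hard-codes the namespace URI from the excerpt rather than looking it up. The names `doc`, `no_ns`, and `display_name` are illustrative only.

```r
library(XML)

# Trimmed, hypothetical copy of the response from the question
xml_text <- paste0(
  '<metadata xmlns="http://example.com/schema/dxf/2.0">',
  '<indicators>',
  '<indicator id="cfvVUWwkwje"><displayName> ART Total </displayName></indicator>',
  '<indicator id="gytvOB3J7"><displayName> ART Microscopy - Total</displayName></indicator>',
  '</indicators>',
  '</metadata>'
)
doc <- xmlParse(xml_text, asText = TRUE)

# Unprefixed names in XPath only match elements in no namespace,
# so this query finds nothing in a document with a default namespace
no_ns <- getNodeSet(doc, "//indicators/indicator/displayName")

# Bind the namespace URI to an arbitrary prefix, here "x"
ns <- c(x = "http://example.com/schema/dxf/2.0")

# Same idea as the queries above, with the prefix on every step
id <- xpathSApply(doc, "//x:indicator", xmlGetAttr, "id", namespaces = ns)
display_name <- xpathSApply(doc, "//x:indicator/x:displayName", xmlValue,
                            namespaces = ns)

all_values_df <- data.frame(
  id = id,
  displayName = trimws(display_name),
  stringsAsFactors = FALSE
)
```

The prefix `x` is arbitrary. What matters is that it maps to the same URI as the `xmlns` attribute on the root element.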