xml2: unclear error when transforming XML into a data frame
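I am trying to convert an XML file into a data frame. It works well for some elements but not for others, and I don't know why.
Here is a simplified version of the XML: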
<?xml version="1.0" encoding="UTF-8"?>
<clinical_study rank="6838">
<arm_group>
<arm_group_label>Arm I (Lozenge placebo)</arm_group_label>
<arm_group_type>Placebo Comparator</arm_group_type>
<description>Patients receive lozenge placebo PO QID.</description>
</arm_group>
<arm_group>
<arm_group_label>Arm II (LBR lozenge)</arm_group_label>
<arm_group_type>Experimental</arm_group_type>
<description>Patients receive lyophilized black raspberries lozenge PO (8gms/day)</description>
</arm_group>
<arm_group>
<arm_group_label>Arm III (Saliva Substitute placebo)</arm_group_label>
<arm_group_type>Placebo Comparator</arm_group_type>
<description>Patients receive Saliva Substitute placebo PO QID.</description>
</arm_group>
<arm_group>
<arm_group_label>Arm IV (LBR Saliva Substitute)</arm_group_label>
<arm_group_type>Experimental</arm_group_type>
<description>Patients receive lyophilized black raspberries Saliva Substitute PO (8gms/day).</description>
</arm_group>
<condition_browse>
<!-- CAUTION: The following MeSH terms are assigned with an imperfect algorithm -->
<mesh_term>Carcinoma</mesh_term>
<mesh_term>Carcinoma, Squamous Cell</mesh_term>
<mesh_term>Laryngeal Diseases</mesh_term>
<mesh_term>Laryngeal Neoplasms</mesh_term>
<mesh_term>Oropharyngeal Neoplasms</mesh_term>
<mesh_term>Carcinoma, Verrucous</mesh_term>
<mesh_term>Nasopharyngeal Neoplasms</mesh_term>
<mesh_term>Salivary Gland Neoplasms</mesh_term>
<mesh_term>Paranasal Sinus Neoplasms</mesh_term>
<mesh_term>Head and Neck Neoplasms</mesh_term>
<mesh_term>Neoplasms, Unknown Primary</mesh_term>
<mesh_term>Mouth Neoplasms</mesh_term>
<mesh_term>Hypopharyngeal Neoplasms</mesh_term>
<mesh_term>Tongue Neoplasms</mesh_term>
<mesh_term>Lip Neoplasms</mesh_term>
<mesh_term>Carcinoma in Situ</mesh_term>
</condition_browse>
<!-- Results have not yet been posted for this study -->
</clinical_study>
The code I am using (this part works):
library(XML)
library(dplyr)
library(xml2)
# read group
outc <- xml_find_all(xml, "//arm_group") %>% as_list() %>% dplyr::bind_rows() %>% as.data.frame()
This code does not work:
test1 <- xml_find_all(xml, "//condition_browse") %>% as_list() %>% dplyr::bind_rows() %>% as.data.frame()
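The second snippet produces a data set with only 1 row instead of the expected multi-row data frame.
I cannot tell whether the problem comes from my xml2 syntax, my XPath syntax, or the XML data.
Could you help?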
All the nodes under condition_browse have the same name: "mesh_term". bind_rows is merging the identically named rows, so only the last one is kept.
Try using:
temp <- xml_find_all(xml, "//condition_browse") %>% as_list() %>% unlist()
# convert into a data frame
test1 <- data.frame(names = names(temp), value = temp)
This gives a slightly different format, but it should be a good starting point for the rest of your analysis.
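As an alternative to unlisting, a minimal sketch that selects the repeated mesh_term children directly and reads their text with xml_text(). It assumes a trimmed-down copy of the XML from the question, parsed from a literal string; the names n_arm, n_cond, terms and mesh are made up for illustration. It also shows one visible difference between the two cases: the XPath for arm_group matches several nodes, while the XPath for condition_browse matches only one.

```r
library(xml2)

# Trimmed-down copy of the XML from the question (assumed subset, for illustration only).
xml <- read_xml('<clinical_study rank="6838">
  <arm_group>
    <arm_group_label>Arm I (Lozenge placebo)</arm_group_label>
    <arm_group_type>Placebo Comparator</arm_group_type>
  </arm_group>
  <arm_group>
    <arm_group_label>Arm II (LBR lozenge)</arm_group_label>
    <arm_group_type>Experimental</arm_group_type>
  </arm_group>
  <condition_browse>
    <mesh_term>Carcinoma</mesh_term>
    <mesh_term>Carcinoma, Squamous Cell</mesh_term>
    <mesh_term>Laryngeal Diseases</mesh_term>
  </condition_browse>
</clinical_study>')

# Count how many nodes each XPath from the question matches.
n_arm  <- length(xml_find_all(xml, "//arm_group"))
n_cond <- length(xml_find_all(xml, "//condition_browse"))

# Select the repeated <mesh_term> children themselves, then read their text.
terms <- xml_find_all(xml, "//condition_browse/mesh_term")
mesh  <- data.frame(mesh_term = xml_text(terms), stringsAsFactors = FALSE)
```

Here mesh has one row per mesh_term element, which is probably closer to the multi-row result the question was after. With the full file, replace the literal string with the path passed to read_xml().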