How to get the value of a specified node in an XML tree using R
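I have an XML file containing a root node and child nodes with attributes that hold values.
I am processing the XML file in R.
What I need is to display the employees who work in the IT department.
How can I display the ID or name of the employees in the IT department?
I used this code: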
print(getNodeSet(rootnode,"//EMPLOYEE/DEPT[@DEPT='IT']"))
where rootnode is the variable holding the root node, RECORDS.
It didn't work.
XML file:
<RECORDS>
<EMPLOYEE>
<ID>1</ID>
<NAME>Rick</NAME>
<SALARY>623.3</SALARY>
<STARTDATE>1/1/2012</STARTDATE>
<DEPT>IT</DEPT>
</EMPLOYEE>
<EMPLOYEE>
<ID>2</ID>
<NAME>Dan</NAME>
<SALARY>515.2</SALARY>
<STARTDATE>9/23/2013</STARTDATE>
<DEPT>Operations</DEPT>
</EMPLOYEE>
<EMPLOYEE>
<ID>3</ID>
<NAME>Michelle</NAME>
<SALARY>611</SALARY>
<STARTDATE>11/15/2014</STARTDATE>
<DEPT>IT</DEPT>
</EMPLOYEE>
<EMPLOYEE>
<ID>4</ID>
<NAME>Ryan</NAME>
<SALARY>729</SALARY>
<STARTDATE>5/11/2014</STARTDATE>
<DEPT>HR</DEPT>
</EMPLOYEE>
<EMPLOYEE>
<ID>5</ID>
<NAME>Gary</NAME>
<SALARY>843.25</SALARY>
<STARTDATE>3/27/2015</STARTDATE>
<DEPT>Finance</DEPT>
</EMPLOYEE>
<EMPLOYEE>
<ID>6</ID>
<NAME>Nina</NAME>
<SALARY>578</SALARY>
<STARTDATE>5/21/2013</STARTDATE>
<DEPT>IT</DEPT>
</EMPLOYEE>
<EMPLOYEE>
<ID>7</ID>
<NAME>Simon</NAME>
<SALARY>632.8</SALARY>
<STARTDATE>7/30/2013</STARTDATE>
<DEPT>Operations</DEPT>
</EMPLOYEE>
<EMPLOYEE>
<ID>8</ID>
<NAME>Guru</NAME>
<SALARY>722.5</SALARY>
<STARTDATE>6/17/2014</STARTDATE>
<DEPT>Finance</DEPT>
</EMPLOYEE>
</RECORDS>
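It looks like you need to change the getNodeSet call as follows. In your XPath, [@DEPT='IT'] tests for an attribute named DEPT, but DEPT is a child element of EMPLOYEE whose text is IT, so the predicate belongs on EMPLOYEE and should compare the DEPT element itself: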
getNodeSet(xml_data, "//EMPLOYEE[DEPT='IT']/NAME")
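getNodeSet returns node objects; since the question asks for the ID or name itself, the matched nodes can be passed through xmlValue to get their text. A minimal sketch, assuming rootnode is created with xmlRoot(xmlParse(...)) as the question suggests, and using a shortened copy of the XML. It also shows that the original XPath matches no nodes:

```r
library(XML)

# Sample data: a shortened copy of the question's XML (ID, NAME, DEPT only)
xml_text <- "<RECORDS>
<EMPLOYEE><ID>1</ID><NAME>Rick</NAME><DEPT>IT</DEPT></EMPLOYEE>
<EMPLOYEE><ID>2</ID><NAME>Dan</NAME><DEPT>Operations</DEPT></EMPLOYEE>
<EMPLOYEE><ID>3</ID><NAME>Michelle</NAME><DEPT>IT</DEPT></EMPLOYEE>
<EMPLOYEE><ID>6</ID><NAME>Nina</NAME><DEPT>IT</DEPT></EMPLOYEE>
</RECORDS>"

doc <- xmlParse(xml_text, asText = TRUE)
rootnode <- xmlRoot(doc)  # assumed: the <RECORDS> element, as in the question

# The question's XPath looks for an attribute called DEPT, which no node has
original <- getNodeSet(rootnode, "//EMPLOYEE/DEPT[@DEPT='IT']")
length(original)  # 0

# Filter on the text of the DEPT child element, then read each node's text
it_names <- xpathSApply(rootnode, "//EMPLOYEE[DEPT='IT']/NAME", xmlValue)
it_ids   <- xpathSApply(rootnode, "//EMPLOYEE[DEPT='IT']/ID", xmlValue)
it_names  # "Rick" "Michelle" "Nina"
it_ids    # "1" "3" "6"
```

xpathSApply applies the function to every matched node and simplifies the result, so this returns plain character vectors rather than a node set.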
If you want to include multiple columns in the output:
library(XML)
library(dplyr)
# sample data: a subset of the XML from the question
xml_data <- xmlParse("<RECORDS>
<EMPLOYEE><ID>1</ID><NAME>Rick</NAME><SALARY>623.3</SALARY><DEPT>IT</DEPT></EMPLOYEE>
<EMPLOYEE><ID>2</ID><NAME>Dan</NAME><SALARY>515.2</SALARY><DEPT>Operations</DEPT></EMPLOYEE>
<EMPLOYEE><ID>3</ID><NAME>Michelle</NAME><SALARY>611</SALARY><DEPT>IT</DEPT></EMPLOYEE>
</RECORDS>")
# turn each IT <EMPLOYEE> node into a data frame row, then keep two columns
df <- xmlToDataFrame(nodes=getNodeSet(xml_data, "//EMPLOYEE[DEPT='IT']")) %>%
  select(NAME, SALARY)
df
Output:
      NAME SALARY
1     Rick  623.3
2 Michelle    611
(Edit: modified the code to include multiple columns in the output.)
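If dplyr is not available, the same filtering works with base R column selection. As far as I know, xmlToDataFrame returns the element text as character columns when colClasses is not given, so this sketch also converts ID and SALARY to numbers. It assumes a trimmed copy of the question's data and passes stringsAsFactors = FALSE explicitly so the conversion does not operate on factors:

```r
library(XML)

# Sample data: four employees from the question, trimmed to four fields
xml_data <- xmlParse("<RECORDS>
<EMPLOYEE><ID>1</ID><NAME>Rick</NAME><SALARY>623.3</SALARY><DEPT>IT</DEPT></EMPLOYEE>
<EMPLOYEE><ID>2</ID><NAME>Dan</NAME><SALARY>515.2</SALARY><DEPT>Operations</DEPT></EMPLOYEE>
<EMPLOYEE><ID>3</ID><NAME>Michelle</NAME><SALARY>611</SALARY><DEPT>IT</DEPT></EMPLOYEE>
<EMPLOYEE><ID>6</ID><NAME>Nina</NAME><SALARY>578</SALARY><DEPT>IT</DEPT></EMPLOYEE>
</RECORDS>", asText = TRUE)

# Keep only the <EMPLOYEE> nodes whose DEPT child has the text "IT"
it_nodes <- getNodeSet(xml_data, "//EMPLOYEE[DEPT='IT']")

# One row per node, one column per child element
df <- xmlToDataFrame(nodes = it_nodes, stringsAsFactors = FALSE)

# Base R column selection instead of dplyr::select(), then numeric conversion
df <- df[, c("ID", "NAME", "SALARY")]
df$ID <- as.integer(df$ID)
df$SALARY <- as.numeric(df$SALARY)
df
```

With numeric columns, aggregates such as sum(df$SALARY) or sorting by salary behave as expected instead of comparing strings.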