R: parsing an XML tree with hierarchical data into a data frame

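I am trying to parse some XML documents into a data frame in R with the XML package. What I want to do is flatten the XML tree so that I get one row in the data frame per child. I also want each row to contain the data from its parent.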

Example:

<xml>
    <eventlist>
        <event>
            <ProcessIndex>1063</ProcessIndex>
            <Time_of_Day>2:54:20.2959537 PM</Time_of_Day>
            <Process_Name>chrome.exe</Process_Name>
            <PID>12164</PID>
            <Operation>ReadFile</Operation>
            <Result>SUCCESS</Result>
            <Detail>Offset: 1,684,224, Length: 256</Detail>
            <stack>
                <frame>
                    <depth>0</depth>
                    <address>0xfffff8038683667c</address>
                    <path>C:\WINDOWS\System32\drivers\FLTMGR.SYS</path>
                    <location>FltDecodeParameters + 0x1a6c</location>
                </frame>
                <frame>
                    <depth>1</depth>
                    <address>0xfffff80386834e13</address>
                    <path>C:\WINDOWS\System32\drivers\FLTMGR.SYS</path>
                    <location>FltDecodeParameters + 0x203</location>
                </frame>
                <frame>
                    <depth>3</depth>
                    <address>0x7ffea54ffac1</address>
                    <path>C:\WINDOWS\SYSTEM32\ntdll.dll</path>
                    <location>RtlUserThreadStart + 0x21</location>
                </frame>
            </stack>
        </event>
        <event>
            <ProcessIndex>1063</ProcessIndex>
            <Time_of_Day>2:54:20.2960270 PM</Time_of_Day>
            <Process_Name>chrome.exe</Process_Name>
            <PID>12164</PID>
            <Operation>WriteFile</Operation>
            <Result>SUCCESS</Result>
            <Detail>Offset: 103,016, Length: 36</Detail>
            <stack>
                <frame>
                    <depth>0</depth>
                    <address>0xfffff8038683667c</address>
                    <path>C:\WINDOWS\System32\drivers\FLTMGR.SYS</path>
                    <location>FltDecodeParameters + 0x1a6c</location>
                </frame>
                <frame>
                    <depth>1</depth>
                    <address>0xfffff80386834e13</address>
                    <path>C:\WINDOWS\System32\drivers\FLTMGR.SYS</path>
                    <location>FltDecodeParameters + 0x203</location>
                </frame>
                <frame>
                    <depth>26</depth>
                    <address>0x7ffea54ffac1</address>
                    <path>C:\WINDOWS\SYSTEM32\ntdll.dll</path>
                    <location>RtlUserThreadStart + 0x21</location>
                </frame>
            </stack>
        </event>
    </eventlist>
</xml>

The result I want is:

ProcessIndex     Time_of_Day    Process_Name    PID     Operation   Result  depth   address     path            location
1063             2:54:20        chrome.exe      12164   ReadFile    SUCCESS 0       0xfffff..   C:\WINDOWS\System32\driv... FltDecodeParameters + 0x1a6c
1063             2:54:20        chrome.exe      12164   ReadFile    SUCCESS 1       0xfffff..   C:\WINDOWS\System32\driv... FltDecodeParameters + 0x203
1063             2:54:20        chrome.exe      12164   ReadFile    SUCCESS 2       0xfffff..   C:\WINDOWS\System32\driv... RtlUserThreadStart + 0x21
1063             2:54:20        chrome.exe      12164   WriteFile   SUCCESS 0       0xfffff..   C:\WINDOWS\System32\driv... FltDecodeParameters + 0x1a6c
1063             2:54:20        chrome.exe      12164   WriteFile   SUCCESS 1       0xfffff..   C:\WINDOWS\System32\driv... FltDecodeParameters + 0x203
1063             2:54:20        chrome.exe      12164   WriteFile   SUCCESS 2       0xfffff..   C:\WINDOWS\System32\driv... RtlUserThreadStart + 0x21

I tried using the XML package and xmlToDataFrame:

xmldf_events_stack <- xmlToDataFrame(nodes=getNodeSet(data_xml_2,"//eventlist/event/stack/frame"))

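But this only gives me the flattened frames, without the parent data. Also, if I parse the event nodes into a data frame instead, all the XML tags are stripped from the frame fields, so I cannot parse them afterwards.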
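To make the problem concrete, here is a minimal sketch on a cut-down, made-up version of the document above (two events with two fields and two frames each). The names `xml_text`, `frames` and `events` are chosen for illustration only.

```r
library(XML)

# Hypothetical, shortened version of the example document
xml_text <- "<xml><eventlist>
  <event>
    <PID>12164</PID><Operation>ReadFile</Operation>
    <stack>
      <frame><depth>0</depth><location>FltDecodeParameters + 0x1a6c</location></frame>
      <frame><depth>1</depth><location>FltDecodeParameters + 0x203</location></frame>
    </stack>
  </event>
  <event>
    <PID>12164</PID><Operation>WriteFile</Operation>
    <stack>
      <frame><depth>0</depth><location>FltDecodeParameters + 0x1a6c</location></frame>
      <frame><depth>1</depth><location>FltDecodeParameters + 0x203</location></frame>
    </stack>
  </event>
</eventlist></xml>"
doc <- xmlParse(xml_text, asText = TRUE)

# Frame level: one row per <frame>, but only the frame's own fields
frames <- xmlToDataFrame(nodes = getNodeSet(doc, "//eventlist/event/stack/frame"),
                         stringsAsFactors = FALSE)

# Event level: one row per <event>; <stack> is reduced to a single text
# column holding the concatenated text of all its frames
events <- xmlToDataFrame(nodes = getNodeSet(doc, "//eventlist/event"),
                         stringsAsFactors = FALSE)

str(frames)
str(events)
```

Neither call on its own gives one row per frame with the event fields attached, which is what I am after.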

Any help or a pointer in the right direction would be greatly appreciated.

I solved the problem. I am sure there are more elegant ways to do this, but this is what I did. Hopefully it helps someone in the future.

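library(XML)   # xpathSApply, xmlName, xmlValue
library(plyr)  # rbind.fill

df <- do.call(rbind.fill, lapply(data_xml_2['//eventlist/event'], function(x) {
  # Names of the elements that own a text node (each "text" entry follows its element)
  names <- xpathSApply(x, './/.', xmlName)
  names <- names[which(names == "text") - 1]
  values <- xpathSApply(x, ".//text()", xmlValue)

  # The first 7 values are the event fields; the rest are frame fields, 4 per frame
  framevalues <- values[8:length(values)]
  framevalues <- matrix(framevalues, ncol = 4, byrow = TRUE)

  # Prepend the 7 event fields to every frame row
  retvalues <- framevalues
  for (i in 7:1) {
    retvalues <- cbind(values[i], retvalues)
  }
  # One name per column: 7 event fields plus the 4 fields of the first frame
  colnames(retvalues) <- names[seq_len(ncol(retvalues))]
  as.data.frame(retvalues)
}))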

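Consider parsing by node index, [#], and merging each parent with its children inside lapply, which gives you a list of data frames to row-bind at the end: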

library(XML)

doc <- xmlParse("/path/to/XML/file.xml")

xml_len <- length(getNodeSet(doc, "//eventlist/event"))

dflist <- lapply(seq_len(xml_len), function(i) {
  # PARENT NODE: one row of event fields, plus a constant merge key
  d1 <- transform(xmlToDataFrame(nodes = getNodeSet(doc, paste0("//eventlist/event[", i, "]"))), key = 1)
  # xmlToDataFrame collapses <stack> into a single text column; drop it
  d1$stack <- NULL
  # CHILD NODES: one row per frame, with the same key
  d2 <- transform(xmlToDataFrame(nodes = getNodeSet(doc, paste0("//eventlist/event[", i, "]/stack/frame"))), key = 1)

  # MERGE ON KEY, THEN DROP KEY
  merge(d1, d2, by = "key")[-1]
})

xmldf_events_stack <- do.call(rbind, dflist)
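As a variation on the same idea, here is a sketch that walks the event nodes directly instead of re-querying the document by index, and repeats the parent row once per frame rather than merging on a key. It is not taken from either answer above. The sample document and the names `xml_text`, `rows` and `flat` are made up for illustration, and it assumes every direct child of `<event>` other than `<stack>` is a simple text field.

```r
library(XML)

# Hypothetical, shortened version of the example document
xml_text <- "<xml><eventlist>
  <event>
    <PID>12164</PID><Operation>ReadFile</Operation>
    <stack>
      <frame><depth>0</depth><location>FltDecodeParameters + 0x1a6c</location></frame>
      <frame><depth>1</depth><location>FltDecodeParameters + 0x203</location></frame>
    </stack>
  </event>
  <event>
    <PID>12164</PID><Operation>WriteFile</Operation>
    <stack>
      <frame><depth>0</depth><location>FltDecodeParameters + 0x1a6c</location></frame>
      <frame><depth>1</depth><location>FltDecodeParameters + 0x203</location></frame>
    </stack>
  </event>
</eventlist></xml>"
doc <- xmlParse(xml_text, asText = TRUE)

rows <- lapply(getNodeSet(doc, "//eventlist/event"), function(ev) {
  # Event fields: every direct child of <event> except <stack>
  vals <- xpathSApply(ev, "./*[not(self::stack)]", xmlValue)
  names(vals) <- xpathSApply(ev, "./*[not(self::stack)]", xmlName)
  parent <- as.data.frame(as.list(vals), stringsAsFactors = FALSE)

  # Frame fields: one row per <frame> of this event
  frames <- xmlToDataFrame(nodes = getNodeSet(ev, "./stack/frame"),
                           stringsAsFactors = FALSE)

  # Repeat the single parent row once per frame, then bind column-wise
  cbind(parent[rep(1, nrow(frames)), , drop = FALSE], frames)
})

flat <- do.call(rbind, rows)
rownames(flat) <- NULL
flat
```

All columns come back as character, so convert `depth`, `PID` and similar fields with `as.integer()` afterwards if you need numbers.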