Working directory error when reading multiple XML files in R and combining the data

I am trying to parse multiple XML files from my current data and combine them.

My sample XML file looks like this:

<ApplicationResponse>
    <Service Name="AlternativeCreditAttributes">
      <Categories>
        <Category Name="Default">
          <Attributes>
            <Attribute Name="ACA_ACH_NSF_12M" Value="0" />
            <Attribute Name="ACA_ACH_NSF_18M" Value="0" />
            <Attribute Name="ACA_ACH_NSF_24M" Value="0" />
            <Attribute Name="ACA_ACH_NSF_3M" Value="0" />
            <Attribute Name="ACA_ACH_NSF_6M" Value="0" />
            <Attribute Name="ACA_ACH_NSF_9M" Value="0" />
            <Attribute Name="ACA_ACH_NSF_AMT_12M" Value="" />
            <Attribute Name="ACA_ACH_NSF_AMT_18M" Value="" />
            <Attribute Name="ACA_ACH_NSF_AMT_24M" Value="" />
            <Attribute Name="ACA_ACH_NSF_AMT_3M" Value="" />
            <Attribute Name="ACA_ACH_NSF_AMT_6M" Value="" />
            <Attribute Name="ACA_ACH_NSF_AMT_9M" Value="" />
            <Attribute Name="ACA_ACH_NSF_AMT_EVER" Value="600" />
            <Attribute Name="ACA_ACH_NSF_EVER" Value="2" />
            <Attribute Name="ACA_ACH_NSF_MONTHS_SINCE_NEWEST" Value="41" />
            <Attribute Name="ACA_ACH_NSF_MONTHS_SINCE_OLDEST" Value="41" />
          </Attributes>
        </Category>
      </Categories>
    </Service>
</ApplicationResponse>

I have successfully extracted a single file with the following code:

doc <- read_xml(Data$XMLResponse[1])
cols <- xml_attr(xml_find_all(doc, "//Attribute"), "Name")
rows <- xml_attr(xml_find_all(doc, "//Attribute"), "Value")
out <- data.frame(rows, row.names = cols)
out

But when I try to extract multiple files with lapply, based on this approach, I get a working directory error:

Error: 'NA' does not exist in current working directory

Below is the code I used. If you recognize this problem, or need any more details about it, please let me know. Thanks in advance.

df_list <- lapply(Data$XMLResponse, function(f) {
  doc <- read_xml(f)
  setNames(data.frame(
    xml_attr(xml_find_all(doc, "//Attribute"), "Name"),
    xml_attr(xml_find_all(doc, "//Attribute"), "Value")
  ),c("Name", f))
})

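Here is a way to use a for() loop to collect all the values from each XML document stored in Data$XMLResponse. The code assumes that every XML document has exactly the same number of "Attribute" elements, in the same order.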

library(xml2)

# Create an empty list to hold one vector of values per XML response
datalist <- list()

# Loop through your column of XML responses to extract the values you want
for (i in seq_along(Data$XMLResponse)) {
  temp_doc <- read_xml(Data$XMLResponse[i])
  temp_vals <- xml_attr(xml_find_all(temp_doc, "//Attribute"), "Value")
  # Assign these values to your data list
  datalist[[i]] <- temp_vals
}

# Bind the data from the XML responses together (one row per response)
your_data <- do.call(rbind, datalist)
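
The error in the question ('NA' does not exist in current working directory) is consistent with read_xml() receiving a missing value: a string that does not look like XML is treated as a file path. Assuming that is the cause here, a minimal sketch of the same loop that skips NA entries might look like the following. The Data frame below is made-up sample data, not the asker's.

```r
library(xml2)

# Hypothetical sample data: two XML responses and one missing value
Data <- data.frame(
  XMLResponse = c(
    '<R><Attribute Name="A" Value="1" /><Attribute Name="B" Value="" /></R>',
    NA,
    '<R><Attribute Name="A" Value="3" /><Attribute Name="B" Value="4" /></R>'
  ),
  stringsAsFactors = FALSE
)

# Keep only the entries that actually contain an XML string
responses <- Data$XMLResponse[!is.na(Data$XMLResponse)]

datalist <- vector("list", length(responses))
for (i in seq_along(responses)) {
  temp_doc <- read_xml(responses[i])
  datalist[[i]] <- xml_attr(xml_find_all(temp_doc, "//Attribute"), "Value")
}

# One row per non-missing response (a character matrix)
your_data <- do.call(rbind, datalist)
```

Note that dropping the NA entries changes the row count, so the rows of your_data no longer line up one-to-one with the rows of Data; keep the index of the non-missing rows if you need to join back.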

Then get the column names:

# xml_find_all() needs a parsed document, so parse the first response first
your_column_names <- xml_attr(xml_find_all(read_xml(Data$XMLResponse[1]), "//Attribute"), "Name")
doc <- setNames(data.frame(matrix(ncol = length(your_column_names), nrow = 0)), your_column_names)

Then use rbind() to bind your data to your column names:

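# Convert the matrix to a data frame with matching names so rbind() keeps the column names
rbind(doc, setNames(as.data.frame(your_data, stringsAsFactors = FALSE), your_column_names))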
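
If the responses might not all list the same attributes in the same order, one option is to key every value by its Name and align the columns by name. This is a sketch of an alternative, not the method in the answer above; parse_attributes is a hypothetical helper, and responses is made-up sample data.

```r
library(xml2)

# Hypothetical sample data: same attributes, different order
responses <- c(
  '<R><Attribute Name="A" Value="1" /><Attribute Name="B" Value="2" /></R>',
  '<R><Attribute Name="B" Value="4" /><Attribute Name="A" Value="3" /></R>'
)

# Hypothetical helper: return a character vector of values named by "Name"
parse_attributes <- function(x) {
  nodes <- xml_find_all(read_xml(x), "//Attribute")
  setNames(xml_attr(nodes, "Value"), xml_attr(nodes, "Name"))
}

parsed <- lapply(responses, parse_attributes)

# Union of all attribute names seen in any response
all_names <- unique(unlist(lapply(parsed, names)))

# Index each vector by name; attributes absent from a response become NA
rows <- lapply(parsed, function(v) unname(v[all_names]))
result <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
names(result) <- all_names
```

All columns come back as character, since the Value attributes are strings; convert the ones you need with as.numeric() afterwards.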