Working directory error when reading multiple XML files in R and combining the data
I am trying to parse/read multiple XML files from my current data and combine them.
My sample XML file looks like this:
<ApplicationResponse>
<Service Name="AlternativeCreditAttributes">
<Categories>
<Category Name="Default">
<Attributes>
<Attribute Name="ACA_ACH_NSF_12M" Value="0" />
<Attribute Name="ACA_ACH_NSF_18M" Value="0" />
<Attribute Name="ACA_ACH_NSF_24M" Value="0" />
<Attribute Name="ACA_ACH_NSF_3M" Value="0" />
<Attribute Name="ACA_ACH_NSF_6M" Value="0" />
<Attribute Name="ACA_ACH_NSF_9M" Value="0" />
<Attribute Name="ACA_ACH_NSF_AMT_12M" Value="" />
<Attribute Name="ACA_ACH_NSF_AMT_18M" Value="" />
<Attribute Name="ACA_ACH_NSF_AMT_24M" Value="" />
<Attribute Name="ACA_ACH_NSF_AMT_3M" Value="" />
<Attribute Name="ACA_ACH_NSF_AMT_6M" Value="" />
<Attribute Name="ACA_ACH_NSF_AMT_9M" Value="" />
<Attribute Name="ACA_ACH_NSF_AMT_EVER" Value="600" />
<Attribute Name="ACA_ACH_NSF_EVER" Value="2" />
<Attribute Name="ACA_ACH_NSF_MONTHS_SINCE_NEWEST" Value="41" />
<Attribute Name="ACA_ACH_NSF_MONTHS_SINCE_OLDEST" Value="41" />
</Attributes>
</Category>
</Categories>
</Service>
</ApplicationResponse>
I have successfully pulled a single file with the following code:
library(xml2)
# parse the first response and pull the Name/Value pair of every <Attribute>
doc <- read_xml(Data$XMLResponse[1])
cols <- xml_attr(xml_find_all(doc, "//Attribute"), "Name")
rows <- xml_attr(xml_find_all(doc, "//Attribute"), "Value")
out <- data.frame(rows, row.names = cols)
out
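For reference, here is a self-contained sketch of the single-document step. It runs on an abridged copy of the sample XML above (only four of the attributes are kept) and shapes the result as one row per document instead of one row per attribute, which is the shape you need to stack several documents later. The names `xml_txt` and `one_row` are illustrative only.

```r
library(xml2)

# Abridged copy of the sample document, held in a string.
# read_xml() treats a string containing "<" or ">" as literal XML, not a path.
xml_txt <- '<ApplicationResponse>
<Service Name="AlternativeCreditAttributes">
<Categories><Category Name="Default"><Attributes>
<Attribute Name="ACA_ACH_NSF_12M" Value="0" />
<Attribute Name="ACA_ACH_NSF_AMT_12M" Value="" />
<Attribute Name="ACA_ACH_NSF_AMT_EVER" Value="600" />
<Attribute Name="ACA_ACH_NSF_EVER" Value="2" />
</Attributes></Category></Categories>
</Service>
</ApplicationResponse>'

doc   <- read_xml(xml_txt)
attrs <- xml_find_all(doc, "//Attribute")

# One row per document: each Name becomes a column, each Value a cell
one_row <- as.data.frame(
  as.list(setNames(xml_attr(attrs, "Value"), xml_attr(attrs, "Name"))),
  stringsAsFactors = FALSE
)
one_row
```

An empty attribute such as `Value=""` comes back from `xml_attr()` as an empty string, not `NA`.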
But when I try to extract multiple files with lapply, based on this, I get a working directory error:
Error: 'NA' does not exist in current working directory
Below is the code I used. Please let me know if you recognize this issue or need any more details. Thanks in advance.
df_list <- lapply(Data$XMLResponse, function(f) {
doc <- read_xml(f)
setNames(data.frame(
xml_attr(xml_find_all(doc, "//Attribute"), "Name"),
xml_attr(xml_find_all(doc, "//Attribute"), "Value")
),c("Name", f))
})
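One likely cause, assuming some entries of `Data$XMLResponse` are `NA`: `read_xml()` treats a string as literal XML only when it contains `<` or `>`. Anything else, including `NA`, is treated as a file path, which produces an error of the form "'NA' does not exist in current working directory". The sketch below uses a hypothetical vector `responses` in place of `Data$XMLResponse`, drops missing and empty entries before parsing, and names the value column `"Value"` rather than `f` (in the code above, `f` is the entire XML string, so it makes an awkward column name). The same filter would apply to the for() loop in the answer.

```r
library(xml2)

# Hypothetical stand-in for Data$XMLResponse: two XML strings and one NA
responses <- c(
  '<ApplicationResponse><Attribute Name="A" Value="1" /><Attribute Name="B" Value="2" /></ApplicationResponse>',
  NA,
  '<ApplicationResponse><Attribute Name="A" Value="3" /><Attribute Name="B" Value="" /></ApplicationResponse>'
)

# Reproduce the problem: read_xml(NA) is treated as a file path and fails
err <- tryCatch(read_xml(responses[2]), error = function(e) conditionMessage(e))

# Keep only non-missing, non-empty responses before parsing
ok <- !is.na(responses) & nzchar(responses)

df_list <- lapply(responses[ok], function(f) {
  attrs <- xml_find_all(read_xml(f), "//Attribute")
  data.frame(Name  = xml_attr(attrs, "Name"),
             Value = xml_attr(attrs, "Value"),
             stringsAsFactors = FALSE)
})
length(df_list)
```

If `Data$XMLResponse` turns out to contain no `NA` values, empty strings or values without any `<` would trigger the same path lookup, so they are worth checking too.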
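Here is a way to use a for() loop to collect all the values from each XML file stored in Data$XMLResponse. The code assumes that every XML file has exactly the same number of "Attributes", in the same order.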
library(xml2)
# create an empty list to hold one character vector per response
datalist <- list()
# loop through your column of XML responses and extract the Value attributes
for (i in seq_along(Data$XMLResponse)) {
  temp_vals <- read_xml(Data$XMLResponse[i])
  temp_vals <- xml_attr(xml_find_all(temp_vals, "//Attribute"), "Value")
  # store this response's values in the list
  datalist[[i]] <- temp_vals
}
# bind the values from all XML files together, one row per file
your_data <- do.call(rbind, datalist)
Then get the column names:
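# xml_find_all() needs a parsed document, so parse the first response with read_xml() first
your_column_names <- xml_attr(xml_find_all(read_xml(Data$XMLResponse[1]), "//Attribute"), "Name")
doc <- setNames(data.frame(matrix(ncol = length(your_column_names), nrow = 0)), your_column_names)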
Then use rbind() to bind your data to the column names:
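# give the value matrix the same column names as doc so rbind() can match the columns
rbind(doc, setNames(as.data.frame(your_data, stringsAsFactors = FALSE), your_column_names))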
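If the "same attributes, same order" assumption might not hold across files, one alternative is to key every value by its Name and line all documents up against the union of names, so an attribute missing from a file becomes `NA` instead of shifting the columns. This is a sketch, not the answer's method: `responses` is a hypothetical stand-in for `Data$XMLResponse`, and the final `type.convert()` step is an optional extra, since `xml_attr()` always returns text.

```r
library(xml2)

# Hypothetical stand-in for Data$XMLResponse: attributes differ in order and count
responses <- c(
  '<r><Attribute Name="A" Value="1" /><Attribute Name="B" Value="" /></r>',
  '<r><Attribute Name="B" Value="5" /><Attribute Name="A" Value="4" /><Attribute Name="C" Value="6" /></r>'
)

# One named character vector per response: names = Name, values = Value
datalist <- lapply(responses, function(x) {
  attrs <- xml_find_all(read_xml(x), "//Attribute")
  setNames(xml_attr(attrs, "Value"), xml_attr(attrs, "Name"))
})

# Union of all attribute names; indexing a named vector by a name it
# lacks returns NA, so absent attributes become NA cells
all_names <- unique(unlist(lapply(datalist, names)))
your_data <- do.call(rbind, lapply(datalist, function(v) unname(v[all_names])))
colnames(your_data) <- all_names
your_data <- as.data.frame(your_data, stringsAsFactors = FALSE)

# Optional: convert text columns to numbers where possible
# (blank strings become NA in numeric columns)
your_data <- type.convert(your_data, as.is = TRUE)
str(your_data)
```

Matching on names costs one extra lookup per document but removes the need to check that every file lists its attributes identically.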