Running a loop on multiple XML files in one folder in R
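I am trying to run this script as a loop because I have over 200 files in one folder, and I want to produce a single CSV file at the end that lists all of the data I need to extract.
I have tried various ways of running it in a loop, for example the approach from "In R, how to extracting two values from XML file, looping over 5603 files and write to table".
Whenever I try these different options, I get errors such as:
Error: XML content does not seem to be XML or permission denied.
However, when I run the code on a single selected file, it runs fine. The errors only seem to occur when I try to turn it into a loop over multiple files in a single folder.
Here is the original code, used for a single file: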
doc <- xmlParse("//file/path/32460004.xml")
xmldf <- xmlToDataFrame(nodes = getNodeSet(doc, "//BatRecord"))
df1 <- data.frame(xmldf)
df1 <- separate(df1, xmldf.DateTime, into = c("Date", "Time"), sep = " ")
df1$Lat <- substr(xmldf$GPS,4,12)
df1$Long <- substr(xmldf$GPS,13,25)
df_final <- data.frame(df1$xmldf.Filename, df1$Date, df1$Time, df1$xmldf.Duration, df1$xmldf.Temperature, df1$Lat, df1$Long)
colnames(df_final) <- c("Filename", "Date", "Time", "Call Duration", "Temperature", "Lat", "Long")
write.csv(df_final, "//file/path/test_file.csv")
Here is a link to some sample files:
https://drive.google.com/drive/folders/1ZvmOEWhzlWHRl2GxZrbYY9y7YSZ5j9Fj?usp=sharing
Any help is appreciated.
Here are my version details:
platform x86_64-w64-mingw32
arch x86_64
os mingw32
system x86_64, mingw32
status
major 3
minor 6.3
year 2020
month 02
day 29
svn rev 77875
language R
version.string R version 3.6.3 (2020-02-29)
nickname Holding the Windsock
This should work using tidyverse and xml2.
library(tidyverse)
library(xml2)
### Put all your xml files in a vector
my_files <- list.files("path/to/your/xml/files", full.names = TRUE)
### Read function to transform them to a tibble (similar to a data.frame)
read_my_xml <- function(x, path = "//BatRecord") {
  tmp <- read_xml(x) # read the xml file
  tmp <- tmp %>%
    xml_find_first(path) %>% # select the first //BatRecord node
    xml_children() # select all children of that node
  # Extract the text of all children,
  # i.e. the text between the tags: <NAME>TEXT</NAME>
  out <- tmp %>% xml_text()
  # Use the tag names (<NAME> ... </NAME>) as the element names
  names(out) <- tmp %>% xml_name()
  # Turn the named vector into a one-row tibble
  bind_rows(out)
}
### Read the files as data
dat <- map_df(my_files, read_my_xml) # map_df works like a loop that binds the results into one tibble
### Do the transformation
dat %>%
  separate(DateTime, into = c("Date", "Time"), sep = " ") %>%
  mutate(Lat = substr(GPS, 4, 12), Long = substr(GPS, 13, 25)) %>%
  write_csv("wherever/you/want/file.txt")
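If the loop still stops with "XML content does not seem to be XML" or "permission denied", one plausible cause (an assumption, since the folder contents are not shown) is that the folder listing also returns subfolders or non-XML files, or that one of the files is malformed. The sketch below restricts the listing to names ending in .xml and skips any file that fails to parse instead of aborting. The helper name read_record_safely and the demo files are hypothetical and exist only for illustration. It also assumes every file has the same child tags under //BatRecord.

```r
library(xml2)

# Hypothetical helper: returns a one-row data frame for the first node matching
# `path`, or NULL if the file cannot be parsed or has no such node.
read_record_safely <- function(file, path = "//BatRecord") {
  tryCatch({
    node <- xml_find_first(read_xml(file), path)
    if (inherits(node, "xml_missing")) {
      message("Skipping ", file, ": no node matching ", path)
      return(NULL)
    }
    kids <- xml_children(node)
    values <- xml_text(kids)
    names(values) <- xml_name(kids)
    as.data.frame(as.list(values), stringsAsFactors = FALSE)
  }, error = function(e) {
    # Report the offending file and carry on with the rest
    message("Skipping ", file, ": ", conditionMessage(e))
    NULL
  })
}

# Demo folder with made-up contents: two valid files, one malformed file,
# and one file that is not XML at all
demo_dir <- file.path(tempdir(), "bat_xml_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines("<BatRecord><Filename>a.wav</Filename><Duration>1.5</Duration></BatRecord>",
           file.path(demo_dir, "a.xml"))
writeLines("<BatRecord><Filename>b.wav</Filename><Duration>2.0</Duration></BatRecord>",
           file.path(demo_dir, "b.xml"))
writeLines("this is not xml", file.path(demo_dir, "broken.xml"))
writeLines("some notes", file.path(demo_dir, "notes.txt"))

# Keep only names ending in .xml, so subfolders and other files are never parsed
xml_files <- list.files(demo_dir, pattern = "\\.xml$", full.names = TRUE,
                        ignore.case = TRUE)

records <- lapply(xml_files, read_record_safely)
records <- Filter(Negate(is.null), records) # drop the files that were skipped

# rbind assumes identical columns in every record; dplyr::bind_rows is a
# more forgiving alternative when the tags differ between files
dat <- do.call(rbind, records)
print(dat)
```

The messages printed while the loop runs name the files that could not be read, which makes it easier to check those files by hand.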