Identify all possible parents and children across lists
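I have 80,000 XML files that supposedly share the same format. That is clearly not the case, so I am trying to identify all existing nodes and children across the files.
I imported the XML files as lists using the XML package. My input and desired output are described below.
Input (a list of lists):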
XML1 <- list(name = "Company Number 1",
             adress = list(street = "JP Street", number = "12"),
             product = "chicken")
XML2 <- list(name = "Company Number 2",
             company_adress = list(street = "House Street", number = "93"),
             invoice = list(quantity = "2", product = "phone"))
XML3 <- list(company_name = "Company Number 3",
             adress = list(street = "Lake Street", number = "1"),
             invoice = list(quantity = "2", product = "phone",
                            list(note = "Phones are refurbished")))
Output (tree structure across files, with the number of occurrences of each leaf):
List of 6
 $ name          : num 2
 $ company_name  : num 1
 $ adress        :List of 2
  ..$ street: num 2
  ..$ number: num 2
 $ company_adress:List of 2
  ..$ street: num 1
  ..$ number: num 1
 $ invoice       :List of 3
  ..$ quantity: num 2
  ..$ product : num 2
  ..$         :List of 1
  .. ..$ note: num 1
 $ product       : num 1
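Is there a package that does this, or do I need to write my own function?
I wrote a recursive function to solve this. It is not elegant, but it works.
The function takes a nested list and an empty vector.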
# Summary tree for storing results (updated globally via <<-)
summary_tree <- list()

# Function: walks `tree` and adds one to the count of every node name.
# `position` holds the node names from the root down to `tree`
# (an empty vector for a top-level list).
tree_merger <- function(tree, position) {
  # Testing if at the leaf of a tree
  if (is.character(tree) || is.null(tree)) {
    print("DONE")
  } else {
    # Position in tree
    if (length(position) == 0) {
      # Names of nodes
      tree_names <- names(tree)
      # Adding one to each name
      for (i in seq_along(tree_names)) {
        if (is.null(summary_tree[[tree_names[i]]])) {
          summary_tree[[tree_names[i]]] <<- list(1)
        } else {
          summary_tree[[tree_names[i]]] <<- list(summary_tree[[tree_names[i]]][[1]] + 1)
        }
        # Running function on new tree
        tree_merger(tree[[tree_names[i]]], c(position, tree_names[i]))
      }
    } else {
      # Names of nodes
      tree_names <- names(tree)
      # Finding position in tree to save information
      position_string <- NULL
      for (p in position) {
        position_string <- paste(position_string, "[[\"", p, "\"]]", sep = "")
      }
      position_string <- paste("summary_tree", position_string, sep = "")
      # Adding one to each position
      for (i in seq_along(tree_names)) {
        position_string_full <- paste(position_string, "[[\"", tree_names[i], "\"]]", sep = "")
        # Adding to position
        if (is.null(eval(parse(text = position_string_full)))) {
          eval(parse(text = paste(position_string_full, "<<- list(1)")))
        } else {
          eval(parse(text = paste(position_string_full, "<<- list(", position_string_full, "[[1]] + 1)")))
        }
        # Running function on new tree
        tree_merger(tree[[tree_names[i]]], c(position, tree_names[i]))
      }
    }
  }
}
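As a rough cross-check on the counts the recursive function above should produce, one option is to flatten each list with base R's unlist() and tabulate the dotted leaf paths. This is only a sketch on the sample data from the question: the variable names `files`, `leaf_paths` and `path_counts` are made up for illustration, it counts leaves only (not intermediate nodes), and node names that themselves contain a "." would be ambiguous. How unlist() labels unnamed levels is worth checking on real data.

```r
# Sample input copied from the question
XML1 <- list(name = "Company Number 1",
             adress = list(street = "JP Street", number = "12"),
             product = "chicken")
XML2 <- list(name = "Company Number 2",
             company_adress = list(street = "House Street", number = "93"),
             invoice = list(quantity = "2", product = "phone"))
XML3 <- list(company_name = "Company Number 3",
             adress = list(street = "Lake Street", number = "1"),
             invoice = list(quantity = "2", product = "phone",
                            list(note = "Phones are refurbished")))

# Hypothetical container holding all imported files
files <- list(XML1, XML2, XML3)

# unlist() flattens each nested list; the names of the result join the
# node names along each path with "."
leaf_paths <- unlist(lapply(files, function(x) names(unlist(x))))

# Number of occurrences of each leaf path across all files
path_counts <- table(leaf_paths)
print(path_counts)

path_counts["adress.street"]    # count is 2
path_counts["invoice.product"]  # count is 2
```

A flat table like this is easier to scan for rare paths (for example misspelled or renamed nodes), while the nested summary keeps the tree shape.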
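If anyone runs into the same problem, note that the recursion's exit condition in tree_merger will probably need to change. In my XML files, every leaf is either a string or NULL. In other lists of lists, a leaf could be some other type of value.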
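Following up on the note about leaf types, here is a minimal sketch of the same idea that treats anything that is not a list as a leaf, and that avoids eval(parse()) and global assignment by returning the updated counts. It is not the function above: `count_nodes`, the `n`/`children` layout of each node, and the "<unnamed>" placeholder key are all assumptions made for this illustration.

```r
# Hypothetical helper: merge one nested list into a running count tree.
# Each node of the count tree is list(n = occurrences, children = named list).
count_nodes <- function(tree, counts = list()) {
  # Anything that is not a list is treated as a leaf, whatever its type
  if (!is.list(tree)) return(counts)
  keys <- names(tree)
  if (is.null(keys)) keys <- rep("", length(tree))
  # Unnamed children get a placeholder key so they are still counted
  keys[is.na(keys) | keys == ""] <- "<unnamed>"
  for (i in seq_along(tree)) {
    key <- keys[i]
    node <- counts[[key]]
    if (is.null(node)) node <- list(n = 0, children = list())
    node$n <- node$n + 1
    # Recurse by position so unnamed children are visited too
    node$children <- count_nodes(tree[[i]], node$children)
    counts[[key]] <- node
  }
  counts
}

# Sample input copied from the question
XML1 <- list(name = "Company Number 1",
             adress = list(street = "JP Street", number = "12"),
             product = "chicken")
XML2 <- list(name = "Company Number 2",
             company_adress = list(street = "House Street", number = "93"),
             invoice = list(quantity = "2", product = "phone"))
XML3 <- list(company_name = "Company Number 3",
             adress = list(street = "Lake Street", number = "1"),
             invoice = list(quantity = "2", product = "phone",
                            list(note = "Phones are refurbished")))

# Fold every file into one summary, starting from an empty count tree
files <- list(XML1, XML2, XML3)
summary_tree <- Reduce(function(acc, x) count_nodes(x, acc), files, list())

summary_tree[["name"]]$n                           # 2
summary_tree[["adress"]]$children[["street"]]$n    # 2
summary_tree[["invoice"]]$children[["<unnamed>"]]$children[["note"]]$n  # 1
```

Keeping the count in `n` and the sub-nodes in `children` means a node name can never collide with the counter, and intermediate nodes such as `adress` get their own count as well. str(summary_tree) gives an overview of the whole structure.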