R: How do you summarize data for both leaves and nodes in data.tree?
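I'm using a data.tree structure to summarize various information across folders. Each folder contains some files (Value), and for every folder I need the total number of files in that folder plus all of its subfolders.
Example data: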
library(data.tree)

data <- data.frame(
  pathString = c("MainFolder",
                 "MainFolder/Folder1",
                 "MainFolder/Folder2",
                 "MainFolder/Folder3",
                 "MainFolder/Folder1/Subfolder1",
                 "MainFolder/Folder1/Subfolder2"),
  Value = c(1, 1, 5, 2, 4, 10)
)
# Build the tree from the pathString column; Value becomes a node attribute
tree <- as.Node(data)
print(tree, "Value")
           levelName Value
1 MainFolder             1
2  ¦--Folder1            1
3  ¦   ¦--Subfolder1     4
4  ¦   °--Subfolder2    10
5  ¦--Folder2            5
6  °--Folder3            2
My current solution is very slow:
# Function to sum up file counts per folder + subfolders
total_count <- function(node) {
  # Slow: prints the subtree and converts it to a data frame just to sum one column
  results <- sum(as.data.frame(print(node, "Value"))$Value)
  return(results)
}

# Summing up file counts per folder + subfolders
tree$Do(function(node) node$Value_by_folder <- total_count(node))

# Results
print(tree, "Value", "Value_by_folder")
           levelName Value Value_by_folder
1 MainFolder             1              23
2  ¦--Folder1            1              15
3  ¦   ¦--Subfolder1     4               4
4  ¦   °--Subfolder2    10              10
5  ¦--Folder2            5               5
6  °--Folder3            2               2
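Do you have any suggestions for doing this more efficiently? I have been trying to build a recursive approach using "isLeaf" and "children" on the nodes, but could not get it to work.
You can do it like this: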
get_value_by_folder <- function(tree) {
  res <- rep(NA_real_, tree$totalCount)  # one slot per node
  i <- 0                                 # counter shared by all recursive calls
  myApply <- function(node) {
    i <<- i + 1
    force(k <- i)                        # pin this node's index before recursing
    res[k] <<- node$Value + `if`(node$isLeaf, 0, sum(sapply(node$children, myApply)))
  }
  myApply(tree)
  res
}
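The force(k <- i) line is important: it pins the current counter value in a local k before the recursion runs. R evaluates the right-hand side of the assignment, recursive calls included, before it looks at the index, so indexing with i directly would fill res in the wrong order.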
You get:
> get_value_by_folder(tree)
[1] 23 15 4 10 5 2
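The point about pinning the index can be checked without data.tree. The following is a minimal sketch on a toy tree of nested lists; leaf, toy, count_nodes, subtree_sums and the pin_index switch are all hypothetical names made up for this illustration, not part of the answer above or of the data.tree package.

```r
# Toy tree as nested lists (an assumption for illustration, not a data.tree object).
leaf <- function(value) list(value = value, children = list())
toy <- list(value = 1, children = list(
  list(value = 1, children = list(leaf(4), leaf(10))),
  leaf(5),
  leaf(2)
))

# Number of nodes in the subtree rooted at `node`.
count_nodes <- function(node) {
  1 + sum(vapply(node$children, count_nodes, numeric(1)))
}

# Subtree sums in pre-order. With pin_index = FALSE the shared counter `i`
# is used as the index after the recursion has already advanced it.
subtree_sums <- function(root, pin_index = TRUE) {
  res <- rep(NA_real_, count_nodes(root))
  i <- 0
  visit <- function(node) {
    i <<- i + 1
    k <- i  # local copy of this node's position
    total <- node$value + sum(vapply(node$children, visit, numeric(1)))
    if (pin_index) res[k] <<- total else res[i] <<- total
    total
  }
  visit(root)
  res
}

print(subtree_sums(toy))                     # 23 15 4 10 5 2
print(subtree_sums(toy, pin_index = FALSE))  # NA NA 4 15 5 23
```

In the second call, parents overwrite the slot of the last node visited in their subtree, which is why the first slots stay NA.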
Edit: if you want to fill in the values directly in the tree:
get_value_by_folder2 <- function(tree) {
  myApply <- function(node) {
    node$Value_by_folder <- node$Value + `if`(node$isLeaf, 0, sum(sapply(node$children, myApply)))
  }
  myApply(tree)
  tree
}
> print(get_value_by_folder2(tree), "Value", "Value_by_folder")
           levelName Value Value_by_folder
1 MainFolder             1              23
2  ¦--Folder1            1              15
3  ¦   ¦--Subfolder1     4               4
4  ¦   °--Subfolder2    10              10
5  ¦--Folder2            5               5
6  °--Folder3            2               2
Note that the class is an environment, so the original tree has been modified as well:
> print(tree, "Value", "Value_by_folder")
           levelName Value Value_by_folder
1 MainFolder             1              23
2  ¦--Folder1            1              15
3  ¦   ¦--Subfolder1     4               4
4  ¦   °--Subfolder2    10              10
5  ¦--Folder2            5               5
6  °--Folder3            2               2
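The in-place update comes from reference semantics: data.tree nodes are R6 objects, and R6 objects are environments. As a rough sketch of the difference, the snippet below uses a plain environment and a plain list as stand-ins for a node; add_total is a hypothetical helper invented for this illustration.

```r
# Hypothetical helper: stores a derived field on whatever it receives.
add_total <- function(node) {
  node$Total <- node$Value + 10
  invisible(node)
}

env_node <- new.env()          # environment: modified in place
env_node$Value <- 1
list_node <- list(Value = 1)   # list: the function works on a copy

add_total(env_node)
add_total(list_node)

print(env_node$Total)            # 11, the caller's environment changed
print(is.null(list_node$Total))  # TRUE, the caller's list did not
```

If the original tree has to stay untouched, one option is to run the function on a copy made with data.tree's Clone().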
Here is an efficient way to do it. It uses the data.tree API and stores the values in the tree:
MyAggregate <- function(node) {
  if (node$isLeaf) return(node$Value)
  sum(Get(node$children, "Value_by_folder")) + node$Value
}
# Post-order visits children before their parent, so each child's total already exists
tree$Do(function(node) node$Value_by_folder <- MyAggregate(node), traversal = "post-order")
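To see why the traversal order matters, here is a small end-to-end sketch. It assumes the data.tree package is installed, rebuilds the example tree from the question, and inlines the aggregation instead of calling MyAggregate; the variable names visit_order and totals are made up for this example.

```r
library(data.tree)

data <- data.frame(
  pathString = c("MainFolder",
                 "MainFolder/Folder1",
                 "MainFolder/Folder2",
                 "MainFolder/Folder3",
                 "MainFolder/Folder1/Subfolder1",
                 "MainFolder/Folder1/Subfolder2"),
  Value = c(1, 1, 5, 2, 4, 10)
)
tree <- as.Node(data)

# Order in which a post-order traversal visits the nodes: children first.
visit_order <- unname(tree$Get("name", traversal = "post-order"))
print(visit_order)

# Same aggregation as above, written inline.
tree$Do(function(node) {
  child_totals <- if (node$isLeaf) 0 else sum(Get(node$children, "Value_by_folder"))
  node$Value_by_folder <- node$Value + child_totals
}, traversal = "post-order")

# Collect the stored totals as a named vector (default pre-order).
totals <- tree$Get("Value_by_folder")
print(totals)
```

With the default pre-order traversal, a parent would be processed before its children had a Value_by_folder, so the sum over children would have nothing valid to add up.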