Computing how many folders each folder has in a complex folder structure?
Consider the following tree:
library(data.tree)
acme <- Node$new("Acme Inc.")
accounting <- acme$AddChild("Accounting")
software <- accounting$AddChild("New Software")
standards <- accounting$AddChild("New Accounting Standards")
research <- acme$AddChild("Research")
newProductLine <- research$AddChild("New Product Line")
newLabs <- research$AddChild("New Labs")
it <- acme$AddChild("IT")
outsource <- it$AddChild("Outsource")
agile <- it$AddChild("Go agile")
goToR <- it$AddChild("Switch to R")
I would then like to compute its averageBranchingFactor:
averageBranchingFactor(acme)
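This yields 2.5.
However, for various reasons, I would like to obtain all of the individual branching factors, not just their average. For example, I need them to test statistically whether two file structures differ significantly in average branching factor.
According to the manual for data.tree, the AverageBranchingFactor() function does the following: "calculate the average number of branches each non-leaf has." So I first tried the following: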
acme.df <- ToDataFrameTree(acme, "averageBranchingFactor")
mean(acme.df$averageBranchingFactor[acme.df$averageBranchingFactor>0])
This yields 2.375, so I then tried a simpler version:
mean(acme.df$averageBranchingFactor)
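This yields 0.8636364.
How do I obtain all the individual branching factors whose mean is 2.5?
Ideally, I would like to create a data.frame that lists every folder, with a variable giving each folder's branching factor. For example, I have this very simple folder structure: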
top_level_folder
    sub_folder_1
    sub_folder_2
        sub_folder_3
Answering this question would involve creating output like the following:
Folders             Subfolders (BranchingFactor)
top_level_folder    2
sub_folder_1        0
sub_folder_2        1
sub_folder_3        0
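The first column can be generated simply by calling list.dirs("/Users/username/Downloads/top_level/"), but I don't know how to generate the second column. Note that the second column is non-recursive, meaning that folders inside subfolders are not counted (i.e., top_level_folder contains only 2 subfolders, even though sub_folder_2 contains another folder, sub_folder_3).
If you want to see whether your solution scales, download the Rails codebase from https://github.com/rails/rails/archive/master.zip and try it on Rails's more complex file structure.
You can simply loop over the folder structure and count the number of folders at each level (without recursion):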
dir.create("top_level_folder/sub_folder_2/sub_folder_3", recursive = TRUE)
dir.create("top_level_folder/sub_folder_1")
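dirs <- list.dirs()  # every folder below the working directory, "." included
branching_factor <- integer(length(dirs))
for (i in seq_along(dirs)) {
  # count only the immediate subfolders of dirs[i]
  branching_factor[i] <- length(list.dirs(path = dirs[i],
                                          full.names = FALSE, recursive = FALSE))
}
result <- data.frame(Folders = basename(dirs), BranchingFactor = branching_factor)
result[-1,]  # drop the first row, which is the working directory itself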
You can also use a shorter, more concise version of this code that uses sapply instead of an explicit loop:
dirs <- list.dirs()
branching_factor <- sapply(dirs, function(x) length(list.dirs(x, FALSE, FALSE)))
result2 <- data.frame(Folders = basename(dirs), BranchingFactor = branching_factor,
row.names = NULL)[-1,]
The result looks like this (largest branching factors first):
> head(result2[rev(order(result2[,2])),])
Folders BranchingFactor
208 fixtures 24
122 fixtures 23
42 fixtures 18
440 core_ext 17
340 active_record 17
562 rails 16
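As a self-contained check of the counting approach above, the following sketch recreates the toy structure from the question in a temporary directory and counts immediate subfolders with vapply. The helper name count_subfolders and the use of tempdir() are assumptions made for illustration; they are not part of the answer's code.

```r
# Hypothetical helper: one row per folder under `root` (root included),
# with the number of immediate, non-recursive subfolders of each.
count_subfolders <- function(root) {
  dirs <- list.dirs(root, full.names = TRUE, recursive = TRUE)
  n_sub <- vapply(
    dirs,
    function(d) length(list.dirs(d, full.names = FALSE, recursive = FALSE)),
    integer(1),
    USE.NAMES = FALSE
  )
  data.frame(Folders = basename(dirs), BranchingFactor = n_sub,
             stringsAsFactors = FALSE)
}

# Recreate the toy structure from the question in a temporary directory.
root <- file.path(tempdir(), "count_demo", "top_level_folder")
dir.create(file.path(root, "sub_folder_2", "sub_folder_3"),
           recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(root, "sub_folder_1"), showWarnings = FALSE)

result <- count_subfolders(root)
print(result)
```

Passing the root explicitly, rather than relying on the working directory, keeps the top-level folder as a row of its own, so there is no need to drop the first row afterwards.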
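I get the list of all folders recursively, then build a table of parent-folder/subfolder pairs, from which I can count the number of subfolders per folder.
This misses the empty folders, though, so I merge the counts back onto the initial folder list with a left join and fill the NAs with zeros.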
library(magrittr)  # provides the %>% pipe used below
path <- getwd()
# all folders below path, as a one-column data frame
all_folders <- path %>% list.dirs(full.names=TRUE,recursive=TRUE) %>%
  data.frame(stringsAsFactors=FALSE) %>% setNames("Folders")
# last two path components of each folder: its parent's name and its own name
all_sub_folders <- all_folders$Folders %>%
  strsplit("/") %>%
  lapply(function(x){c(x[length(x)-1],x[length(x)])}) %>%
  do.call(rbind,.) %>%
  as.data.frame(stringsAsFactors=FALSE) %>%
  setNames(c("ParentFolders","Folders"))
# number of subfolders per parent name
output <- all_sub_folders$ParentFolders %>% table %>% as.data.frame(stringsAsFactors=FALSE) %>% setNames(c("Folders","SubFolders"))
# left join so that empty folders are kept, then replace their NA counts with 0
output <- merge(all_sub_folders,output,all.x = TRUE)[,c("Folders","SubFolders")]
output$SubFolders[is.na(output$SubFolders)] <- 0
# restore the original folder order
output <- output[match(all_sub_folders$Folders,output$Folders),]
head(output)
# Folders SubFolders
# 2160 Rhome 126
# 17 acepack 5
# 856 help 1
# 992 html 9
# 1486 libs 124
# 1130 i386 0
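The code above pairs folders by the last path component only, so folders that share a name in different places would appear to be pooled together. The following is a variant sketch of the same pairs-and-count idea that keys on full paths instead, using dirname, match and tabulate from base R. It is an alternative under stated assumptions (the toy structure from the question, built in tempdir()), not the code of this answer.

```r
# Recreate the toy structure from the question in a temporary directory.
root <- file.path(tempdir(), "pairs_demo", "top_level_folder")
dir.create(file.path(root, "sub_folder_2", "sub_folder_3"),
           recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(root, "sub_folder_1"), showWarnings = FALSE)

# Normalise the root so that dirname() of a child reproduces its parent's path.
root <- normalizePath(root, winslash = "/")
dirs <- list.dirs(root, full.names = TRUE, recursive = TRUE)

# For each folder, find the position of its parent in `dirs`
# (NA for the root, whose parent lies outside the listing).
parent_idx <- match(dirname(dirs), dirs)

# Count how often each folder occurs as a parent; folders that never do get 0,
# so no separate join is needed to recover the empty folders.
sub_folders <- tabulate(parent_idx[!is.na(parent_idx)], nbins = length(dirs))

output <- data.frame(Folders = basename(dirs), SubFolders = sub_folders,
                     stringsAsFactors = FALSE)
print(output)
```

Because tabulate returns one count per position in dirs, the result is already in the same order as the folder listing.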
Just a correction to @Gilles's solution:
path <- "SO/rails-master/"
dirs <- list.dirs(path)
branching_factor <- integer(length(dirs))
for (i in seq_along(dirs)) {
  # count only the immediate subfolders of dirs[i]
  branching_factor[i] <- length(list.dirs(path = dirs[i], recursive = FALSE))
}
result <- data.frame(Folders = basename(dirs), BranchingFactor = branching_factor)
> head(result)
Folders BranchingFactor
1 rails-master 14
2 .github 0
3 actioncable 4
4 app 1
5 assets 1
6 javascripts 1
Hope this helps.
You can adapt an approach based on list.files by using list.dirs with recursive = FALSE instead:
library(purrr)
files <- .libPaths()[1] %>% # omit for current directory or supply alternate path
list.dirs() %>%
map_df(~list(path = .x,
dirs = length(list.dirs(.x, recursive = FALSE))))
files
#> # A tibble: 4,457 x 2
#> path dirs
#> <chr> <int>
#> 1 /Library/Frameworks/R.framework/Versions/3.4/Resources/library 314
#> 2 /Library/Frameworks/R.framework/Versions/3.4/Resources/library/abind 4
#> 3 /Library/Frameworks/R.framework/Versions/3.4/Resources/library/abind/help 0
#> 4 /Library/Frameworks/R.framework/Versions/3.4/Resources/library/abind/html 0
#> 5 /Library/Frameworks/R.framework/Versions/3.4/Resources/library/abind/Meta 0
#> 6 /Library/Frameworks/R.framework/Versions/3.4/Resources/library/abind/R 0
#> 7 /Library/Frameworks/R.framework/Versions/3.4/Resources/library/acepack 5
#> 8 /Library/Frameworks/R.framework/Versions/3.4/Resources/library/acepack/help 0
#> 9 /Library/Frameworks/R.framework/Versions/3.4/Resources/library/acepack/html 0
#> 10 /Library/Frameworks/R.framework/Versions/3.4/Resources/library/acepack/libs 1
#> # ... with 4,447 more rows
mean(files$dirs[files$dirs != 0])
#> [1] 2.952949
Or in base R:
files <- do.call(rbind, lapply(list.dirs(.libPaths()[1]), function(path){
data.frame(path = path,
dirs = length(list.dirs(path, recursive = FALSE)),
stringsAsFactors = FALSE)
}))
head(files)
#> path dirs
#> 1 /Library/Frameworks/R.framework/Versions/3.4/Resources/library 314
#> 2 /Library/Frameworks/R.framework/Versions/3.4/Resources/library/abind 4
#> 3 /Library/Frameworks/R.framework/Versions/3.4/Resources/library/abind/help 0
#> 4 /Library/Frameworks/R.framework/Versions/3.4/Resources/library/abind/html 0
#> 5 /Library/Frameworks/R.framework/Versions/3.4/Resources/library/abind/Meta 0
#> 6 /Library/Frameworks/R.framework/Versions/3.4/Resources/library/abind/R 0
mean(files$dirs[files$dirs != 0])
#> [1] 2.952949
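Once the per-folder counts are available, the comparison mentioned in the question can be sketched as follows. The two folder structures below are made up for illustration, the helper names nonleaf_branching and make_tree are hypothetical, and the choice of wilcox.test is an assumption, since the question does not say which test is intended.

```r
# Hypothetical helper: branching factors of all non-leaf folders under `root`.
nonleaf_branching <- function(root) {
  dirs <- list.dirs(root, full.names = TRUE, recursive = TRUE)
  n_sub <- vapply(dirs,
                  function(d) length(list.dirs(d, recursive = FALSE)),
                  integer(1), USE.NAMES = FALSE)
  n_sub[n_sub != 0L]  # drop folders without subfolders (the leaves)
}

# Hypothetical helper: create the given relative paths below `root`.
make_tree <- function(root, paths) {
  for (p in paths) {
    dir.create(file.path(root, p), recursive = TRUE, showWarnings = FALSE)
  }
  root
}

# Two small made-up folder structures in a temporary directory.
tree_a <- make_tree(file.path(tempdir(), "bf_tree_a"),
                    c("x/x1", "x/x2", "y/y1", "y/y2", "z"))
tree_b <- make_tree(file.path(tempdir(), "bf_tree_b"),
                    c("p/p1/p2/p3", "q/q1/q2"))

bf_a <- nonleaf_branching(tree_a)
bf_b <- nonleaf_branching(tree_b)

# Average branching factor of each structure, leaves excluded.
mean(bf_a)
mean(bf_b)

# One possible comparison of the two distributions; exact = FALSE avoids
# the warning about ties that these small integer samples would trigger.
cmp <- wilcox.test(bf_a, bf_b, exact = FALSE)
cmp$p.value
```

With samples this small the p-value carries little meaning; the point is only that the full vectors of branching factors, rather than the two averages, are what a test needs as input.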
averageBranchingFactor does not include leaves.
Side note: you can get acme directly by using data(acme).
library(data.tree)
data(acme)
acme$averageBranchingFactor
acme$count
print(acme, abf = "averageBranchingFactor", "count")
This will display as:
                          levelName abf count
1  Acme Inc.                        2.5     3
2   ¦--Accounting                   2.0     2
3   ¦   ¦--New Software             0.0     0
4   ¦   °--New Accounting Standards 0.0     0
5   ¦--Research                     2.0     2
6   ¦   ¦--New Product Line         0.0     0
7   ¦   °--New Labs                 0.0     0
8   °--IT                           3.0     3
9       ¦--Outsource                0.0     0
10      ¦--Go agile                 0.0     0
11      °--Switch to R              0.0     0
The implementation of averageBranchingFactor holds no secrets, so you can adapt it to your needs. Just type averageBranchingFactor in your console (without parentheses):
function (node)
{
    t <- Traverse(node, filterFun = isNotLeaf)
    if (length(t) == 0)
        return(0)
    cnt <- Get(t, "count")
    if (!is.numeric(cnt))
        browser()
    return(mean(cnt))
}
In short, we traverse the tree (excluding the leaves) and get the count value of each node. Finally, we compute the mean.
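To see why the two attempts in the question gave 2.375 and 0.8636364 rather than 2.5, the arithmetic can be reproduced in base R. The two vectors below are copied by hand from the abf and count columns of the printed tree, so this is a worked check rather than a call into data.tree.

```r
# Per-node values in print order: Acme Inc., Accounting, its two leaves,
# Research, its two leaves, IT, its three leaves.
abf   <- c(2.5, 2, 0, 0, 2, 0, 0, 3, 0, 0, 0)
count <- c(3,   2, 0, 0, 2, 0, 0, 3, 0, 0, 0)

# First attempt: averages the per-node *averages* of the non-leaf nodes.
first_attempt <- mean(abf[abf > 0])      # (2.5 + 2 + 2 + 3) / 4 = 2.375

# Second attempt: same, but the seven leaves are included as zeros.
second_attempt <- mean(abf)              # 9.5 / 11 = 0.8636364

# What averageBranchingFactor computes: the mean child count of non-leaves.
target <- mean(count[count > 0])         # (3 + 2 + 2 + 3) / 4 = 2.5
```

The individual branching factors asked for in the question are therefore the count values of the non-leaf nodes (3, 2, 2, 3), not the averageBranchingFactor values.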
Hope this helps.