解析文件树对象中的许多路径。有没有高效的算法?
Parse many paths in the file tree object. Is there an efficient algorithm?
我的代码需要创建许多文件路径的文件树,如
dir1/file1
dir1/dir2/file2
dir1/dir2/file3
FileTree 对象可视化示例:
dir1
|_file1
|_dir2
|_file2
|_file3
此树用于以图形形式可视化 Torrent 内容文件。它还用于动态显示文件进度。
在少数子文件夹和文件中它有效地工作,但如果路径 > 10,000,则需要大量内存和时间(> 4 秒和 50 MB RAM)。
有没有一种有效的算法来制作这样的图?对我来说最关键的是图表制作速度。
算法实现的例子可以用任何语言写,对我来说没关系:-)
提前致谢。
我的 Java 用于此目的的代码:
FileTree root = new FileTree(FileTree.ROOT, File.Type.DIR);
FileTree parentTree;
for (String pathToFile : paths) {
parentTree = root;
String[] nodes = FileIOUtils.parsePath(pathToFile); /*String.split(File.separator)*/
for (int i = 0; i < nodes.length; i++) {
/* The last leaf item is a file */
if (i == (nodes.length - 1)) {
parentTree.addChild(new FileTree(nodes[i],
File.Type.FILE, parentTree));
} else {
parentTree.addChild(new FileTree(nodes[i], FileNode.Type.DIR, parentTree));
}
FileTree nextParent = parentTree.getChild(nodes[i]);
/* Skipping leaf nodes */
if (nextParent != null && !nextParent.isFile()) {
parentTree = nextParent;
}
}
}
文件树class:
public class FileTree {
public static final String ROOT = "/";
/* The name for pointer to the parent node */
public static final String PARENT_DIR = "..";
protected String name;
protected boolean isLeaf;
protected FileTree parent;
protected Map<String, FileTree> children = new LinkedHashMap<>();
public FileTree(String name, int type, FileTree parent) {
this(name, type, parent);
}
public FileTree(String name, int type)
{
this(name, type, null);
}
public FileTree(String name, int type, FileTree parent)
{
this.name = name;
isLeaf = (type == File.Type.FILE);
this.parent = parent;
}
public synchronized void addChild(FileTree node)
{
if (!children.containsKey(node.getName())) {
children.put(node.getName(), node);
}
}
public boolean contains(String name)
{
return children.containsKey(name);
}
public F getChild(String name)
{
return children.get(name);
}
public Collection<FileTree> getChildren()
{
return children.values();
}
public Set<String> getChildrenName()
{
return children.keySet();
}
}
编辑:
创建1000个子文件夹的树的速度可以达到平均0.5-1秒(早30秒)。
FileTree root = new BencodeFileTree(FileTree.ROOT, 0L, File.Type.DIR);
FileTree parentTree = root;
/* It allows reduce the number of iterations on the paths with equal beginnings */
String prevPath = "";
/* Sort reduces the returns number to root */
Collections.sort(files);
for (String file : files) {
String path;
/*
* Compare previous path with new path.
* Example:
* prev = dir1/dir2/
* cur = dir1/dir2/file1
* |________|
* equal
*
* prev = dir1/dir2/
* cur = dir3/file2
* |________|
* not equal
*/
if (!prevPath.isEmpty() &&
file.regionMatches(true, 0, prevPath, 0, prevPath.length())) {
/*
* Beginning paths are equal, remove previous path from the new path.
* Example:
* prev = dir1/dir2/
* cur = dir1/dir2/file1
* new = file1
*/
path = file.substring(prevPath.length());
} else {
/* Beginning paths are not equal, return to root */
path = file;
parentTree = root;
}
String[] nodes = FileIOUtils.parsePath(path);
/*
* Remove last node (file) from previous path.
* Example:
* cur = dir1/dir2/file1
* new = dir1/dir2/
*/
prevPath = file.substring(0, file.length() - nodes[nodes.length - 1].length());
/* Iterates path nodes */
for (int i = 0; i < nodes.length; i++) {
if (!parentTree.contains(nodes[i])) {
/* The last leaf item is a file */
parentTree.addChild(makeObject(nodes[i], parentTree,
i == (nodes.length - 1)));
}
FileTree nextParent = parentTree.getChild(nodes[i]);
/* Skipping leaf nodes */
if (!nextParent.isFile()) {
parentTree = nextParent;
}
}
}
基本算法对我来说看起来不错,但是当你调用 addChild
时你正在创建很多不必要的 FileTree
对象,这些对象在(常见)情况下会立即被丢弃它们已经存在.您可以尝试将参数传递给构造函数,并仅在需要插入对象时才构造对象:
public synchronized void addChild(String name, int type, FileTree parent)
{
if (!children.containsKey(name)) {
children.put(name, new FileTree(name, type, parent));
}
}
和:
if (i == (nodes.length - 1)) {
parentTree.addChild(nodes[i], File.Type.FILE, parentTree);
} else {
parentTree.addChild(nodes[i], FileNode.Type.DIR, parentTree);
}
可能不需要传入parentTree
:直接用this
构造即可。
另一个优化可能是维护来自您处理的先前路径的 String 对象数组(和关联的 FileTree 节点),并在添加子项之前扫描直到找到与前一个不同的条目。
我建议用 HashMap
替换 LinkedHashMap
因为第一个消耗更多内存。主要区别在于 HashMap
不保证条目的迭代顺序。但是您可以在 GUI 中订购 children(可能您有一些订购设置)。查看 this question 以获取一些参考。
另一个建议是 return 来自方法 addChild
的实际 child 节点
public synchronized FileTree addChild(FileTree node) {
return children.putIfAbsent(node.getName(), node);
}
然后内部循环就不需要再在地图上调用get
FileTree nextParent = parentTree.addChild(...
并且有看起来不必要的条件
if (nextParent != null && !nextParent.isFile()) {
parentTree = nextParent;
}
如果当前 child 是一个文件,循环中似乎不会有迭代。所以它可以安全地替换为
parentTree = parentTree.addChild(...
建议后循环体看起来像
for(...) {
int type = if (i == (nodes.length - 1)) ? File.Type.FILE : FileNode.Type.DIR;
parentTree = parentTree.addChild(new FileTree(nodes[i], type, parentTree);
}
我的代码需要创建许多文件路径的文件树,如
dir1/file1
dir1/dir2/file2
dir1/dir2/file3
FileTree 对象可视化示例:
dir1
|_file1
|_dir2
|_file2
|_file3
此树用于以图形形式可视化 Torrent 内容文件。它还用于动态显示文件进度。 在少数子文件夹和文件中它有效地工作,但如果路径 > 10,000,则需要大量内存和时间(> 4 秒和 50 MB RAM)。
有没有一种有效的算法来制作这样的图?对我来说最关键的是图表制作速度。 算法实现的例子可以用任何语言写,对我来说没关系:-) 提前致谢。
我的 Java 用于此目的的代码:
FileTree root = new FileTree(FileTree.ROOT, File.Type.DIR);
FileTree parentTree;
for (String pathToFile : paths) {
parentTree = root;
String[] nodes = FileIOUtils.parsePath(pathToFile); /*String.split(File.separator)*/
for (int i = 0; i < nodes.length; i++) {
/* The last leaf item is a file */
if (i == (nodes.length - 1)) {
parentTree.addChild(new FileTree(nodes[i],
File.Type.FILE, parentTree));
} else {
parentTree.addChild(new FileTree(nodes[i], FileNode.Type.DIR, parentTree));
}
FileTree nextParent = parentTree.getChild(nodes[i]);
/* Skipping leaf nodes */
if (nextParent != null && !nextParent.isFile()) {
parentTree = nextParent;
}
}
}
文件树class:
public class FileTree {
public static final String ROOT = "/";
/* The name for pointer to the parent node */
public static final String PARENT_DIR = "..";
protected String name;
protected boolean isLeaf;
protected FileTree parent;
protected Map<String, FileTree> children = new LinkedHashMap<>();
public FileTree(String name, int type, FileTree parent) {
this(name, type, parent);
}
public FileTree(String name, int type)
{
this(name, type, null);
}
public FileTree(String name, int type, FileTree parent)
{
this.name = name;
isLeaf = (type == File.Type.FILE);
this.parent = parent;
}
public synchronized void addChild(FileTree node)
{
if (!children.containsKey(node.getName())) {
children.put(node.getName(), node);
}
}
public boolean contains(String name)
{
return children.containsKey(name);
}
public F getChild(String name)
{
return children.get(name);
}
public Collection<FileTree> getChildren()
{
return children.values();
}
public Set<String> getChildrenName()
{
return children.keySet();
}
}
编辑:
创建1000个子文件夹的树的速度可以达到平均0.5-1秒(早30秒)。
FileTree root = new BencodeFileTree(FileTree.ROOT, 0L, File.Type.DIR);
FileTree parentTree = root;
/* It allows reduce the number of iterations on the paths with equal beginnings */
String prevPath = "";
/* Sort reduces the returns number to root */
Collections.sort(files);
for (String file : files) {
String path;
/*
* Compare previous path with new path.
* Example:
* prev = dir1/dir2/
* cur = dir1/dir2/file1
* |________|
* equal
*
* prev = dir1/dir2/
* cur = dir3/file2
* |________|
* not equal
*/
if (!prevPath.isEmpty() &&
file.regionMatches(true, 0, prevPath, 0, prevPath.length())) {
/*
* Beginning paths are equal, remove previous path from the new path.
* Example:
* prev = dir1/dir2/
* cur = dir1/dir2/file1
* new = file1
*/
path = file.substring(prevPath.length());
} else {
/* Beginning paths are not equal, return to root */
path = file;
parentTree = root;
}
String[] nodes = FileIOUtils.parsePath(path);
/*
* Remove last node (file) from previous path.
* Example:
* cur = dir1/dir2/file1
* new = dir1/dir2/
*/
prevPath = file.substring(0, file.length() - nodes[nodes.length - 1].length());
/* Iterates path nodes */
for (int i = 0; i < nodes.length; i++) {
if (!parentTree.contains(nodes[i])) {
/* The last leaf item is a file */
parentTree.addChild(makeObject(nodes[i], parentTree,
i == (nodes.length - 1)));
}
FileTree nextParent = parentTree.getChild(nodes[i]);
/* Skipping leaf nodes */
if (!nextParent.isFile()) {
parentTree = nextParent;
}
}
}
基本算法对我来说看起来不错,但是当你调用 addChild
时你正在创建很多不必要的 FileTree
对象,这些对象在(常见)情况下会立即被丢弃它们已经存在.您可以尝试将参数传递给构造函数,并仅在需要插入对象时才构造对象:
public synchronized void addChild(String name, int type, FileTree parent)
{
if (!children.containsKey(name)) {
children.put(name, new FileTree(name, type, parent));
}
}
和:
if (i == (nodes.length - 1)) {
parentTree.addChild(nodes[i], File.Type.FILE, parentTree);
} else {
parentTree.addChild(nodes[i], FileNode.Type.DIR, parentTree);
}
可能不需要传入parentTree
:直接用this
构造即可。
另一个优化可能是维护来自您处理的先前路径的 String 对象数组(和关联的 FileTree 节点),并在添加子项之前扫描直到找到与前一个不同的条目。
我建议用 HashMap
替换 LinkedHashMap
因为第一个消耗更多内存。主要区别在于 HashMap
不保证条目的迭代顺序。但是您可以在 GUI 中订购 children(可能您有一些订购设置)。查看 this question 以获取一些参考。
另一个建议是 return 来自方法 addChild
public synchronized FileTree addChild(FileTree node) {
return children.putIfAbsent(node.getName(), node);
}
然后内部循环就不需要再在地图上调用get
FileTree nextParent = parentTree.addChild(...
并且有看起来不必要的条件
if (nextParent != null && !nextParent.isFile()) {
parentTree = nextParent;
}
如果当前 child 是一个文件,循环中似乎不会有迭代。所以它可以安全地替换为
parentTree = parentTree.addChild(...
建议后循环体看起来像
for(...) {
int type = if (i == (nodes.length - 1)) ? File.Type.FILE : FileNode.Type.DIR;
parentTree = parentTree.addChild(new FileTree(nodes[i], type, parentTree);
}