解析文件树对象中的许多路径。有没有高效的算法?

Parse many paths in the file tree object. Is there an efficient algorithm?

我的代码需要创建许多文件路径的文件树,如

dir1/file1
dir1/dir2/file2
dir1/dir2/file3

FileTree 对象可视化示例:

dir1
|_file1
|_dir2
  |_file2
  |_file3

此树用于以图形形式可视化 Torrent 内容文件。它还用于动态显示文件进度。 在少数子文件夹和文件中它有效地工作,但如果路径 > 10,000,则需要大量内存和时间(> 4 秒和 50 MB RAM)。

有没有一种有效的算法来制作这样的图?对我来说最关键的是图表制作速度。 算法实现的例子可以用任何语言写,对我来说没关系:-) 提前致谢。

我的 Java 用于此目的的代码:

FileTree root = new FileTree(FileTree.ROOT, File.Type.DIR);
FileTree parentTree;

for (String pathToFile : paths) {
    parentTree = root;
    String[] nodes = FileIOUtils.parsePath(pathToFile); /*String.split(File.separator)*/

    for (int i = 0; i < nodes.length; i++) {
            /* The last leaf item is a file */
        if (i == (nodes.length - 1)) {
            parentTree.addChild(new FileTree(nodes[i],
                File.Type.FILE, parentTree));
        } else {
            parentTree.addChild(new FileTree(nodes[i], FileNode.Type.DIR, parentTree));
        }

        FileTree nextParent = parentTree.getChild(nodes[i]);
            /* Skipping leaf nodes */
        if (nextParent != null && !nextParent.isFile()) {
            parentTree = nextParent;
        }
    }
}

文件树class:

public class FileTree {
    public static final String ROOT = "/";
    /* The name for pointer to the parent node */
    public static final String PARENT_DIR = "..";

    protected String name;
    protected boolean isLeaf;
    protected FileTree parent;
    protected Map<String, FileTree> children = new LinkedHashMap<>();

    public FileTree(String name, int type, FileTree parent) {
        this(name, type, parent);
    }

    public FileTree(String name, int type)
    {
        this(name, type, null);
    }

    public FileTree(String name, int type, FileTree parent)
    {
        this.name = name;
        isLeaf = (type == File.Type.FILE);
        this.parent = parent;
    }

    public synchronized void addChild(FileTree node)
    {
        if (!children.containsKey(node.getName())) {
            children.put(node.getName(), node);
        }
    }

    public boolean contains(String name)
    {
        return children.containsKey(name);
    }

    public F getChild(String name)
    {
        return children.get(name);
    }

    public Collection<FileTree> getChildren()
    {
        return children.values();
    }

    public Set<String> getChildrenName()
    {
        return children.keySet();
    }
}

编辑:

创建1000个子文件夹的树的速度可以达到平均0.5-1秒(早30秒)。

    FileTree root = new BencodeFileTree(FileTree.ROOT, 0L, File.Type.DIR);
    FileTree parentTree = root;
    /* It allows reduce the number of iterations on the paths with equal beginnings */
    String prevPath = "";
    /* Sort reduces the returns number to root */
    Collections.sort(files);

    for (String file : files) {
        String path;
        /*
         * Compare previous path with new path.
         * Example:
         * prev = dir1/dir2/
         * cur  = dir1/dir2/file1
         *        |________|
         *          equal
         *
         * prev = dir1/dir2/
         * cur  = dir3/file2
         *        |________|
         *         not equal
         */
        if (!prevPath.isEmpty() &&
                file.regionMatches(true, 0, prevPath, 0, prevPath.length())) {
            /*
             * Beginning paths are equal, remove previous path from the new path.
             * Example:
             * prev = dir1/dir2/
             * cur  = dir1/dir2/file1
             * new  = file1
             */
            path = file.substring(prevPath.length());
        } else {
            /* Beginning paths are not equal, return to root */
            path = file;
            parentTree = root;
        }

        String[] nodes = FileIOUtils.parsePath(path);
        /*
         * Remove last node (file) from previous path.
         * Example:
         * cur = dir1/dir2/file1
         * new = dir1/dir2/
         */
        prevPath = file.substring(0, file.length() - nodes[nodes.length - 1].length());

        /* Iterates path nodes */
        for (int i = 0; i < nodes.length; i++) {
            if (!parentTree.contains(nodes[i])) {
                /* The last leaf item is a file */
                parentTree.addChild(makeObject(nodes[i], parentTree,
                                i == (nodes.length - 1)));
            }

            FileTree nextParent = parentTree.getChild(nodes[i]);
            /* Skipping leaf nodes */
            if (!nextParent.isFile()) {
                parentTree = nextParent;
            }
        }
    }

基本算法对我来说看起来不错,但是当你调用 addChild 时你正在创建很多不必要的 FileTree 对象,这些对象在(常见)情况下会立即被丢弃它们已经存在.您可以尝试将参数传递给构造函数,并仅在需要插入对象时才构造对象:

public synchronized void addChild(String name, int type, FileTree parent)
{
    if (!children.containsKey(name)) {
        children.put(name, new FileTree(name, type, parent));
    }
}

和:

if (i == (nodes.length - 1)) {
     parentTree.addChild(nodes[i], File.Type.FILE, parentTree);
} else {
     parentTree.addChild(nodes[i], FileNode.Type.DIR, parentTree);
}

可能不需要传入parentTree:直接用this构造即可。

另一个优化可能是维护来自您处理的先前路径的 String 对象数组(和关联的 FileTree 节点),并在添加子项之前扫描直到找到与前一个不同的条目。

我建议用 HashMap 替换 LinkedHashMap 因为第一个消耗更多内存。主要区别在于 HashMap 不保证条目的迭代顺序。但是您可以在 GUI 中订购 children(可能您有一些订购设置)。查看 this question 以获取一些参考。


另一个建议是 return 来自方法 addChild

的实际 child 节点
public synchronized FileTree addChild(FileTree node) {
    return children.putIfAbsent(node.getName(), node);
}

然后内部循环就不需要再在地图上调用get

FileTree nextParent = parentTree.addChild(...

并且有看起来不必要的条件

if (nextParent != null && !nextParent.isFile()) {
    parentTree = nextParent;
}

如果当前 child 是一个文件,循环中似乎不会有迭代。所以它可以安全地替换为

parentTree = parentTree.addChild(...

建议后循环体看起来像

for(...) {
    int type = if (i == (nodes.length - 1)) ? File.Type.FILE : FileNode.Type.DIR;
    parentTree = parentTree.addChild(new FileTree(nodes[i], type, parentTree);
}