python - 解析 Maven 依赖树

python - parse maven dependency tree

我希望能够将 Maven 依赖关系树作为输入并对其进行解析以确定每个依赖关系的 groupId、artifactId 和版本及其 child(ren)(如果有),以及 child(ren) 的 groupId、artifactId 和版本(以及任何其他 child(ren) 等)。 我不确定在为 neo4j 准备数据之前解析 mvn 依赖树并将信息存储为嵌套字典是否最有意义。

我也不确定解析整个 mvn 依赖关系树的最佳方法。下面的代码是我在尝试解析、删除前面不必要的信息并将某些内容标记为 child 或 parent.

方面取得的最大进展
tree= 
[INFO] +- org.antlr:antlr4:jar:4.7.1:compile
[INFO] |  +- org.antlr:antlr4-runtime:jar:4.7.1:compile
[INFO] |  +- org.antlr:antlr-runtime:jar:3.5.2:compile
[INFO] |  \- com.ibm.icu:icu4j:jar:58.2:compile
[INFO] +- commons-io:commons-io:jar:1.3.2:compile
[INFO] +- brs:dxprog-lang:jar:3.3-SNAPSHOT:compile
[INFO] |  +- brs:libutil:jar:2.51:compile
[INFO] |  |  +- commons-collections:commons-collections:jar:3.2.2:compile
[INFO] |  |  +- org.apache.commons:commons-collections4:jar:4.1:compile
[INFO] |  |  |  +- com.fasterxml.jackson.core:jackson-annotations:jar:2.9.0:compile
    [INFO] |  |  |  \- com.fasterxml.jackson.core:jackson-core:jar:2.9.5:compile
.
.
.


fileObj = open("tree", "r")

for line in fileObj.readlines():
    for word in line.split():
        if "[INFO]" in line.split():
            line = line.replace(line.split().__getitem__(0), "")
            print(line)

            if "|" in line.split():
                line = line.replace(line.split().__getitem__(0), "child")
                print(line)

                if "+-" in line.split() and "|" not in line.split():
                    line = line.replace(line.split().__getitem__(0), "")
                    line = line.replace(line.split().__getitem__(0), "parent")
                    print(line, '\n\n')

输出:

 |  |  \- com.google.protobuf:protobuf-java:jar:3.5.1:compile

 child  child  \- com.google.protobuf:protobuf-java:jar:3.5.1:compile

 |  +- com.h2database:h2:jar:1.4.195:compile

 child  +- com.h2database:h2:jar:1.4.195:compile

   parent com.h2database:h2:jar:1.4.195:compile

鉴于我对 python 的功能相对不熟悉,如果您能以有组织的方式分析和 return 数据的最佳方式,我将不胜感激。提前致谢!

我不知道您的编程经验如何,但这不是一项微不足道的任务。

首先,您可以看到依赖的层叠级别由符号|具体化。您可以做的最简单的事情是构建一个堆栈,用于存储从根到 children、grandchildren、...:[=​​17=] 的依赖路径

def build_stack(text):
    stack = []
    for line in text.split("\n"):
        if not line:
            continue

        line = line[7:] # remove [INFO]
        level = line.count("|")
        name = line.split("-", 1)[1].strip() # the part after the -
        stack = stack[:level] + [name] # update the stack: everything up to level-1 and name
        yield stack[:level], name # this is a generator

for bottom_stack, name in build_stack(DATA):
    print (bottom_stack + [name])

输出:

['org.antlr:antlr4:jar:4.7.1:compile']
['org.antlr:antlr4:jar:4.7.1:compile', 'org.antlr:antlr4-runtime:jar:4.7.1:compile']
['org.antlr:antlr4:jar:4.7.1:compile', 'org.antlr:antlr-runtime:jar:3.5.2:compile']
['org.antlr:antlr4:jar:4.7.1:compile', 'com.ibm.icu:icu4j:jar:58.2:compile']
['commons-io:commons-io:jar:1.3.2:compile']
['brs:dxprog-lang:jar:3.3-SNAPSHOT:compile']
['brs:dxprog-lang:jar:3.3-SNAPSHOT:compile', 'brs:libutil:jar:2.51:compile']
['brs:dxprog-lang:jar:3.3-SNAPSHOT:compile', 'brs:libutil:jar:2.51:compile', 'commons-collections:commons-collections:jar:3.2.2:compile']
['brs:dxprog-lang:jar:3.3-SNAPSHOT:compile', 'brs:libutil:jar:2.51:compile', 'org.apache.commons:commons-collections4:jar:4.1:compile']
['brs:dxprog-lang:jar:3.3-SNAPSHOT:compile', 'brs:libutil:jar:2.51:compile', 'org.apache.commons:commons-collections4:jar:4.1:compile', 'com.fasterxml.jackson.core:jackson-annotations:jar:2.9.0:compile']
['brs:dxprog-lang:jar:3.3-SNAPSHOT:compile', 'brs:libutil:jar:2.51:compile', 'org.apache.commons:commons-collections4:jar:4.1:compile', 'com.fasterxml.jackson.core:jackson-core:jar:2.9.5:compile']

其次,您可以使用此堆栈构建基于叠层字典的树:

def create_tree(text):
    tree = {}
    for stack, name in build_stack(text):
        temp = tree
        for n in stack: # find or create...
            temp = temp.setdefault(n, {}) # ...the most inner dict
        temp[name] = {}
    return tree

from pprint import pprint
pprint(create_tree(DATA))

输出:

{'brs:dxprog-lang:jar:3.3-SNAPSHOT:compile': {'brs:libutil:jar:2.51:compile': {'commons-collections:commons-collections:jar:3.2.2:compile': {},
                                                                               'org.apache.commons:commons-collections4:jar:4.1:compile': {'com.fasterxml.jackson.core:jackson-annotations:jar:2.9.0:compile': {},
                                                                                                                                           'com.fasterxml.jackson.core:jackson-core:jar:2.9.5:compile': {}}}},
 'commons-io:commons-io:jar:1.3.2:compile': {},
 'org.antlr:antlr4:jar:4.7.1:compile': {'com.ibm.icu:icu4j:jar:58.2:compile': {},
                                        'org.antlr:antlr-runtime:jar:3.5.2:compile': {},
                                        'org.antlr:antlr4-runtime:jar:4.7.1:compile': {}}}
{'brs:dxprog-lang:jar:3.3-SNAPSHOT:compile': {'brs:libutil:jar:2.51:compile': {'commons-collections:commons-collections:jar:3.2.2:compile': {},
                                                                               'org.apache.commons:commons-collections4:jar:4.1:compile': {'com.fasterxml.jackson.core:jackson-annotations:jar:2.9.0:compile': {},
                                                                                                                                           'com.fasterxml.jackson.core:jackson-core:jar:2.9.5:compile': {}}}},
 'commons-io:commons-io:jar:1.3.2:compile': {},
 'org.antlr:antlr4:jar:4.7.1:compile': {'com.ibm.icu:icu4j:jar:58.2:compile': {},
                                        'org.antlr:antlr-runtime:jar:3.5.2:compile': {},
                                        'org.antlr:antlr4-runtime:jar:4.7.1:compile': {}}}

一个空的字典具体化了树中的一片叶子。

第三,您需要格式化树,即 1. 提取数据和 2. 将 children 分组到列表中。这是一个简单的树遍历(这里是DFS):

def format(tree):
    L = []
    for name, subtree in tree.items():
        group, artifact, packaging, version, scope = name.split(":")
        d = {"artifact":artifact} # you can add group, ...
        if subtree: # children are present
            d["children"] = format(subtree)
        L.append(d)
    return L

pprint(format(create_tree(DATA)))

输出:

[{'artifact': 'antlr4',
  'children': [{'artifact': 'antlr4-runtime'},
               {'artifact': 'antlr-runtime'},
               {'artifact': 'icu4j'}]},
 {'artifact': 'commons-io'},
 {'artifact': 'dxprog-lang',
  'children': [{'artifact': 'libutil',
                'children': [{'artifact': 'commons-collections'},
                             {'artifact': 'commons-collections4',
                              'children': [{'artifact': 'jackson-annotations'},
                                           {'artifact': 'jackson-core'}]}]}]}]

您可以将步骤分组。