python - parse maven dependency tree
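I'd like to take a Maven dependency tree as input and parse it to determine the groupId, artifactId, and version of each dependency, along with its children (if any), the children's groupId, artifactId, and version, and so on for any further descendants.
I'm not sure whether it makes the most sense to parse the mvn dependency tree and store the information as a nested dictionary before preparing the data for neo4j.
I'm also not sure of the best way to parse the whole mvn dependency tree. The code below is the most progress I've made in trying to parse it, strip the unneeded leading information, and label entries as child or parent.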
tree=
[INFO] +- org.antlr:antlr4:jar:4.7.1:compile
[INFO] | +- org.antlr:antlr4-runtime:jar:4.7.1:compile
[INFO] | +- org.antlr:antlr-runtime:jar:3.5.2:compile
[INFO] | \- com.ibm.icu:icu4j:jar:58.2:compile
[INFO] +- commons-io:commons-io:jar:1.3.2:compile
[INFO] +- brs:dxprog-lang:jar:3.3-SNAPSHOT:compile
[INFO] | +- brs:libutil:jar:2.51:compile
[INFO] | | +- commons-collections:commons-collections:jar:3.2.2:compile
[INFO] | | +- org.apache.commons:commons-collections4:jar:4.1:compile
[INFO] | | | +- com.fasterxml.jackson.core:jackson-annotations:jar:2.9.0:compile
[INFO] | | | \- com.fasterxml.jackson.core:jackson-core:jar:2.9.5:compile
.
.
.
fileObj = open("tree", "r")
for line in fileObj.readlines():
    for word in line.split():
        if "[INFO]" in line.split():
            line = line.replace(line.split().__getitem__(0), "")
            print(line)
            if "|" in line.split():
                line = line.replace(line.split().__getitem__(0), "child")
                print(line)
                if "+-" in line.split() and "|" not in line.split():
                    line = line.replace(line.split().__getitem__(0), "")
                    line = line.replace(line.split().__getitem__(0), "parent")
                    print(line, '\n\n')
Output:
| | \- com.google.protobuf:protobuf-java:jar:3.5.1:compile
child child \- com.google.protobuf:protobuf-java:jar:3.5.1:compile
| +- com.h2database:h2:jar:1.4.195:compile
child +- com.h2database:h2:jar:1.4.195:compile
parent com.h2database:h2:jar:1.4.195:compile
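Since I'm relatively unfamiliar with Python's capabilities, I'd appreciate any advice on the best way to parse this data and return it in an organized form. Thanks in advance!
I don't know how much programming experience you have, but this is not a trivial task.
First, you can see that the nesting level of a dependency is indicated by the number of | symbols on its line. The simplest thing to do is build a stack that stores the dependency path from the root down to the children, grandchildren, and so on: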
def build_stack(text):
    stack = []
    for line in text.split("\n"):
        if not line:
            continue
        line = line[7:]  # remove the "[INFO] " prefix
        level = line.count("|")  # one "|" per nesting level
        name = line.split("-", 1)[1].strip()  # the part after the first "-"
        stack = stack[:level] + [name]  # keep the ancestors (levels 0..level-1), then push name
        yield stack[:level], name  # this is a generator: yields (ancestors, name)

# DATA holds the dependency tree text from the question as a single string
for bottom_stack, name in build_stack(DATA):
    print(bottom_stack + [name])
Output:
['org.antlr:antlr4:jar:4.7.1:compile']
['org.antlr:antlr4:jar:4.7.1:compile', 'org.antlr:antlr4-runtime:jar:4.7.1:compile']
['org.antlr:antlr4:jar:4.7.1:compile', 'org.antlr:antlr-runtime:jar:3.5.2:compile']
['org.antlr:antlr4:jar:4.7.1:compile', 'com.ibm.icu:icu4j:jar:58.2:compile']
['commons-io:commons-io:jar:1.3.2:compile']
['brs:dxprog-lang:jar:3.3-SNAPSHOT:compile']
['brs:dxprog-lang:jar:3.3-SNAPSHOT:compile', 'brs:libutil:jar:2.51:compile']
['brs:dxprog-lang:jar:3.3-SNAPSHOT:compile', 'brs:libutil:jar:2.51:compile', 'commons-collections:commons-collections:jar:3.2.2:compile']
['brs:dxprog-lang:jar:3.3-SNAPSHOT:compile', 'brs:libutil:jar:2.51:compile', 'org.apache.commons:commons-collections4:jar:4.1:compile']
['brs:dxprog-lang:jar:3.3-SNAPSHOT:compile', 'brs:libutil:jar:2.51:compile', 'org.apache.commons:commons-collections4:jar:4.1:compile', 'com.fasterxml.jackson.core:jackson-annotations:jar:2.9.0:compile']
['brs:dxprog-lang:jar:3.3-SNAPSHOT:compile', 'brs:libutil:jar:2.51:compile', 'org.apache.commons:commons-collections4:jar:4.1:compile', 'com.fasterxml.jackson.core:jackson-core:jar:2.9.5:compile']
Second, you can use this stack to build a tree out of nested dictionaries:
from pprint import pprint

def create_tree(text):
    tree = {}
    for stack, name in build_stack(text):
        temp = tree
        for n in stack:  # find or create...
            temp = temp.setdefault(n, {})  # ...the innermost dict
        temp[name] = {}  # a new node starts out with no children
    return tree

pprint(create_tree(DATA))
Output:
{'brs:dxprog-lang:jar:3.3-SNAPSHOT:compile': {'brs:libutil:jar:2.51:compile': {'commons-collections:commons-collections:jar:3.2.2:compile': {},
'org.apache.commons:commons-collections4:jar:4.1:compile': {'com.fasterxml.jackson.core:jackson-annotations:jar:2.9.0:compile': {},
'com.fasterxml.jackson.core:jackson-core:jar:2.9.5:compile': {}}}},
'commons-io:commons-io:jar:1.3.2:compile': {},
'org.antlr:antlr4:jar:4.7.1:compile': {'com.ibm.icu:icu4j:jar:58.2:compile': {},
'org.antlr:antlr-runtime:jar:3.5.2:compile': {},
'org.antlr:antlr4-runtime:jar:4.7.1:compile': {}}}
An empty dict represents a leaf of the tree.
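Since the question mentions neo4j, one possible use of this nested dict is to flatten it into (parent, child) pairs, which map naturally onto relationships. This is only a sketch: `iter_edges` and the hand-written `sample` tree are hypothetical names for illustration, not part of the code above, and top-level dependencies are given a parent of `None`.

```python
def iter_edges(tree, parent=None):
    """Yield (parent, child) pairs from a nested dict of dependencies.

    Top-level entries are yielded with parent None.
    """
    for name, subtree in tree.items():
        yield parent, name
        # Recurse into the children, with the current node as their parent.
        yield from iter_edges(subtree, name)


# A small hand-written tree in the same shape as the create_tree output.
sample = {
    "org.antlr:antlr4:jar:4.7.1:compile": {
        "org.antlr:antlr4-runtime:jar:4.7.1:compile": {},
        "com.ibm.icu:icu4j:jar:58.2:compile": {},
    },
    "commons-io:commons-io:jar:1.3.2:compile": {},
}

edges = list(iter_edges(sample))
for parent, child in edges:
    print(parent, "->", child)
```

How these pairs are then loaded into the database depends on the driver you use, so that part is left out here.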
Third, you need to format the tree, that is, (1) extract the data and (2) group the children into lists. This is a simple tree traversal (a DFS here):
def format(tree):  # note: this name shadows the built-in format()
    L = []
    for name, subtree in tree.items():
        group, artifact, packaging, version, scope = name.split(":")
        d = {"artifact": artifact}  # you can add group, ...
        if subtree:  # children are present
            d["children"] = format(subtree)
        L.append(d)
    return L

pprint(format(create_tree(DATA)))
Output:
[{'artifact': 'antlr4',
'children': [{'artifact': 'antlr4-runtime'},
{'artifact': 'antlr-runtime'},
{'artifact': 'icu4j'}]},
{'artifact': 'commons-io'},
{'artifact': 'dxprog-lang',
'children': [{'artifact': 'libutil',
'children': [{'artifact': 'commons-collections'},
{'artifact': 'commons-collections4',
'children': [{'artifact': 'jackson-annotations'},
{'artifact': 'jackson-core'}]}]}]}]
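The question also asks for groupId and version, not just the artifact. A possible helper for that is sketched below. `parse_coordinate` is a hypothetical name, and it assumes each entry has either five colon-separated fields (as in the sample above) or six, where the extra fourth field is a classifier; anything else raises an error rather than guessing.

```python
def parse_coordinate(coord):
    """Split a dependency string into named fields.

    Assumed layouts:
      group:artifact:packaging:version:scope
      group:artifact:packaging:classifier:version:scope
    """
    parts = coord.split(":")
    if len(parts) == 5:
        group, artifact, packaging, version, scope = parts
        classifier = None
    elif len(parts) == 6:
        group, artifact, packaging, classifier, version, scope = parts
    else:
        raise ValueError("unexpected dependency format: %r" % coord)
    return {
        "groupId": group,
        "artifactId": artifact,
        "packaging": packaging,
        "classifier": classifier,
        "version": version,
        "scope": scope,
    }


info = parse_coordinate("brs:dxprog-lang:jar:3.3-SNAPSHOT:compile")
print(info["groupId"], info["artifactId"], info["version"])
```

Inside `format`, the returned dict could be used in place of the `{"artifact": artifact}` literal.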
You can combine these steps into one.
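As a rough sketch of what a combined version might look like, the function below builds the list-of-dicts result in a single pass. It differs from the steps above in one respect: depth is taken from the width of the indentation instead of the number of | characters, because Maven draws blanks instead of | under the last sibling (the `\-` lines), so counting | would misplace those children. The names `parse_tree`, `LINE_RE`, and `indent_width` are hypothetical, and the sketch assumes three characters of indentation per level and five colon-separated fields per entry.

```python
import re
from pprint import pprint

# "[INFO] ", then the tree indentation, then "+- " or "\- ", then the coordinate.
LINE_RE = re.compile(r"^\[INFO\] (?P<indent>[| ]*)[+\\]- (?P<coord>\S+)")


def parse_tree(text, indent_width=3):
    """Parse dependency-tree text into a list of nested dicts."""
    roots = []
    stack = []  # stack[i] is the most recent node seen at depth i
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip blank lines and any other log output
        depth = len(match.group("indent")) // indent_width
        # Assumes exactly five fields; a classifier would add a sixth.
        group, artifact, packaging, version, scope = match.group("coord").split(":")
        node = {"groupId": group, "artifactId": artifact,
                "version": version, "children": []}
        # Attach to the top level, or to the latest node one level up.
        siblings = roots if depth == 0 else stack[depth - 1]["children"]
        siblings.append(node)
        stack[depth:] = [node]  # drop deeper entries, record this node
    return roots


DATA = r"""[INFO] +- org.antlr:antlr4:jar:4.7.1:compile
[INFO] |  +- org.antlr:antlr4-runtime:jar:4.7.1:compile
[INFO] |  \- com.ibm.icu:icu4j:jar:58.2:compile
[INFO] \- brs:dxprog-lang:jar:3.3-SNAPSHOT:compile
[INFO]    \- brs:libutil:jar:2.51:compile
"""

result = parse_tree(DATA)
pprint(result)
```

If your input has a different indentation width, pass it as `indent_width`; malformed input where a line is nested deeper than its predecessor allows is not handled here.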