我如何查找 parents 包含他们自己的代码作为 child?
How do I look for parents containing their own code as child?
我有 tree-like 数据由 Parent 代码构成,这些代码包含 Child 代码,这些代码本身可以充当 parent,具体取决于它们是否被标记作为 "SA"。此数据存在于 Excel sheet 中,如下所示:
| Tree Level (A) | Code (B) | Spec (C) | Comm. Code (D) | Parent Code (J) |
|----------------|----------|----------|----------------|-----------------|
| 1 | A12 | 1 | SA | Mach |
| 2 | B41 | 2 | SA | A12 |
| 3 | A523 | 1 | BP | B41 |
| 2 | G32 | 4 | BP | A12 |
| 2 | D3F5 | 1 | SA | A12 |
| 3 | A12 | 4 | SA | D3F5 |
| 3 | A12 | 1 | SA | D3F5 |
这里有一个问题:A12,在树的顶层 (1),包含一个 child (D3F5),它本身包含另一个 parent,与 D3F5 自己的 parent。正如您可能想象的那样,这(虽然没有在交付给我的数据中表示)创建了一个无限循环,其中树级别 3 的 A12 一次又一次地展开整个结构。
请注意,两个 'A12' children 中的一个没有问题,因为它与树级别 1 的 A12 parent 有不同的规格。
我有一个函数可以检查这种情况,但它非常慢,因为它使用嵌套循环遍历行,总行数可能有数千行。最终目标是向用户展示错误发生的最深层次。在此示例中,代码 A12
和规范 1
在树级别 3
:
def nested_parent(sht):
"""
Checks if a parent SA contains itself as a child.
:return: nested_parents: Dictionary of found 'nested parents'. None if none found
"""
nested_parents = {}
found = False
lrow = sht.Cells(sht.Rows.Count, 1).End(3).Row
parent_treelevel = 1
# Get deepest tree level, as this no longer contains children
last_treelevel = int(max([i[0] for i in sht.Range(sht.Cells(2, 1), sht.Cells(lrow, 1)).Value]))
# Loop through parent rows
print('Checking for nested parents...')
for i in range(2, lrow):
if sht.Cells(i, "D").Value == "SA":
parent_code, parent_treelevel = f'{sht.Cells(i, "B").Value}_{sht.Cells(i, "C")}', sht.Cells(i, "A").Value
# Add new key with list containing parent's tree level for parent code
if parent_code not in nested_parents:
nested_parents[parent_code] = [int(parent_treelevel)]
# Loop child rows
for j in range(i + 1, lrow + 1):
child_code, child_treelevel = f'{sht.Cells(j, "B").Value}_{sht.Cells(j, "C")}', sht.Cells(i, "A").Value
if child_code == parent_code and child_treelevel > parent_treelevel:
found = True
nested_parents[parent_code].append(int(child_treelevel))
if parent_treelevel == last_treelevel:
# End function if deepst tree level is reached
print("done")
if found:
# Delete keys that contain no information
delkeys = []
for key in reversed(nested_parents):
if len(nested_parents[key]) == 1:
delkeys.append(key)
for key in delkeys:
del nested_parents[key]
return nested_parents
else:
return
这个函数可以按如下方式调用,其中wb_name
是包含数据的工作簿的名称:
from win32com.client import GetObject
wb_name = "NAME"
sht = GetObject(None, "Excel.Application").Workbooks(wb_name).Worksheets(1)
def err(msg):
"""
stops the code from executing after printing an error message
"""
print("Unexpected error occured:", msg)
exit()
infloop = nested_parent(sht)
if infloop is not None:
dict_str = ''.join([f'Code: {key}, Tree levels: {infloop[key]}\n' for key in infloop])
err(f"Warning: one or more parent codes contain their own code as a child:\n{dict_str}")
我希望加快这段代码的速度,因为我的脚本的其余部分相当快,但它的速度受到此函数的严重阻碍。
我希望此回复有助于展示分层数据结构的强大功能。我所做的是将数据重写为 json 字符串,然后编写代码遍历层次结构并生成报告。您仍然需要将 excel 转换为 json。要点是 json 的每个级别都有相同的键,并且 children 中的每个 child 都具有与其 parent 字典相同的键,因此启用遍历结构的递归函数。我做了按代码或级别汇总的示例。
import json
json_data = """
{
"level": 0,
"code": "Mach",
"children": [
{
"level": 1,
"code": "A12",
"children": [
{
"level": 2,
"code": "B41",
"children": [
{
"level": 3,
"code": "A523",
"children": []
}
]
},
{
"level": 2,
"code": "G32",
"children": []
},
{
"level": 2,
"code": "D3F5",
"children": [
{
"level": 3,
"code": "A12",
"children": []
},
{
"level": 3,
"code": "A12",
"children": []
}
]
}
]
}
]
}
"""
data = json.loads(json_data)
def crawl_levels(mydict, result={}):
try:
result[mydict["level"]].append(mydict["code"])
except:
result[mydict["level"]] = [mydict["code"],]
for i in mydict["children"]:
result = crawl_levels(i, result=result)
return result
crawl_levels(data)
>>>{0: ['Mach'], 1: ['A12'], 2: ['B41', 'G32', 'D3F5'], 3: ['A523', 'A12', 'A12']}
def crawl_codes(mydict, result={}):
try:
result[mydict["code"]].append(mydict["level"])
except:
result[mydict["code"]] = [mydict["level"],]
for i in mydict["children"]:
result = crawl_codes(i, result=result)
return result
crawl_codes(data)
>>>{'Mach': [0],
'A12': [1, 3, 3],
'B41': [2],
'A523': [3],
'G32': [2],
'D3F5': [2]}
正如@a'r 提到的,您的 "tree like data" 可以看作是一个有向图,即用箭头(有向边)连接的点(节点)。有一个名为 networkx
的非常强大的库可以很好地处理图形。在不深入研究图论的情况下,请考虑以下代码示例:
import networkx as nx
edges = [ ('A12', 'Mach'),
('B41', 'A12'),
('A523','B41'),
('G32', 'A12'),
('D3F5','A12'),
('A12', 'D3F5'),
('A12', 'D3F5') ]
G = nx.DiGraph(edges)
cycles_list = list(nx.simple_cycles(G))
print(cycles_list)
输出:
[['A12', 'D3F5']]
这里的节点名称是您阅读时的代码本身,边是 child 和 parent 之间的连接。您只需获取 Excel 文件的相应列即可轻松创建边列表。在这种情况下,确切的方向(parent 到 child 或相反)不是很重要,只要保持一致即可。
simple_cycles
returns 一台发电机。在这里你可以找到它上面的documentation。
更新
获得循环列表后,要找到最深的节点,您需要匹配该节点并找到它最深的外观。
从 A、B 和 J 列创建一个节点列表。它看起来像:
data = [
[1, 'A12', 'Mach'],
[2, 'B41', 'A12'],
[3, 'A523', 'B41'],
[2, 'G32', 'A12'],
[2, 'D3F5', 'A12'],
[3, 'A12', 'D3F5'],
[3, 'A12', 'D3F5'] ]
result = {}
for entry in data:
for el in cycles_list:
if entry[1:] == el:
key = tuple(el)
result[key] = max(result.setdefault(key, 0), entry[0])
print(result)
>>>
{('A12', 'D3F5'): 3}
现在你会得到一个字典,其中键是有问题的节点,值是它可以找到的最深层次。
我有 tree-like 数据由 Parent 代码构成,这些代码包含 Child 代码,这些代码本身可以充当 parent,具体取决于它们是否被标记作为 "SA"。此数据存在于 Excel sheet 中,如下所示:
| Tree Level (A) | Code (B) | Spec (C) | Comm. Code (D) | Parent Code (J) |
|----------------|----------|----------|----------------|-----------------|
| 1 | A12 | 1 | SA | Mach |
| 2 | B41 | 2 | SA | A12 |
| 3 | A523 | 1 | BP | B41 |
| 2 | G32 | 4 | BP | A12 |
| 2 | D3F5 | 1 | SA | A12 |
| 3 | A12 | 4 | SA | D3F5 |
| 3 | A12 | 1 | SA | D3F5 |
这里有一个问题:A12,在树的顶层 (1),包含一个 child (D3F5),它本身包含另一个 parent,与 D3F5 自己的 parent。正如您可能想象的那样,这(虽然没有在交付给我的数据中表示)创建了一个无限循环,其中树级别 3 的 A12 一次又一次地展开整个结构。
请注意,两个 'A12' children 中的一个没有问题,因为它与树级别 1 的 A12 parent 有不同的规格。
我有一个函数可以检查这种情况,但它非常慢,因为它使用嵌套循环遍历行,总行数可能有数千行。最终目标是向用户展示错误发生的最深层次。在此示例中,代码 A12
和规范 1
在树级别 3
:
def nested_parent(sht):
"""
Checks if a parent SA contains itself as a child.
:return: nested_parents: Dictionary of found 'nested parents'. None if none found
"""
nested_parents = {}
found = False
lrow = sht.Cells(sht.Rows.Count, 1).End(3).Row
parent_treelevel = 1
# Get deepest tree level, as this no longer contains children
last_treelevel = int(max([i[0] for i in sht.Range(sht.Cells(2, 1), sht.Cells(lrow, 1)).Value]))
# Loop through parent rows
print('Checking for nested parents...')
for i in range(2, lrow):
if sht.Cells(i, "D").Value == "SA":
parent_code, parent_treelevel = f'{sht.Cells(i, "B").Value}_{sht.Cells(i, "C")}', sht.Cells(i, "A").Value
# Add new key with list containing parent's tree level for parent code
if parent_code not in nested_parents:
nested_parents[parent_code] = [int(parent_treelevel)]
# Loop child rows
for j in range(i + 1, lrow + 1):
child_code, child_treelevel = f'{sht.Cells(j, "B").Value}_{sht.Cells(j, "C")}', sht.Cells(i, "A").Value
if child_code == parent_code and child_treelevel > parent_treelevel:
found = True
nested_parents[parent_code].append(int(child_treelevel))
if parent_treelevel == last_treelevel:
# End function if deepst tree level is reached
print("done")
if found:
# Delete keys that contain no information
delkeys = []
for key in reversed(nested_parents):
if len(nested_parents[key]) == 1:
delkeys.append(key)
for key in delkeys:
del nested_parents[key]
return nested_parents
else:
return
这个函数可以按如下方式调用,其中wb_name
是包含数据的工作簿的名称:
from win32com.client import GetObject
wb_name = "NAME"
sht = GetObject(None, "Excel.Application").Workbooks(wb_name).Worksheets(1)
def err(msg):
"""
stops the code from executing after printing an error message
"""
print("Unexpected error occured:", msg)
exit()
infloop = nested_parent(sht)
if infloop is not None:
dict_str = ''.join([f'Code: {key}, Tree levels: {infloop[key]}\n' for key in infloop])
err(f"Warning: one or more parent codes contain their own code as a child:\n{dict_str}")
我希望加快这段代码的速度,因为我的脚本的其余部分相当快,但它的速度受到此函数的严重阻碍。
我希望此回复有助于展示分层数据结构的强大功能。我所做的是将数据重写为 json 字符串,然后编写代码遍历层次结构并生成报告。您仍然需要将 excel 转换为 json。要点是 json 的每个级别都有相同的键,并且 children 中的每个 child 都具有与其 parent 字典相同的键,因此启用遍历结构的递归函数。我做了按代码或级别汇总的示例。
import json
json_data = """
{
"level": 0,
"code": "Mach",
"children": [
{
"level": 1,
"code": "A12",
"children": [
{
"level": 2,
"code": "B41",
"children": [
{
"level": 3,
"code": "A523",
"children": []
}
]
},
{
"level": 2,
"code": "G32",
"children": []
},
{
"level": 2,
"code": "D3F5",
"children": [
{
"level": 3,
"code": "A12",
"children": []
},
{
"level": 3,
"code": "A12",
"children": []
}
]
}
]
}
]
}
"""
data = json.loads(json_data)
def crawl_levels(mydict, result={}):
try:
result[mydict["level"]].append(mydict["code"])
except:
result[mydict["level"]] = [mydict["code"],]
for i in mydict["children"]:
result = crawl_levels(i, result=result)
return result
crawl_levels(data)
>>>{0: ['Mach'], 1: ['A12'], 2: ['B41', 'G32', 'D3F5'], 3: ['A523', 'A12', 'A12']}
def crawl_codes(mydict, result={}):
try:
result[mydict["code"]].append(mydict["level"])
except:
result[mydict["code"]] = [mydict["level"],]
for i in mydict["children"]:
result = crawl_codes(i, result=result)
return result
crawl_codes(data)
>>>{'Mach': [0],
'A12': [1, 3, 3],
'B41': [2],
'A523': [3],
'G32': [2],
'D3F5': [2]}
正如@a'r 提到的,您的 "tree like data" 可以看作是一个有向图,即用箭头(有向边)连接的点(节点)。有一个名为 networkx
的非常强大的库可以很好地处理图形。在不深入研究图论的情况下,请考虑以下代码示例:
import networkx as nx
edges = [ ('A12', 'Mach'),
('B41', 'A12'),
('A523','B41'),
('G32', 'A12'),
('D3F5','A12'),
('A12', 'D3F5'),
('A12', 'D3F5') ]
G = nx.DiGraph(edges)
cycles_list = list(nx.simple_cycles(G))
print(cycles_list)
输出:
[['A12', 'D3F5']]
这里的节点名称是您阅读时的代码本身,边是 child 和 parent 之间的连接。您只需获取 Excel 文件的相应列即可轻松创建边列表。在这种情况下,确切的方向(parent 到 child 或相反)不是很重要,只要保持一致即可。
simple_cycles
returns 一台发电机。在这里你可以找到它上面的documentation。
更新
获得循环列表后,要找到最深的节点,您需要匹配该节点并找到它最深的外观。
从 A、B 和 J 列创建一个节点列表。它看起来像:
data = [
[1, 'A12', 'Mach'],
[2, 'B41', 'A12'],
[3, 'A523', 'B41'],
[2, 'G32', 'A12'],
[2, 'D3F5', 'A12'],
[3, 'A12', 'D3F5'],
[3, 'A12', 'D3F5'] ]
result = {}
for entry in data:
for el in cycles_list:
if entry[1:] == el:
key = tuple(el)
result[key] = max(result.setdefault(key, 0), entry[0])
print(result)
>>>
{('A12', 'D3F5'): 3}
现在你会得到一个字典,其中键是有问题的节点,值是它可以找到的最深层次。