Return 从字典生成的最小值 "sub-DAG"
Return minimum "sub-DAG" generated from dictionary
我有一个输入数据和一些转换函数 t1、t2、t3、t4、t5、t6。他们每个人都需要一些列作为输入并输出一些列。
details = {
't1': {
input_cols = ['C1', 'C2'],
output_cols = ['C8', 'C9', 'C10']
}
't2': {
input_cols = ['C8', 'C10'],
output_cols = ['C11', 'C12']
}
't3': {
input_cols = ['C11', 'C12'],
output_cols = ['C13', 'C14', 'C15', 'C16']
},
't4': {
input_cols = ['C9', 'C3', 'C4', 'C5', 'C6'],
output_cols = ['C17', 'C18', 'C19', 'C20']
},
't5': {
input_cols = ['C20', 'C16'],
output_cols = ['C21', 'C22']
},
't6': {
input_cols = ['C7', 'C16'],
output_cols = ['C23', 'C24', 'C25']
}
}
与这些转换关联的 DAG 是
我将要生成的列作为输入,我应该 return 获取它所需的子 DAG(我不只对 DAG 节点感兴趣,顺序也很重要)
例如,
如果 required_columns
是
['C1', 'C2,', ..., 'C25']
,输出应该是
['t1', 't2', 't3', 't5', 't4', 't6']
或其他可能的路径,例如 ['t1', 't4', 't2', 't6', 't3', 't5']
如果 required_columns
是 ['C8'、'C9'、'C10'、'C11'、'C12'、'C13', 'C114', 'C15', 'C16', 'C20', 'C21', 'C22'] 我应该输出 ['t1', 't2', 't3', 't5']
如果 required_columns
是 ['C1'、'C3'、'C5'、'C8'、'C9'、'C10', 'C11', 'C12', 'C17', 'C18', 'C19', 'C20'] 我应该输出 ['t1', 't4']
请注意,'C1' ... 'C7' 在我的 required_columns
中的存在/不存在不会影响所需的输出,因为这些列不是由 t1、t2 生成的, t3, t4, t5, t6
我解决这个问题的方法是:
- 基于
details
字典创建DAG D
- 对于 D 中的每片叶子
如果当前叶子的所有
output_cols
都不在required_columns
中 -> 删除节点并向上移动到指向该节点的节点并重复步骤
- 打印现有拓扑排序修剪 DAG 的值
我直觉上认为我的方法远非最佳。鉴于当前的输入数据格式(details
是一个字典,required_columns
是一个列表),我怎样才能做得更好?
构建转置 DAG 并从所需列进行深度优先搜索?
import collections
Transformation = collections.namedtuple("Transformation", ["input_cols", "output_cols"])
details = {
"t1": Transformation(input_cols=["C1", "C2"], output_cols=["C8", "C9", "C10"]),
"t2": Transformation(input_cols=["C8", "C10"], output_cols=["C11", "C12"]),
"t3": Transformation(
input_cols=["C11", "C12"], output_cols=["C13", "C14", "C15", "C16"]
),
"t4": Transformation(
input_cols=["C9", "C3", "C4", "C5", "C6"],
output_cols=["C17", "C18", "C19", "C20"],
),
"t5": Transformation(input_cols=["C20", "C16"], output_cols=["C21", "C22"]),
"t6": Transformation(input_cols=["C7", "C16"], output_cols=["C23", "C24", "C25"]),
}
def depth_first_search(graph, roots):
stack = [(root, True) for root in roots]
visited = set()
order = []
while stack:
v, entering = stack.pop()
if entering:
if v in visited:
continue
visited.add(v)
stack.append((v, False))
stack.extend((w, True) for w in graph[v])
else:
order.append(v)
return order
def sub_dag(required_columns):
graph = collections.defaultdict(set)
for k, v in details.items():
for c in v.input_cols:
graph[k].add(c)
for c in v.output_cols:
graph[c].add(k)
order = depth_first_search(graph, required_columns)
return [t for t in order if t in details]
print(
sub_dag(
[
"C1",
"C2",
"C3",
"C4",
"C5",
"C6",
"C7",
"C8",
"C9",
"C10",
"C11",
"C12",
"C13",
"C14",
"C15",
"C16",
"C17",
"C18",
"C19",
"C20",
"C21",
"C22",
"C23",
"C24",
"C25",
]
)
)
print(
sub_dag(
[
"C8",
"C9",
"C10",
"C11",
"C12",
"C13",
"C14",
"C15",
"C16",
"C20",
"C21",
"C22",
]
)
)
print(
sub_dag(
[
"C1",
"C3",
"C5",
"C8",
"C9",
"C10",
"C11",
"C12",
"C17",
"C18",
"C19",
"C20",
]
)
)
print(sub_dag(["C1", "C3", "C5", "C25", "C22"]))
我有一个输入数据和一些转换函数 t1、t2、t3、t4、t5、t6。他们每个人都需要一些列作为输入并输出一些列。
details = {
't1': {
input_cols = ['C1', 'C2'],
output_cols = ['C8', 'C9', 'C10']
}
't2': {
input_cols = ['C8', 'C10'],
output_cols = ['C11', 'C12']
}
't3': {
input_cols = ['C11', 'C12'],
output_cols = ['C13', 'C14', 'C15', 'C16']
},
't4': {
input_cols = ['C9', 'C3', 'C4', 'C5', 'C6'],
output_cols = ['C17', 'C18', 'C19', 'C20']
},
't5': {
input_cols = ['C20', 'C16'],
output_cols = ['C21', 'C22']
},
't6': {
input_cols = ['C7', 'C16'],
output_cols = ['C23', 'C24', 'C25']
}
}
与这些转换关联的 DAG 是
我将要生成的列作为输入,我应该 return 获取它所需的子 DAG(我不只对 DAG 节点感兴趣,顺序也很重要) 例如,
如果 required_columns
是
['C1', 'C2,', ..., 'C25']
,输出应该是
['t1', 't2', 't3', 't5', 't4', 't6']
或其他可能的路径,例如 ['t1', 't4', 't2', 't6', 't3', 't5']
如果 required_columns
是 ['C8'、'C9'、'C10'、'C11'、'C12'、'C13', 'C114', 'C15', 'C16', 'C20', 'C21', 'C22'] 我应该输出 ['t1', 't2', 't3', 't5']
如果 required_columns
是 ['C1'、'C3'、'C5'、'C8'、'C9'、'C10', 'C11', 'C12', 'C17', 'C18', 'C19', 'C20'] 我应该输出 ['t1', 't4']
请注意,'C1' ... 'C7' 在我的 required_columns
中的存在/不存在不会影响所需的输出,因为这些列不是由 t1、t2 生成的, t3, t4, t5, t6
我解决这个问题的方法是:
- 基于
details
字典创建DAG D - 对于 D 中的每片叶子
如果当前叶子的所有
output_cols
都不在required_columns
中 -> 删除节点并向上移动到指向该节点的节点并重复步骤 - 打印现有拓扑排序修剪 DAG 的值
我直觉上认为我的方法远非最佳。鉴于当前的输入数据格式(details
是一个字典,required_columns
是一个列表),我怎样才能做得更好?
构建转置 DAG 并从所需列进行深度优先搜索?
import collections
Transformation = collections.namedtuple("Transformation", ["input_cols", "output_cols"])
details = {
"t1": Transformation(input_cols=["C1", "C2"], output_cols=["C8", "C9", "C10"]),
"t2": Transformation(input_cols=["C8", "C10"], output_cols=["C11", "C12"]),
"t3": Transformation(
input_cols=["C11", "C12"], output_cols=["C13", "C14", "C15", "C16"]
),
"t4": Transformation(
input_cols=["C9", "C3", "C4", "C5", "C6"],
output_cols=["C17", "C18", "C19", "C20"],
),
"t5": Transformation(input_cols=["C20", "C16"], output_cols=["C21", "C22"]),
"t6": Transformation(input_cols=["C7", "C16"], output_cols=["C23", "C24", "C25"]),
}
def depth_first_search(graph, roots):
stack = [(root, True) for root in roots]
visited = set()
order = []
while stack:
v, entering = stack.pop()
if entering:
if v in visited:
continue
visited.add(v)
stack.append((v, False))
stack.extend((w, True) for w in graph[v])
else:
order.append(v)
return order
def sub_dag(required_columns):
graph = collections.defaultdict(set)
for k, v in details.items():
for c in v.input_cols:
graph[k].add(c)
for c in v.output_cols:
graph[c].add(k)
order = depth_first_search(graph, required_columns)
return [t for t in order if t in details]
print(
sub_dag(
[
"C1",
"C2",
"C3",
"C4",
"C5",
"C6",
"C7",
"C8",
"C9",
"C10",
"C11",
"C12",
"C13",
"C14",
"C15",
"C16",
"C17",
"C18",
"C19",
"C20",
"C21",
"C22",
"C23",
"C24",
"C25",
]
)
)
print(
sub_dag(
[
"C8",
"C9",
"C10",
"C11",
"C12",
"C13",
"C14",
"C15",
"C16",
"C20",
"C21",
"C22",
]
)
)
print(
sub_dag(
[
"C1",
"C3",
"C5",
"C8",
"C9",
"C10",
"C11",
"C12",
"C17",
"C18",
"C19",
"C20",
]
)
)
print(sub_dag(["C1", "C3", "C5", "C25", "C22"]))