Add hierarchical level based on reporting

Data (df):

child parent
b     a
c     a
d     b
e     c
f     c
g     f

Expected output:

child   parent  level
b       a       1
c       a       1
d       b       2
e       c       2
f       c       2
g       f       3

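Based on this parent-child reporting, 'a' is the top-level parent because it does not report to anyone. 'b' and 'c' report to 'a', so they are level 1. 'd', 'e' and 'f' report to level-1 nodes (b, c), so they are level 2. 'g' reports to 'f' (level 2), so 'g' is level 3. Please let me know how to achieve this.

I am trying the code below, but it does not work: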

df['Level'] = np.where(df['parent'] == 'a',"level 1",np.nan)
dfm1 = pd.Series(np.where(df['Level'] == 'level 1', df['parent'],None))
df.loc[df['parent'].isin(dfm1),'Level'] = "level 2"

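Here is an approach using networkx: build a directed graph from parent to child, then take the number of ancestors of each node as its level.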

import networkx as nx

G = nx.from_pandas_edgelist(df,"parent","child",create_using=nx.DiGraph())
f = lambda x: len(nx.ancestors(G,x))
df['level'] = df['child'].map(f)

print(df)

  child parent  level
0     b      a      1
1     c      a      1
2     d      b      2
3     e      c      2
4     f      c      2
5     g      f      3
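
Counting ancestors matches the level as long as every child reports to exactly one parent, because the ancestors are then exactly the nodes on the path up to the root. As a hedged variation, assuming a child might have more than one parent, the distance from the root can be measured directly instead. The edge list below is the question's data retyped by hand, and the names `edges`, `roots` and `depth` are only illustrative:

```python
import networkx as nx

# (parent, child) pairs copied from the question's data
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "e"), ("c", "f"), ("f", "g")]

G = nx.DiGraph()
G.add_edges_from(edges)

# A root is a node that nobody points to, i.e. it reports to no one
roots = [node for node, deg in G.in_degree() if deg == 0]

# Level = number of edges on the shortest path from a root to the node
depth = {}
for root in roots:
    distances = dict(nx.single_source_shortest_path_length(G, root))
    for node, dist in distances.items():
        depth[node] = min(dist, depth.get(node, dist))

print(depth)
```

With the DataFrame from the question, the result could then be attached with `df['level'] = df['child'].map(depth)`.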

Here is a solution from first principles:

import pandas as pd

# We will build the tree of relationships, using a helper node class
class Node:
    def __init__(self, value, parent=None, level=0):
        self.value = value
        self.parent = parent
        self.level = level
        self.children = []
    
    def set_child(self, child):
        child.level = self.level + 1
        self.children.append(child)

# Helper function to insert nodes
def insert(node, new_node):
    if new_node.parent == node.value:
        # if the new node is a child, insert it
        node.set_child(new_node)
    else:
        # otherwise, iterate over the children until you find its parent
        if node.children:
            for child in node.children:
                insert(child, new_node)

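# gather the level information for the tree
def node_print(node, values=None):
    # use None instead of a mutable default so calls do not share one list
    if values is None:
        values = []
    # the root has no parent, so it is skipped in the output
    if node.parent:
        values.append((node.value, node.parent, node.level))
    for child in node.children:
        node_print(child, values=values)
    return values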

# Now get the data and build the tree
data = """b     a
c     a
d     b
e     c
f     c
g     f"""


rows = [y.split() for y in data.split("\n")]

for index, (child, parent) in enumerate(rows):
    if index == 0:
        node = Node(value=parent)
    
    child_node = Node(value=child, parent=parent)
    insert(node, child_node)

output = pd.DataFrame(data=node_print(node, values=[]), columns=['child', 'parent', 'level']).sort_values(by='level')

print(output)

  child parent  level
0     b      a      1
2     c      a      1
1     d      b      2
3     e      c      2
4     f      c      2
5     g      f      3
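
The tree-building code above relies on each parent appearing in the rows before its children, and on the first row's parent being the root. A minimal sketch that drops that ordering requirement follows the parent links upward and caches the results. It assumes each child has exactly one parent and that the data contains no cycles; `compute_levels` is a hypothetical helper name, not part of any library:

```python
def compute_levels(pairs):
    """Return {child: level} for (child, parent) pairs; a root counts as level 0."""
    parent_of = dict(pairs)
    cache = {}

    def level(name):
        # A name that never appears as a child reports to no one: it is a root
        if name not in parent_of:
            return 0
        if name not in cache:
            # One level deeper than whoever this name reports to
            cache[name] = level(parent_of[name]) + 1
        return cache[name]

    return {child: level(child) for child in parent_of}


# Same data as the question, deliberately shuffled
pairs = [("g", "f"), ("d", "b"), ("b", "a"), ("f", "c"), ("c", "a"), ("e", "c")]
levels = compute_levels(pairs)
print(levels)  # {'g': 3, 'd': 2, 'b': 1, 'f': 2, 'c': 1, 'e': 2}
```

With the DataFrame from the question, this could be applied as `df['level'] = df['child'].map(compute_levels(zip(df['child'], df['parent'])))`.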