Save DOM tree into a graph database: Connect related nodes

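I am inserting hierarchical data, made up of a DOM tree, into a graph database, but I cannot get the IDs of a child and its parent, which I need to create the relationship between them.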

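The code below walks the DOM nodes, inserts each tag, and gets the last inserted ID. I need to insert both nodes and get the IDs of the child and the parent in order to create their relationship.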

from lxml import html
import age  # from AgensGraph
from age.gen.ageParser import *

GRAPH_NAME = "demo_graph"
DSN = "host=localhost port=5432 dbname=demodb user=userdemo password=demo234"

ag = age.connect(graph=GRAPH_NAME, dsn=DSN)
tree = html.parse("demo.html")
for element in tree.getiterator():
    if parent := element.getparent():
        parent = None
        cursor = ag.execCypher("CREATE (t:node {name: %s} ) RETURN t", params=(element.tag,))
        b = [x[0].id for x in cursor]  # get last inserted ID
        print(b[0])
        # Match child node c and parent node p, then create c -[connects]-> p (the ID of p is unknown)
        ag.execCypher("MATCH (c:node), (p:node) WHERE id(c) = %s AND id(p) = %s CREATE (c)-[r:connects]->(p)")

This is the demo file, demo.html:

<!doctype html>
<html lang="en">
  <head>
    <meta charset="UTF-8"/>
    <title>Document</title>
  </head>
  <body>
    <ul class="menu">
      <div class="itm">home</div>
      <div class="itm">About us</div>
      <div class="itm">Contact us</div>
    </ul>
    <div id="idone" class="classone">
      <li class="item1">First</li>
      <li class="item2">Second</li>
      <li class="item3">Third</li>
      <div id="innerone"><h1>This Title</h1></div>
      <div id="innertwo"><h2>Subheads</h2></div>      
    </div>
    <div id="second" class="below">
      <div class="inner">
        <h1>welcome</h1>
        <h1>another</h1>
        <h2>third</h2>
      </div>
    </div>
  </body>
</html>

This is the extracted DOM tree:

tag: head attrib: None parent: html
tag: meta attrib: ('charset', 'UTF-8') parent: head
tag: title attrib: None parent: head
tag: body attrib: None parent: html
tag: h1 attrib: None parent: div
tag: h1 attrib: None parent: div
tag: h2 attrib: None parent: div
/tmp/ipykernel_27254/2858024143.py:4: FutureWarning: The behavior of this method will change in future versions. Use specific 'len(elem)' or 'elem is not None' test instead.
  if parent := element.getparent():

A CREATE statement only takes effect once the session is committed. You should call commit() after execCypher(...):

cursor = ag.execCypher("CREATE (t:node {name: %s} ) RETURN t", params=(element.tag,))
b = [x[0].id for x in cursor]
ag.commit()

Try the following code:

ag = age.connect(graph=GRAPH_NAME, dsn=DSN)
tree = html.parse("demo.html")
for element in tree.getiterator():
    if element.getparent() is not None:  # explicit test, as the FutureWarning suggests
        cursor = ag.execCypher("CREATE (t:node {name: %s} ) RETURN t", params=(element.tag,))
        b = [x[0].id for x in cursor]  # get last inserted ID
        ag.commit()
        print(b[0])
        # Match child node c and parent node p, then create c -[connects]-> p (the ID of p is unknown)
        ag.execCypher("MATCH (c:node), (p:node) WHERE id(c) = %s AND id(p) = %s CREATE (c)-[r:connects]->(p)")
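
The loop above still leaves the parent's ID unknown. One possible way to have both IDs at hand is to pass the ID of each inserted node down to its children while walking the tree. The sketch below is only an illustration of that idea, not the AGE API: `InMemoryGraph`, `create_node`, `create_edge`, `save_tree` and the `SAMPLE` markup are hypothetical names, and the standard library's `xml.etree.ElementTree` stands in for lxml so the example runs without a database. With the driver from the question, `create_node` would presumably wrap the `CREATE ... RETURN t` call and `create_edge` the `MATCH ... CREATE` call.

```python
import itertools
import xml.etree.ElementTree as ET

# Small well-formed stand-in for demo.html (hypothetical sample data).
SAMPLE = """
<html>
  <head><title>Document</title></head>
  <body>
    <ul><div>home</div><div>About us</div></ul>
  </body>
</html>
""".strip()


class InMemoryGraph:
    """Hypothetical stand-in for the graph connection; keeps everything in memory."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self.nodes = {}  # node ID -> tag name
        self.edges = []  # (child ID, parent ID) pairs

    def create_node(self, name):
        """Insert a node and return its newly assigned ID."""
        node_id = next(self._next_id)
        self.nodes[node_id] = name
        return node_id

    def create_edge(self, child_id, parent_id):
        """Record a child -[connects]-> parent relationship."""
        self.edges.append((child_id, parent_id))


def save_tree(graph, element, parent_id=None):
    """Insert element, connect it to its parent, then recurse into its children."""
    node_id = graph.create_node(element.tag)
    if parent_id is not None:  # the root element has no parent
        graph.create_edge(node_id, parent_id)
    for child in element:
        save_tree(graph, child, node_id)
    return node_id


graph = InMemoryGraph()
root_id = save_tree(graph, ET.fromstring(SAMPLE))
for child_id, parent_id in graph.edges:
    print(f"{graph.nodes[child_id]} ({child_id}) connects {graph.nodes[parent_id]} ({parent_id})")
```

Because a parent is always inserted before its children, its ID already exists by the time each child needs it, so no lookup by tag name is required.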