Save DOM tree into a graph database: Connect related nodes
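I'm inserting hierarchical data built from a DOM tree into a graph database, but I can't get the IDs I need to create the relationships: the ID of each child and the ID of its parent.
The code below walks the DOM nodes, inserts each tag and fetches the ID of the last inserted node. To create a relationship I need both IDs, the child's and its parent's.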
from lxml import html
import age  # from AgensGraph
from age.gen.ageParser import *

GRAPH_NAME = "demo_graph"
DSN = "host=localhost port=5432 dbname=demodb user=userdemo password=demo234"

ag = age.connect(graph=GRAPH_NAME, dsn=DSN)
tree = html.parse("demo.html")
for element in tree.getiterator():
    if parent := element.getparent():
        parent = None
    cursor = ag.execCypher("CREATE (t:node {name: %s}) RETURN t", params=(element.tag,))
    b = [x[0].id for x in cursor]  # get last inserted ID
    print(b[0])
    # Match child node c and parent node p, then join c -[:connects]-> p.
    # Problem: the parent's ID (the second %s) is unknown at this point.
    ag.execCypher("MATCH (c:node), (p:node) WHERE id(c) = %s AND id(p) = %s CREATE (c)-[r:connects]->(p)")
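The underlying difficulty is knowing the parent's ID at the moment the child is inserted. Iterating a tree in document order visits every parent before any of its children, so one approach is to remember each inserted node's ID in a dictionary keyed by element and look the parent up there. A minimal sketch of that idea, assuming the standard library's xml.etree.ElementTree in place of lxml and a hypothetical in-memory FakeGraph class in place of the AGE connection, so it runs without a database:

```python
import xml.etree.ElementTree as ET
from itertools import count


class FakeGraph:
    """Hypothetical in-memory stand-in for the graph database."""

    def __init__(self):
        self._next_id = count(1)
        self.nodes = {}  # vertex id -> tag name
        self.edges = []  # (child id, parent id) pairs

    def create_node(self, name):
        node_id = next(self._next_id)
        self.nodes[node_id] = name
        return node_id

    def connect(self, child_id, parent_id):
        self.edges.append((child_id, parent_id))


def insert_tree(graph, root):
    # ElementTree has no getparent(), so build a child -> parent map first
    # (with lxml, element.getparent() does this job).
    parent_of = {child: parent for parent in root.iter() for child in parent}
    node_ids = {}  # element -> ID of the vertex created for it
    # iter() yields elements in document order, so every parent is
    # inserted (and its ID recorded) before any of its children.
    for element in root.iter():
        node_ids[element] = graph.create_node(element.tag)
        parent = parent_of.get(element)
        if parent is not None:
            graph.connect(node_ids[element], node_ids[parent])
    return node_ids


doc = ET.fromstring(
    "<html><head><title>Document</title></head>"
    "<body><div><h1>welcome</h1></div></body></html>"
)
graph = FakeGraph()
insert_tree(graph, doc)
for child_id, parent_id in graph.edges:
    print(graph.nodes[child_id], "->", graph.nodes[parent_id])
```

It prints one child -> parent pair per line (head -> html, title -> head, and so on). With lxml, element.getparent() replaces the parent_of map, and the two FakeGraph methods correspond to the CREATE and MATCH ... CREATE statements sent through ag.execCypher.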
Here is the demo file, demo.html:
<!doctype html>
<html lang="en">
<head>
<meta charset="UTF-8"/>
<title>Document</title>
</head>
<body>
<ul class="menu">
<div class="itm">home</div>
<div class="itm">About us</div>
<div class="itm">Contact us</div>
</ul>
<div id="idone" class="classone">
<li class="item1">First</li>
<li class="item2">Second</li>
<li class="item3">Third</li>
<div id="innerone"><h1>This Title</h1></div>
<div id="innertwo"><h2>Subheads</h2></div>
</div>
<div id="second" class="below">
<div class="inner">
<h1>welcome</h1>
<h1>another</h1>
<h2>third</h2>
</div>
</div>
</body>
</html>
Here is the extracted DOM tree:
tag: head attrib: None parent: html
tag: meta attrib: ('charset', 'UTF-8') parent: head
tag: title attrib: None parent: head
tag: body attrib: None parent: html
tag: h1 attrib: None parent: div
tag: h1 attrib: None parent: div
tag: h2 attrib: None parent: div
/tmp/ipykernel_27254/2858024143.py:4: FutureWarning: The behavior of this method will change in future versions. Use specific 'len(elem)' or 'elem is not None' test instead.
if parent := element.getparent():
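The warning comes from the truth test on the element returned by getparent(). In both lxml and the standard library's ElementTree, an element's truth value follows len(elem), i.e. whether it has child elements, not whether it exists, which is why the message asks for an explicit test. In the loop above the test happens to be true whenever a parent exists (a parent always has at least one child), but the body then sets parent = None, so the parent is discarded anyway. A minimal sketch of the explicit tests, using ElementTree so it runs without lxml:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<div><h1>This Title</h1></div>")
# ElementTree has no getparent(), so build a child -> parent map.
parent_of = {child: parent for parent in root.iter() for child in parent}

h1 = root.find("h1")
h1_parent = parent_of.get(h1)
root_parent = parent_of.get(root)

# Explicit tests instead of "if element:", which would follow len(element).
print(len(h1))                # 0: h1 has text but no child elements
print(h1_parent is not None)  # True: h1 sits inside the div
print(root_parent is None)    # True: the root has no parent
```

With lxml the equivalent is parent = element.getparent() followed by if parent is not None:.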
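An executed CREATE statement only takes effect (is persisted and visible to other sessions) once the session is committed. You should call commit() after execCypher(...):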
cursor = ag.execCypher("CREATE (t:node {name: %s}) RETURN t", params=(element.tag,))
b = [x[0].id for x in cursor]
ag.commit()
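The age driver is built on psycopg2, which opens a transaction implicitly on the first statement; until commit() is called, new vertices are visible only inside that session, and they are discarded if the connection closes without a commit. The same rule can be seen with the standard library's sqlite3 module, used here purely as an analogy (it is not AGE or PostgreSQL): a second connection does not see an inserted row until the writer commits.

```python
import os
import sqlite3
import tempfile

# Two connections to the same database file: one writes, one reads.
path = os.path.join(tempfile.mkdtemp(), "demo.db")
writer = sqlite3.connect(path)
reader = sqlite3.connect(path)

writer.execute("CREATE TABLE node (name TEXT)")
writer.commit()

# The INSERT implicitly opens a transaction on the writer connection.
writer.execute("INSERT INTO node (name) VALUES (?)", ("html",))
before = reader.execute("SELECT count(*) FROM node").fetchall()[0][0]

writer.commit()  # plays the role of ag.commit()
after = reader.execute("SELECT count(*) FROM node").fetchall()[0][0]

print(before, after)
writer.close()
reader.close()
```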
Try the following code:
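from lxml import html
import age

ag = age.connect(graph=GRAPH_NAME, dsn=DSN)
tree = html.parse("demo.html")
node_ids = {}  # lxml element -> ID of the vertex created for it

# iter() walks the tree in document order, so each parent is inserted before its children
for element in tree.iter():
    cursor = ag.execCypher("CREATE (t:node {name: %s}) RETURN t", params=(element.tag,))
    node_ids[element] = [x[0].id for x in cursor][0]  # get last inserted ID
    ag.commit()
    print(node_ids[element])
    parent = element.getparent()
    if parent is not None:  # explicit test, avoids the FutureWarning
        # Match child node c and parent node p by their IDs, then join c -[:connects]-> p
        ag.execCypher("MATCH (c:node), (p:node) WHERE id(c) = %s AND id(p) = %s CREATE (c)-[r:connects]->(p)",
                      params=(node_ids[element], node_ids[parent]))
        ag.commit()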
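If relying on the driver's vertex IDs is inconvenient (for example, when nodes and edges are created in separate passes), another option is to give each vertex a property that is unique within the document and match on that instead. This is an alternative sketch, not part of the answer above: it assumes the standard library's ElementTree, a hypothetical path property holding an XPath-like string such as /html/body/div[2], and it only builds the (cypher, params) pairs so it can be checked without a database.

```python
import xml.etree.ElementTree as ET


def element_paths(root):
    """Map each element to an XPath-like path, e.g. /html/body/div[2]."""
    paths = {root: "/" + root.tag}
    # iter() is document order, so a parent's path exists before its children.
    for parent in root.iter():
        totals = {}
        for child in parent:
            totals[child.tag] = totals.get(child.tag, 0) + 1
        seen = {}
        for child in parent:
            seen[child.tag] = seen.get(child.tag, 0) + 1
            step = child.tag
            if totals[child.tag] > 1:  # index only repeated sibling tags
                step += "[%d]" % seen[child.tag]
            paths[child] = paths[parent] + "/" + step
    return paths


def cypher_statements(root):
    """Return (cypher, params) pairs: all vertices first, then child -> parent edges."""
    paths = element_paths(root)
    parent_of = {child: parent for parent in root.iter() for child in parent}
    statements = [
        ("CREATE (t:node {name: %s, path: %s})", (el.tag, paths[el]))
        for el in root.iter()
    ]
    for el in root.iter():
        parent = parent_of.get(el)
        if parent is not None:
            statements.append((
                "MATCH (c:node {path: %s}), (p:node {path: %s}) "
                "CREATE (c)-[r:connects]->(p)",
                (paths[el], paths[parent]),
            ))
    return statements


doc = ET.fromstring(
    "<html><body><div><h1>welcome</h1></div><div><h2>third</h2></div></body></html>"
)
for cypher, params in cypher_statements(doc):
    print(cypher, params)
```

Each pair could then be sent as ag.execCypher(cypher, params=params), followed by ag.commit(). The path values are only unique within one document, so loading several documents into the same graph would need an extra distinguishing property in the match.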