Neo4j Cypher: create new nodes on a condition in ON MATCH SET
To import XML data into a Neo4j database, I first parse the XML into a Python dictionary and then run this Cypher query:
WITH $pubmed_dict as pubmed_article
UNWIND pubmed_article as particle
...
FOREACH (author IN particle.MedlineCitation.Article.AuthorList.Author |
MERGE (a:Author {last_name: COALESCE(author.LastName, 'LAST NAME MISSING!')})
ON CREATE SET a.first_name = author.ForeName, a.affiliation = author.AffiliationInfo.Affiliation
ON MATCH SET a.first_name = author.ForeName, a.affiliation = author.AffiliationInfo.Affiliation
MERGE (p)<-[:WROTE]-(a)
)
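The question does not show how the XML becomes the dictionary. As a sketch only, assuming Python's standard `xml.etree.ElementTree` (the asker may use another library, such as xmltodict, which produces a somewhat different shape), a hypothetical helper `author_to_dict` could build the map the query reads: `LastName`, `ForeName` and a nested `AffiliationInfo.Affiliation`.

```python
import xml.etree.ElementTree as ET


def author_to_dict(elem):
    """Convert one <Author> element into the nested map the Cypher query reads.

    Missing child elements become None; the query later replaces them via COALESCE.
    """
    return {
        "LastName": elem.findtext("LastName"),
        "ForeName": elem.findtext("ForeName"),
        "Initials": elem.findtext("Initials"),
        "AffiliationInfo": {"Affiliation": elem.findtext("AffiliationInfo/Affiliation")},
    }


author_list_xml = """<AuthorList>
  <Author ValidYN="Y">
    <LastName>Smith</LastName>
    <ForeName>A L</ForeName>
    <Initials>AL</Initials>
    <AffiliationInfo><Affiliation>University X</Affiliation></AffiliationInfo>
  </Author>
</AuthorList>"""

authors = [author_to_dict(a) for a in ET.fromstring(author_list_xml).findall("Author")]
print(authors[0]["AffiliationInfo"]["Affiliation"])  # University X
```

The resulting list would be one element of what the query receives as `particle.MedlineCitation.Article.AuthorList.Author`.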
Unfortunately, authors have no unique ID in the data, so different authors may share a last name but differ in their initials or affiliation.
...
<Author ValidYN="Y">
<LastName>Smith</LastName>
<ForeName>A L</ForeName>
<Initials>AL</Initials>
<AffiliationInfo>
<Affiliation>University X</Affiliation>
</AffiliationInfo>
</Author>
...
<Author ValidYN="Y">
<LastName>Smith</LastName>
<ForeName>A L</ForeName>
<Initials>AL</Initials>
<AffiliationInfo>
<Affiliation>University BUMBABU</Affiliation>
</AffiliationInfo>
</Author>
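To see why merging on `last_name` alone is not enough, MERGE can be modelled in plain Python as an upsert into a dict keyed by the merge properties. This is only an illustrative model (the helper `count_merged_nodes` is hypothetical, not a Neo4j API), applied to the two Smith records above.

```python
def count_merged_nodes(authors, key_fields):
    """Model MERGE as an upsert: records with equal key values collapse into one node."""
    nodes = {}
    for author in authors:
        key = tuple(author[field] for field in key_fields)
        # Like ON MATCH SET, a later record overwrites the properties of an existing node.
        nodes[key] = author
    return len(nodes)


smiths = [
    {"last_name": "Smith", "first_name": "A L", "affiliation": "University X"},
    {"last_name": "Smith", "first_name": "A L", "affiliation": "University BUMBABU"},
]

print(count_merged_nodes(smiths, ["last_name"]))  # 1: both authors end up in one node
print(count_merged_nodes(smiths, ["last_name", "first_name", "affiliation"]))  # 2
```

Keying on the last name alone leaves a single node whose affiliation is whichever record came last; only the composite key keeps the two authors apart.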
My intention is to MERGE on author.LastName but, on MATCH, check whether the author has the same ForeName or the same affiliation, and create a new node if not.
How can I do this with a Cypher query?
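Edit 1
A Node Key constraint would be the solution, but it is an Enterprise Edition feature. I am looking for a workaround.
Edit 2
This code works almost perfectly: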
WITH $pubmed_dict as pubmed_article
UNWIND pubmed_article as particle
MERGE (p:Publication {pmid: particle.MedlineCitation.PMID.text})
ON CREATE SET p.title = COALESCE (particle.MedlineCitation.Article.Journal.Title, particle.MedlineCitation.Article.ArticleTitle)
ON MATCH SET p.title = COALESCE (particle.MedlineCitation.Article.Journal.Title, particle.MedlineCitation.Article.ArticleTitle)
FOREACH (author IN particle.MedlineCitation.Article.AuthorList.Author |
MERGE (a:Author {last_name: COALESCE(author.LastName, 'LAST NAME MISSING!'), first_name: COALESCE(author.ForeName, 'FIRST NAME MISSING!')})
MERGE (p)<-[:WROTE]-(a)
)
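To summarize:
I want to create a new Author node whenever the LastName OR ForeName OR Affiliation differs. I also need new nodes for authors whose last name or first name is missing!
Is it possible to achieve this without a Node Key constraint (since that is an Enterprise Edition feature)?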
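One reading of the summary is to put all three properties into the MERGE pattern, replacing missing values with placeholder strings the way the Edit 2 query already does with `COALESCE`, so that authors without a name still get their own node. The sketch below only prepares such a key on the Python side; the affiliation placeholder `'AFFILIATION MISSING!'` and the helpers `coalesce` and `merge_key` are assumptions modelled on the existing placeholders. As far as I know, Neo4j refuses to MERGE on a null property value, which is why some placeholder is needed at all.

```python
LAST_NAME_MISSING = "LAST NAME MISSING!"
FIRST_NAME_MISSING = "FIRST NAME MISSING!"
AFFILIATION_MISSING = "AFFILIATION MISSING!"  # hypothetical placeholder


def coalesce(*values):
    """Return the first value that is not None, like Cypher's COALESCE."""
    return next((v for v in values if v is not None), None)


def merge_key(author):
    """Build the three MERGE properties for one parsed author dict."""
    affiliation_info = author.get("AffiliationInfo") or {}
    return {
        "last_name": coalesce(author.get("LastName"), LAST_NAME_MISSING),
        "first_name": coalesce(author.get("ForeName"), FIRST_NAME_MISSING),
        "affiliation": coalesce(affiliation_info.get("Affiliation"), AFFILIATION_MISSING),
    }


print(merge_key({"LastName": "Smith"}))
```

Each of the three values would then be written out property by property in the MERGE pattern, as the Edit 2 query does for `last_name` and `first_name`.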
You can use a constraint; Neo4j will then check uniqueness for you.
To create a Node Key, which ensures that all nodes with a particular label have a set of defined properties whose combined value is unique, and where all properties in the set are present:
CREATE CONSTRAINT ON (author:Author) ASSERT (author.first_name, author.last_name, author.affiliation) IS NODE KEY
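The two conditions in that description (every node has all the key properties, and their combination is unique) can be illustrated with a small Python check, for example to sanity-check data before importing it into Community Edition. This models the rule only; the function `node_key_violations` is hypothetical and says nothing about how Neo4j enforces the constraint internally.

```python
def node_key_violations(nodes, key_fields):
    """List the records a Node Key on key_fields would reject, with the reason."""
    problems = []
    seen = set()
    for i, props in enumerate(nodes):
        missing = [f for f in key_fields if props.get(f) is None]
        if missing:
            # Node Key requires every key property to be present.
            problems.append((i, "missing", missing))
            continue
        key = tuple(props[f] for f in key_fields)
        if key in seen:
            # Node Key requires the combined value to be unique.
            problems.append((i, "duplicate", key))
        seen.add(key)
    return problems


fields = ["first_name", "last_name", "affiliation"]
nodes = [
    {"first_name": "A L", "last_name": "Smith", "affiliation": "University X"},
    {"first_name": "A L", "last_name": "Smith", "affiliation": "University BUMBABU"},
    {"first_name": "A L", "last_name": "Smith", "affiliation": "University X"},
    {"first_name": "A L", "last_name": "Smith"},
]
print(node_key_violations(nodes, fields))
```

The two Smiths with different affiliations pass; the exact repeat and the record without an affiliation are reported.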
Authors do have a unique ID in Neo4j: the node ID. You can use it to identify a node and then set its properties, perhaps like this:
// $author is assumed to be passed in as a query parameter (one parsed author map)
MATCH (a:Author {last_name: 'xxx', first_name: 'yyy'})
WITH a, id(a) AS ID
MATCH (b) WHERE id(b) = ID
SET b.first_name = $author.ForeName, b.affiliation = $author.AffiliationInfo.Affiliation
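The node ID is not necessarily stable or predictable, so you have to look it up directly right before using it.
Since you are using Python, you may be better off pulling the author node data with one global query: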
MATCH (a:Author {last_name: 'xxx', first_name: 'yyy'}) RETURN a.last_name, a.first_name, id(a) AS ID
You can then write a CSV file to bulk-upload the required information. The CSV might look like this:
"ID","ForeName","LastName","Affiliation"
"26","David","Smith","Johns Hopkins"
etc.
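On the Python side, the standard `csv` module can produce a file in that format; `csv.QUOTE_ALL` quotes every field, as in the example. The row content is a placeholder, and a real script would pass a file opened with `open(path, "w", newline="")` instead of the in-memory buffer used here.

```python
import csv
import io


def write_author_csv(rows, stream):
    """Write a header line plus one line per row, quoting every field."""
    writer = csv.DictWriter(
        stream,
        fieldnames=["ID", "ForeName", "LastName", "Affiliation"],
        quoting=csv.QUOTE_ALL,
    )
    writer.writeheader()
    writer.writerows(rows)


rows = [{"ID": 26, "ForeName": "David", "LastName": "Smith", "Affiliation": "Johns Hopkins"}]
buffer = io.StringIO()
write_author_csv(rows, buffer)
print(buffer.getvalue())
```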
The Python code can filter out the nodes that do not need processing.
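One possible filter, sketched under two assumptions: the lookup query also returns the stored affiliation (for example `a.affiliation AS Affiliation`), and the newly parsed affiliations are available as a mapping from node ID. Both the mapping and the helper `rows_to_update` are hypothetical. Only nodes whose affiliation would actually change are kept for the CSV.

```python
def rows_to_update(existing, new_affiliations):
    """Keep only nodes whose stored affiliation differs from the newly parsed one.

    existing: rows from the lookup query, assumed to include "ID" and "Affiliation".
    new_affiliations: hypothetical mapping from node ID to the parsed affiliation.
    """
    updates = []
    for row in existing:
        new = new_affiliations.get(row["ID"])
        if new is not None and new != row.get("Affiliation"):
            updates.append({**row, "Affiliation": new})
    return updates


existing = [
    {"ID": 26, "ForeName": "David", "LastName": "Smith", "Affiliation": None},
    {"ID": 27, "ForeName": "A L", "LastName": "Smith", "Affiliation": "University X"},
]
new_affiliations = {26: "Johns Hopkins", 27: "University X"}
print(rows_to_update(existing, new_affiliations))
```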
Then load the file:
LOAD CSV WITH HEADERS FROM 'file:///xxx.csv' AS line
MATCH (a) WHERE id(a) = toInteger(line.ID)
SET a.affiliation = line.Affiliation
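LOAD CSV hands every field to Cypher as a string, which is why the query converts `line.ID` with `toInteger`. Reading the file back with Python's `csv.DictReader` behaves the same way and can serve as a quick check of the file before loading it; the sample content and the helper `read_updates` are placeholders.

```python
import csv
import io

sample = '"ID","ForeName","LastName","Affiliation"\r\n"26","David","Smith","Johns Hopkins"\r\n'


def read_updates(stream):
    """Read the CSV like LOAD CSV WITH HEADERS: one dict per row, all values strings."""
    updates = []
    for line in csv.DictReader(stream):
        # Mirror toInteger(line.ID): the node ID arrives as text.
        updates.append((int(line["ID"]), line["Affiliation"]))
    return updates


print(read_updates(io.StringIO(sample)))  # [(26, 'Johns Hopkins')]
```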