Optimising or restructuring n-n relationship Neo4j queries causing very slow response times
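I am looking for help with one of my Neo4j graphs. My nodes and relationships look like this:
This is an example of one end-to-end relationship in the graph.
The problem concerns (store) nodes that have multiple (notification) nodes, which in turn are related to multiple (notification_action_type) and (notification_type) nodes.
The query looks like this: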
MATCH (s:app)-[:HAS_GEOZONE]->(g:geozone)-[:HAS_ENGAGEMENTZONE]->(e:engagement_zone)-[:HAS_STORE]->(st:store)-[:HAS_NOTIFICATION]->(nt:notification), shortestPath((nt)-[r:HAS_NOTIFICATION_TYPE]->(ntt:notification_type)), shortestPath((nt)-[ra:HAS_NOTIFICATION_ACTION_TYPE]->(ntta:notification_action_type)), (st)-[:HAS_ATTRIBUTE]->(sta:store_attribute)
WHERE s.uuid={app_id} AND sta.key='name'
OPTIONAL MATCH (nt)-[:HAS_BRAND]->(br:brand)
OPTIONAL MATCH (nt)-[:HAS_LABEL]->(l:loyalty)
RETURN nt.uuid as nt_id, COLLECT(DISTINCT st.uuid) as st_ids, COLLECT(DISTINCT sta.value) as store_names, COLLECT(DISTINCT properties(br)) as notification_brands, COLLECT(DISTINCT properties(l)) as notification_labels, COLLECT(DISTINCT properties(nt)) as notification, COLLECT(DISTINCT properties(ntt)) as notification_type, COLLECT(DISTINCT properties(ntta)) as notification_action_type ORDER BY nt_id
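The response time for a single query is more than 8 seconds. My application ends up needing this information quite frequently. This leads to poor overall responsiveness and crashes, which is why, in the meantime, I have introduced Redis in between to cache some of the data the application needs.
For example, here is the response for one entry from the original query.
The JSON looks like this, where the column names are the nodes in the graph: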
[
{
"nt_id": "002a3ba0-2584-11ea-93de-118eb121a0f8",
"st_ids": [
"e5fb2cc0-2246-11ea-a327-c1a6ac2ca4a0"
],
"store_names": [
"AND"
],
"notification_brands": [],
"notification_labels": [],
"notification": [
{
"sub_text": "Happy shopping!!",
"action_url": "https://www.tatacliq.com/and/c-mbh11a00015",
"image_url": "",
"notification_match_type": "GENERAL",
"validity_start": 1,
"text": "Welcome to {{store_name}}",
"inventory_request_params": "",
"isActive": true,
"validity_end": 1,
"uuid": "002a3ba0-2584-11ea-93de-118eb121a0f8",
"active_days": "{\"SUNDAY\":\"1100-2100\",\"MONDAY\":\"1100-2100\",\"TUESDAY\":\"1100-2100\",\"WEDNESDAY\":\"1100-2100\",\"THURSDAY\":\"1100-2100\",\"FRIDAY\":\"1100-2100\",\"SATURDAY\":\"1100-2100\"}"
}
],
"notification_type": [
{
"name": "Deals & Offers",
"uuid": "2fdc2b20-4faf-11e9-bfff-47192e190163"
}
],
"notification_action_type": [
{
"name": "In Store",
"uuid": "ce78fc50-4fae-11e9-b974-7995b4e2b93d"
}
]
}
]
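neo4j version: neo4j:3.5.12-enterprise, running in Docker on an AWS m5.xlarge machine with 12G heap and cache size configured
Which API / driver do you use: REST API on ECS, using Node JS on a different instance
Screenshot of PROFILE or EXPLAIN
Also attaching the query.log, which shows the execution timeline in the live environment.
Any help on this is much appreciated!
Thanks,
Arnab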
Hej Arnab!
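I am a bit confused about what you are trying to achieve with the 2 shortestPath calls in your MATCH.
You would normally use the shortestPath function to return the shortest path between 2 nodes that you have matched beforehand. For example, it would look like this: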
MATCH (a), (b), p = shortestPath((a)-[*]-(b))
RETURN p
You have to use a variable path length (*) so that the function searches all possible paths from a to b.
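As a rough sketch of how this could look on the graph from the question, the query below looks for the shortest directed path from an app to one notification. The `$nt_id` parameter and the upper bound of 6 hops are assumptions made for illustration; bounding the variable length is generally advisable so the search cannot wander across the whole graph. The `$param` syntax is used here; the `{param}` form from the thread also works on 3.5.

```cypher
// Assumed parameters: $app_id (from the thread) and $nt_id (hypothetical).
MATCH (a:app {uuid: $app_id}), (nt:notification {uuid: $nt_id})
// Variable-length pattern, capped at 6 hops (an arbitrary illustrative bound).
MATCH p = shortestPath((a)-[*..6]->(nt))
RETURN length(p) AS hops,
       [n IN nodes(p) | labels(n)[0]] AS labels_on_path
```

With a fixed single-hop pattern such as `(nt)-[:HAS_NOTIFICATION_TYPE]->(ntt)`, there is only one possible path length, so shortestPath has nothing to choose between.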
Now, if we take the shortestPath function calls out of your query, then given the way you wrote it, you should get the same result, but hopefully faster:
MATCH (s:app)-[:HAS_GEOZONE]->(g:geozone)-[:HAS_ENGAGEMENTZONE]->(e:engagement_zone)-[:HAS_STORE]->(st:store)-[:HAS_NOTIFICATION]->(nt:notification), (nt)-[r:HAS_NOTIFICATION_TYPE]->(ntt:notification_type), (nt)-[ra:HAS_NOTIFICATION_ACTION_TYPE]->(ntta:notification_action_type), (st)-[:HAS_ATTRIBUTE]->(sta:store_attribute)
WHERE s.uuid={app_id} AND sta.key='name'
OPTIONAL MATCH (nt)-[:HAS_BRAND]->(br:brand)
OPTIONAL MATCH (nt)-[:HAS_LABEL]->(l:loyalty)
RETURN nt.uuid as nt_id, COLLECT(DISTINCT st.uuid) as st_ids, COLLECT(DISTINCT sta.value) as store_names, COLLECT(DISTINCT properties(br)) as notification_brands, COLLECT(DISTINCT properties(l)) as notification_labels, COLLECT(DISTINCT properties(nt)) as notification, COLLECT(DISTINCT properties(ntt)) as notification_type, COLLECT(DISTINCT properties(ntta)) as notification_action_type ORDER BY nt_id
You could even try using subsequent MATCH clauses instead of a single MATCH to speed up the query:
MATCH (s:app)-[:HAS_GEOZONE]->(g:geozone)-[:HAS_ENGAGEMENTZONE]->(e:engagement_zone)-[:HAS_STORE]->(st:store)-[:HAS_NOTIFICATION]->(nt:notification)
WHERE s.uuid={app_id}
MATCH (nt)-[r:HAS_NOTIFICATION_TYPE]->(ntt:notification_type)
MATCH (nt)-[ra:HAS_NOTIFICATION_ACTION_TYPE]->(ntta:notification_action_type)
MATCH (st)-[:HAS_ATTRIBUTE]->(sta:store_attribute)
WHERE sta.key='name'
OPTIONAL MATCH (nt)-[:HAS_BRAND]->(br:brand)
OPTIONAL MATCH (nt)-[:HAS_LABEL]->(l:loyalty)
RETURN nt.uuid as nt_id, COLLECT(DISTINCT st.uuid) as st_ids, COLLECT(DISTINCT sta.value) as store_names, COLLECT(DISTINCT properties(br)) as notification_brands, COLLECT(DISTINCT properties(l)) as notification_labels, COLLECT(DISTINCT properties(nt)) as notification, COLLECT(DISTINCT properties(ntt)) as notification_type, COLLECT(DISTINCT properties(ntta)) as notification_action_type ORDER BY nt_id
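But note that, depending on your graph model, these two queries may produce different results. If you are interested in the difference between these two ways of matching a graph, you can read Adam's answer in the following post: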
https://community.neo4j.com/t/significance-of-using-in-neo4j-multiple-relationships-cypher-query/10800
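One difference that can matter is relationship uniqueness: within a single MATCH clause, Cypher will not bind the same relationship twice, whereas separate MATCH clauses are matched independently. The following is only an illustrative sketch on the labels from the question (the variable `other` is made up for the example), not a claim about what happens with the exact queries above.

```cypher
// Single MATCH: r1 and r2 must be different relationships,
// so a store is never paired with itself through the same relationship.
MATCH (st:store)-[r1:HAS_NOTIFICATION]->(nt:notification)<-[r2:HAS_NOTIFICATION]-(other:store)
RETURN count(*) AS pairs_single_match;

// Separate MATCH clauses: r2 may be the same relationship as r1,
// so rows where other = st are also returned.
MATCH (st:store)-[r1:HAS_NOTIFICATION]->(nt:notification)
MATCH (nt)<-[r2:HAS_NOTIFICATION]-(other:store)
RETURN count(*) AS pairs_separate_matches;
```

If the two counts differ on your data, the single-MATCH and multi-MATCH forms of a query are not interchangeable for that pattern.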
Hope this helps!
BR,
Fabian
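Thanks for your answer. Well, I have managed to solve the problem now. The reason I introduced shortestPath is that I previously had exactly the query you suggested. Here it is: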
MATCH (s:app)-[:HAS_GEOZONE]->(g:geozone)-[:HAS_ENGAGEMENTZONE]->(e:engagement_zone)-[:HAS_STORE]->(st:store)-[:HAS_NOTIFICATION]->(nt:notification), (nt)-[r:HAS_NOTIFICATION_TYPE]->(ntt:notification_type), (nt)-[ra:HAS_NOTIFICATION_ACTION_TYPE]->(ntta:notification_action_type), (st)-[:HAS_ATTRIBUTE]->(sta:store_attribute)
WHERE s.uuid={app_id} AND sta.key='name'
OPTIONAL MATCH (nt)-[:HAS_BRAND]->(br:brand)
OPTIONAL MATCH (nt)-[:HAS_LABEL]->(l:loyalty)
RETURN nt.uuid as nt_id, COLLECT(DISTINCT st.uuid) as st_ids, COLLECT(DISTINCT sta.value) as store_names, COLLECT(DISTINCT properties(br)) as notification_brands, COLLECT(DISTINCT properties(l)) as notification_labels, COLLECT(DISTINCT properties(nt)) as notification, COLLECT(DISTINCT properties(ntt)) as notification_type, COLLECT(DISTINCT properties(ntta)) as notification_action_type ORDER BY nt_id
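In my graph, every notification node has the incoming relationships [r:HAS_NOTIFICATION_TYPE] and [ra:HAS_NOTIFICATION_ACTION_TYPE], and every store node has multiple notifications. When the graph grows, i.e. when I have more than 1500 stores and around 200 notifications, each with the relationships above, the query stops working: it hangs and crashes the server. Changing them to shortestPath did stop that from happening, but the response time was still very long.
Now I store the uuid of the Notif_Type and Notif_Action_Type nodes as properties on the notification node, and have updated the query like this: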
MATCH (n:business_entity)-[:HAS_APP]->(s:app)-[:HAS_GEOZONE]->(g:geozone)-[:HAS_ENGAGEMENTZONE]->(e:engagement_zone)-[:HAS_STORE]->(st:store)-[:HAS_NOTIFICATION]->(nt:notification),
(st)-[:HAS_ATTRIBUTE]->(sta:store_attribute)
WHERE n.uuid={b_id} AND s.uuid={app_id} AND sta.key='name'
OPTIONAL MATCH (nt)-[:HAS_BRAND]->(br:brand)
OPTIONAL MATCH(ntt:notification_type) WHERE ntt.uuid = nt.notification_type
OPTIONAL MATCH(ntta:notification_action_type) WHERE ntta.uuid = nt.notification_action_type
RETURN nt.uuid as nt_id, COLLECT(DISTINCT st.uuid) as st_ids, COLLECT(DISTINCT sta.value) as store_names,
COLLECT(DISTINCT properties(br)) as notification_brands, COLLECT(DISTINCT properties(nt)) as notification,
COLLECT(DISTINCT properties(ntt)) as notification_type, COLLECT(DISTINCT properties(ntta)) as notification_action_type
And the response time is now under 500 ms. Thanks for your detailed explanation!
Cheers!
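For readers who want to try the same denormalisation, here is a minimal sketch of how the type uuids could be copied onto the notification nodes. It assumes each notification has at most one notification_type and one notification_action_type (otherwise the last match wins), and it uses the property names that appear in the updated query. The two index statements are an additional assumption, not something mentioned in the thread: the uuid lookups in the OPTIONAL MATCH clauses would otherwise fall back to label scans.

```cypher
// One-off migration: copy the linked type's uuid onto each notification.
MATCH (nt:notification)-[:HAS_NOTIFICATION_TYPE]->(ntt:notification_type)
SET nt.notification_type = ntt.uuid;

MATCH (nt:notification)-[:HAS_NOTIFICATION_ACTION_TYPE]->(ntta:notification_action_type)
SET nt.notification_action_type = ntta.uuid;

// Neo4j 3.5 index syntax; run as separate statements from the data changes.
CREATE INDEX ON :notification_type(uuid);
CREATE INDEX ON :notification_action_type(uuid);
```

Prefixing the final query with PROFILE should show whether the planner actually uses these indexes for the uuid lookups.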