Optimising or restructuring n-n relationship Neo4j queries causing very slow response times

Question

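I am looking for help with one of my Neo4j graphs. My nodes and relationships look like this:

This is an example of one end-to-end relationship in the graph.

The problem concerns (store) nodes that have multiple (notification) nodes, which in turn are related to multiple (notification_action_type) and (notification_type) nodes.

The query looks like this: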

MATCH (s:app)-[:HAS_GEOZONE]->(g:geozone)-[:HAS_ENGAGEMENTZONE]->(e:engagement_zone)-[:HAS_STORE]->(st:store)-[:HAS_NOTIFICATION]->(nt:notification), shortestPath((nt)-[r:HAS_NOTIFICATION_TYPE]->(ntt:notification_type)), shortestPath((nt)-[ra:HAS_NOTIFICATION_ACTION_TYPE]->(ntta:notification_action_type)), (st)-[:HAS_ATTRIBUTE]->(sta:store_attribute)
WHERE s.uuid={app_id} AND sta.key='name'
OPTIONAL MATCH (nt)-[:HAS_BRAND]->(br:brand)
OPTIONAL MATCH (nt)-[:HAS_LABEL]->(l:loyalty)
RETURN nt.uuid as nt_id, COLLECT(DISTINCT st.uuid) as st_ids, COLLECT(DISTINCT sta.value) as store_names, COLLECT(DISTINCT properties(br)) as notification_brands, COLLECT(DISTINCT properties(l)) as notification_labels, COLLECT(DISTINCT properties(nt)) as notification, COLLECT(DISTINCT properties(ntt)) as notification_type, COLLECT(DISTINCT properties(ntta)) as notification_action_type ORDER BY nt_id

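A single run of this query takes more than 8 seconds to respond, and my application ends up needing this information quite frequently. This leads to poor overall responsiveness and crashes, which is why, in the meantime, I have put Redis in between to cache some of the data the application needs.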

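For example, the response for one entry from the original query is

And the JSON looks like the following, where the column names are the nodes in the graph: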

 [
  {
    "nt_id": "002a3ba0-2584-11ea-93de-118eb121a0f8",
    "st_ids": [
      "e5fb2cc0-2246-11ea-a327-c1a6ac2ca4a0"
    ],
    "store_names": [
      "AND"
    ],
    "notification_brands": [],
    "notification_labels": [],
    "notification": [
      {
        "sub_text": "Happy shopping!!",
        "action_url": "https://www.tatacliq.com/and/c-mbh11a00015",
        "image_url": "",
        "notification_match_type": "GENERAL",
        "validity_start": 1,
        "text": "Welcome to {{store_name}}",
        "inventory_request_params": "",
        "isActive": true,
        "validity_end": 1,
        "uuid": "002a3ba0-2584-11ea-93de-118eb121a0f8",
        "active_days": "{\"SUNDAY\":\"1100-2100\",\"MONDAY\":\"1100-2100\",\"TUESDAY\":\"1100-2100\",\"WEDNESDAY\":\"1100-2100\",\"THURSDAY\":\"1100-2100\",\"FRIDAY\":\"1100-2100\",\"SATURDAY\":\"1100-2100\"}"
      }
    ],
    "notification_type": [
      {
        "name": "Deals & Offers",
        "uuid": "2fdc2b20-4faf-11e9-bfff-47192e190163"
      }
    ],
    "notification_action_type": [
      {
        "name": "In Store",
        "uuid": "ce78fc50-4fae-11e9-b974-7995b4e2b93d"
      }
    ]
  }
]

Neo4j version: neo4j:3.5.12-enterprise, running in Docker on an AWS m5.xlarge machine with 12G heap and cache size configured
Which API / driver do you use: REST API on ECS, called from Node JS on a different instance

Also, I am attaching query.log, which shows the execution times in the live environment.

query.log query.log.1

Any help with this is much appreciated!

Thanks, Arnab

Answer 1

Hej Arnab!

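I am a bit confused about what you are trying to achieve with the 2 shortestPath calls in your MATCH. You would normally use the shortestPath function to return the shortest path between 2 nodes that you matched earlier. For example, it would look like this: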

MATCH (a), (b), p = shortestPath((a)-[*]-(b))
RETURN p

You have to use a variable path length (*) so that the function searches all possible paths from a to b.
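As a slightly more concrete sketch of the same idea, the end nodes can be pinned down by label and property, and the path length can be given an upper bound so the search does not wander across the whole graph. The parameters $store_id and $type_id and the bound of 4 hops are assumptions for illustration only; they do not come from your query.

```cypher
// Hypothetical parameters: $store_id and $type_id are illustrative only.
// Match both end nodes first, then ask for the shortest path between them.
MATCH (a:store {uuid: $store_id}), (b:notification_type {uuid: $type_id})
// [*..4] limits the search to paths of at most 4 relationships (assumed bound).
MATCH p = shortestPath((a)-[*..4]-(b))
RETURN p, length(p) AS hops
```

With a fixed single hop such as (nt)-[r:HAS_NOTIFICATION_TYPE]->(ntt), as in your query, there is only one possible path length, so shortestPath has nothing to choose between.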

Now, if we take the shortestPath function calls out of your query, then, given the way you wrote it, you should get the same result, but hopefully faster:

MATCH (s:app)-[:HAS_GEOZONE]->(g:geozone)-[:HAS_ENGAGEMENTZONE]->(e:engagement_zone)-[:HAS_STORE]->(st:store)-[:HAS_NOTIFICATION]->(nt:notification), (nt)-[r:HAS_NOTIFICATION_TYPE]->(ntt:notification_type), (nt)-[ra:HAS_NOTIFICATION_ACTION_TYPE]->(ntta:notification_action_type), (st)-[:HAS_ATTRIBUTE]->(sta:store_attribute)
WHERE s.uuid={app_id} AND sta.key='name'
OPTIONAL MATCH (nt)-[:HAS_BRAND]->(br:brand)
OPTIONAL MATCH (nt)-[:HAS_LABEL]->(l:loyalty)
RETURN nt.uuid as nt_id, COLLECT(DISTINCT st.uuid) as st_ids, COLLECT(DISTINCT sta.value) as store_names, COLLECT(DISTINCT properties(br)) as notification_brands, COLLECT(DISTINCT properties(l)) as notification_labels, COLLECT(DISTINCT properties(nt)) as notification, COLLECT(DISTINCT properties(ntt)) as notification_type, COLLECT(DISTINCT properties(ntta)) as notification_action_type ORDER BY nt_id

You could even try using subsequent MATCH clauses instead of a single MATCH to speed up the query:

MATCH (s:app)-[:HAS_GEOZONE]->(g:geozone)-[:HAS_ENGAGEMENTZONE]->(e:engagement_zone)-[:HAS_STORE]->(st:store)-[:HAS_NOTIFICATION]->(nt:notification)
WHERE s.uuid={app_id}
MATCH (nt)-[r:HAS_NOTIFICATION_TYPE]->(ntt:notification_type)
MATCH (nt)-[ra:HAS_NOTIFICATION_ACTION_TYPE]->(ntta:notification_action_type)
MATCH (st)-[:HAS_ATTRIBUTE]->(sta:store_attribute)
WHERE sta.key='name'
OPTIONAL MATCH (nt)-[:HAS_BRAND]->(br:brand)
OPTIONAL MATCH (nt)-[:HAS_LABEL]->(l:loyalty)
RETURN nt.uuid as nt_id, COLLECT(DISTINCT st.uuid) as st_ids, COLLECT(DISTINCT sta.value) as store_names, COLLECT(DISTINCT properties(br)) as notification_brands, COLLECT(DISTINCT properties(l)) as notification_labels, COLLECT(DISTINCT properties(nt)) as notification, COLLECT(DISTINCT properties(ntt)) as notification_type, COLLECT(DISTINCT properties(ntta)) as notification_action_type ORDER BY nt_id

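Note, however, that depending on your graph model these two queries may produce different results. If you are interested in the difference between these two ways of matching, you can read Adam's answer in the following post: https://community.neo4j.com/t/significance-of-using-in-neo4j-multiple-relationships-cypher-query/10800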
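To illustrate where the two forms can diverge, here is a minimal sketch that reuses your store and notification labels in a made-up pattern (it is not part of your query). Within a single MATCH clause, Cypher never binds the same relationship twice; across separate MATCH clauses that restriction does not apply.

```cypher
// Query A: one MATCH clause. r1 and r2 must be different relationships,
// so a store with k notifications contributes k * (k - 1) rows.
MATCH (st:store)-[r1:HAS_NOTIFICATION]->(n1:notification),
      (st)-[r2:HAS_NOTIFICATION]->(n2:notification)
RETURN count(*) AS pairs;

// Query B: two MATCH clauses. r2 may be the same relationship as r1,
// so the same store contributes k * k rows.
MATCH (st:store)-[r1:HAS_NOTIFICATION]->(n1:notification)
MATCH (st)-[r2:HAS_NOTIFICATION]->(n2:notification)
RETURN count(*) AS pairs;
```

In the rewritten queries above, every relationship type appears only once, so this particular difference should not come into play there, but it is worth keeping in mind if the model changes.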

Hope this helps! BR, Fabian

Answer 2

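Thank you for your answer. Well, I have managed to solve this problem now. The reason I introduced shortestPath is that I previously had exactly the query you suggested. Here it is: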

MATCH (s:app)-[:HAS_GEOZONE]->(g:geozone)-[:HAS_ENGAGEMENTZONE]->(e:engagement_zone)-[:HAS_STORE]->(st:store)-[:HAS_NOTIFICATION]->(nt:notification), (nt)-[r:HAS_NOTIFICATION_TYPE]->(ntt:notification_type), (nt)-[ra:HAS_NOTIFICATION_ACTION_TYPE]->(ntta:notification_action_type), (st)-[:HAS_ATTRIBUTE]->(sta:store_attribute)
WHERE s.uuid={app_id} AND sta.key='name'
OPTIONAL MATCH (nt)-[:HAS_BRAND]->(br:brand)
OPTIONAL MATCH (nt)-[:HAS_LABEL]->(l:loyalty)
RETURN nt.uuid as nt_id, COLLECT(DISTINCT st.uuid) as st_ids, COLLECT(DISTINCT sta.value) as store_names, COLLECT(DISTINCT properties(br)) as notification_brands, COLLECT(DISTINCT properties(l)) as notification_labels, COLLECT(DISTINCT properties(nt)) as notification, COLLECT(DISTINCT properties(ntt)) as notification_type, COLLECT(DISTINCT properties(ntta)) as notification_action_type ORDER BY nt_id

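In my graph, every notification node has the [r:HAS_NOTIFICATION_TYPE] and [ra:HAS_NOTIFICATION_ACTION_TYPE] relationships, and every store node has multiple notifications. As the graph grows, i.e. once I have more than 1500 stores and about 200 notifications, each with the relationships above, this query stops working properly: it hangs and crashes the server. Changing the patterns to shortestPath did prevent that from happening, but the response time was still very long.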
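One way to see how many intermediate rows are involved is to profile just the first part of the pattern and compare the row count with the number of distinct stores and notifications. This is only a diagnostic sketch; the column aliases are made up for illustration, and it assumes the same {app_id} parameter as the query above.

```cypher
// Diagnostic sketch: count the rows produced before any aggregation.
PROFILE
MATCH (s:app)-[:HAS_GEOZONE]->(:geozone)-[:HAS_ENGAGEMENTZONE]->(:engagement_zone)-[:HAS_STORE]->(st:store)-[:HAS_NOTIFICATION]->(nt:notification)
WHERE s.uuid = {app_id}
RETURN count(*) AS path_rows,
       count(DISTINCT st) AS stores,
       count(DISTINCT nt) AS notifications
```

If path_rows is far larger than notifications, each notification is being reached through many stores, and every additional pattern in the query is then expanded once per row before COLLECT(DISTINCT ...) collapses the result.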

Now, I store the uuid of the Notif_Type and Notif_Action_Type nodes as properties on the notification node, and I have updated the query like this:

MATCH (n:business_entity)-[:HAS_APP]->(s:app)-[:HAS_GEOZONE]->(g:geozone)-[:HAS_ENGAGEMENTZONE]->(e:engagement_zone)-[:HAS_STORE]->(st:store)-[:HAS_NOTIFICATION]->(nt:notification), 
        (st)-[:HAS_ATTRIBUTE]->(sta:store_attribute)
        WHERE n.uuid={b_id} AND s.uuid={app_id} AND sta.key='name'
        OPTIONAL MATCH (nt)-[:HAS_BRAND]->(br:brand)
        OPTIONAL MATCH(ntt:notification_type) WHERE ntt.uuid = nt.notification_type
        OPTIONAL MATCH(ntta:notification_action_type) WHERE ntta.uuid = nt.notification_action_type
        RETURN nt.uuid as nt_id,  COLLECT(DISTINCT st.uuid) as st_ids, COLLECT(DISTINCT sta.value) as store_names,
        COLLECT(DISTINCT properties(br)) as notification_brands, COLLECT(DISTINCT properties(nt)) as notification, 
        COLLECT(DISTINCT properties(ntt)) as notification_type, COLLECT(DISTINCT properties(ntta)) as notification_action_type

And the response time is now under 500 ms. Thanks for the detailed explanation! Cheers!
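For anyone who wants to try the same approach, a minimal sketch of the one-off change might look like the following. It assumes each notification has exactly one type and one action type (with several, the last value written wins), and it assumes no index on uuid exists yet for the two lookup labels. The index statements use the Neo4j 3.5 syntax; later versions use a different form.

```cypher
// Copy the uuid of the related type node onto each notification
// (property names match the ones used in the updated query).
MATCH (nt:notification)-[:HAS_NOTIFICATION_TYPE]->(ntt:notification_type)
SET nt.notification_type = ntt.uuid;

MATCH (nt:notification)-[:HAS_NOTIFICATION_ACTION_TYPE]->(ntta:notification_action_type)
SET nt.notification_action_type = ntta.uuid;

// Assumed to be missing: indexes so that the uuid lookups in the
// OPTIONAL MATCH clauses do not have to scan every node of the label.
CREATE INDEX ON :notification_type(uuid);
CREATE INDEX ON :notification_action_type(uuid);
```

The trade-off is that the copied uuid properties have to be kept in sync by the application whenever a notification's type or action type changes.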

optimization

neo4j

database-performance

graph-databases

cypher