Time based data query with Neo4J showing more relations than expected
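I'm trying to get my head around Neo4J and time-based data.
What I basically want to build is a data structure that can give me, for a given point in time, a tracking node (a page view) together with its referrer and the referrer's own referrers.
My problem is that when I save the data together with its relationship to a time tree, relationships still show up that should not appear when I query a specific time by hour.
During my research I found this article about modeling time series data with neo4j.
So far so good, but the referrers and their child relationships are not separated by time.
To illustrate the problem better, here is the data structure first:
I created the indexes: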
CREATE INDEX ON :Year(value);
CREATE INDEX ON :Month(value);
CREATE INDEX ON :Day(value);
CREATE INDEX ON :Hour(value);
CREATE INDEX ON :Minute(value);
CREATE INDEX ON :Second(value);
And put the time nodes in place:
//Create Time Tree with Day Depth
WITH range(2015, 2017) AS years, range(1,12) AS months
FOREACH(year IN years |
CREATE (y:Year {value: year})
FOREACH(month IN months |
CREATE (m:Month {value: month})
MERGE (y)-[:CONTAINS]->(m)
FOREACH(day IN (CASE
WHEN month IN [1,3,5,7,8,10,12] THEN range(1,31)
WHEN month = 2 THEN
CASE
WHEN year % 4 <> 0 THEN range(1,28)
WHEN year % 100 = 0 AND year % 400 <> 0 THEN range(1,28)
ELSE range(1,29)
END
ELSE range(1,30)
END) |
CREATE (d:Day {value: day})
MERGE (m)-[:CONTAINS]->(d))))
If I now save data:
MERGE (a:tracking {ip:'someniceid', type:'page_view', timestamp:'2154645'})
MERGE (f:Domain {name:'domain1.com'})
MERGE (e:Domain {name:'domain2.com'})
MERGE (d:Domain {name:'domain3.com'})
MERGE (z:Domain {name:'domain4.com'})
MERGE (a)-[:CAME_FROM]->(f)
MERGE (f)-[:REFERRED_BY]->(e)
MERGE (e)-[:REFERRED_BY]->(d)
MERGE (d)-[:REFERRED_BY]->(z)
WITH a, 2016 AS y
MATCH (year:Year {value: y})
WITH a, year, 5 AS m
MATCH (year)-[:CONTAINS]->(month:Month {value: m})
WITH a, month, 9 AS d
MATCH (month)-[:CONTAINS]->(day:Day {value: d})
WITH a, day, 14 AS h
MERGE (day)-[:CONTAINS]->(hour:Hour {value: h})
MERGE (a)-[:HAPPENED_ON]->(hour)
I get the following graph with this query:
MATCH (y)-[:CONTAINS]->(m:Month {value: 5}) WITH y, m
MATCH (m)-[:CONTAINS]->(d {value: 9}) WITH y, m, d
MATCH (d)-[:CONTAINS]->(h {value: 14}) WITH y, m, d, h
MATCH (a:tracking)-[:HAPPENED_ON]->(h),(a)-[:CAME_FROM|:REFERRED_BY*]->(dom) RETURN dom AS D, a AS A
When I now save one more dataset, where the only differences are the hour and one domain (domain6 instead of domain4), like this:
MERGE (a:tracking {ip:'someniceid', type:'page_view', timestamp:'2154645'})
MERGE (f:Domain {name:'domain1.com'})
MERGE (e:Domain {name:'domain2.com'})
MERGE (d:Domain {name:'domain3.com'})
MERGE (z:Domain {name:'domain6.com'})
MERGE (a)-[:CAME_FROM]->(f)
MERGE (f)-[:REFERRED_BY]->(e)
MERGE (e)-[:REFERRED_BY]->(d)
MERGE (d)-[:REFERRED_BY]->(z)
WITH a, 2016 AS y
MATCH (year:Year {value: y})
WITH a, year, 5 AS m
MATCH (year)-[:CONTAINS]->(month:Month {value: m})
WITH a, month, 9 AS d
MATCH (month)-[:CONTAINS]->(day:Day {value: d})
WITH a, day, 10 AS h
MERGE (day)-[:CONTAINS]->(hour:Hour {value: h})
MERGE (a)-[:HAPPENED_ON]->(hour)
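Now the same query as above returns one additional referrer, which I think should not happen, because the time (hour) node linked to the tracking node is different:
The referrer relationship is shown even though the tracking node is connected to a different hour node! What am I doing wrong? To me, domain6 should not be visible, because the related tracking has no connection to that time node... Does anyone have an idea?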
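The problem is that MERGE does not create new domain records for each tracked event, so the referrer sequence you store is incorrect. Both events reuse the same Domain nodes, which leaves domain3.com with a REFERRED_BY relationship to domain4.com and another to domain6.com, and the variable-length match follows both. Try creating, for each tracking node, its own links that point to the domains: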
MERGE (a:tracking {ip:'someniceid', type:'page_view', timestamp:'2154645'})
MERGE (_f:Domain {name:'domain1.com'})
MERGE (_e:Domain {name:'domain2.com'})
MERGE (_d:Domain {name:'domain3.com'})
MERGE (_z:Domain {name:'domain4.com'})
CREATE (f:Symlink)-[:Symlink]->(_f)
CREATE (e:Symlink)-[:Symlink]->(_e)
CREATE (d:Symlink)-[:Symlink]->(_d)
CREATE (z:Symlink)-[:Symlink]->(_z)
MERGE (a)-[:CAME_FROM]->(f)
MERGE (f)-[:REFERRED_BY]->(e)
MERGE (e)-[:REFERRED_BY]->(d)
MERGE (d)-[:REFERRED_BY]->(z)
WITH a, 2016 AS y
MATCH (year:Year {value: y})
WITH a, year, 5 AS m
MATCH (year)-[:CONTAINS]->(month:Month {value: m})
WITH a, month, 9 AS d
MATCH (month)-[:CONTAINS]->(day:Day {value: d})
WITH a, day, 14 AS h
MERGE (day)-[:CONTAINS]->(hour:Hour {value: h})
MERGE (a)-[:HAPPENED_ON]->(hour)
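The original read query will not find the Domain nodes under this model, because the chain now runs through the Symlink nodes. A minimal sketch of how the hour query might be adapted is given here, assuming the Symlink label and the Symlink relationship type exactly as written above. It also assumes every event has its own tracking node. With identical ip, type and timestamp values, MERGE reuses the existing tracking node, which then stays attached to both hours.

```cypher
// Locate the hour node through the time tree (2016-05-09, hour 14).
MATCH (:Year {value: 2016})-[:CONTAINS]->(:Month {value: 5})
      -[:CONTAINS]->(:Day {value: 9})-[:CONTAINS]->(h:Hour {value: 14})
// Tracking events that happened in that hour.
MATCH (a:tracking)-[:HAPPENED_ON]->(h)
// Walk only this event's private chain of symlinks.
// *0.. also returns the first symlink itself (the direct referrer).
MATCH (a)-[:CAME_FROM]->(:Symlink)-[:REFERRED_BY*0..]->(s:Symlink)
// Resolve each symlink to the shared Domain node it stands for.
MATCH (s)-[:Symlink]->(dom:Domain)
RETURN a AS A, dom AS D
```

Because REFERRED_BY now connects per-event Symlink nodes rather than shared Domain nodes, the traversal cannot cross over into another event's referrer chain.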