How to group results in ArangoDB into a single record?
I have a list of events of specific types, structured as follows:
{
  createdAt: 123123132,
  type: STARTED,
  metadata: {
    emailAddress: "foo@bar.com"
  }
}
The set of types is predefined (START, STOP, REMOVE, ...). Users produce one or more events over time.
I want to get the following aggregation:
For each user, calculate the number of events for each type.
My AQL query looks like this:
FOR event IN events
  COLLECT
    email = event.metadata.emailAddress,
    type = event.type WITH COUNT INTO count
  LIMIT 10
  RETURN {
    email,
    t: {type, count}
  }
This produces the following output:
{ email: '_84@example.com', t: { type: 'CREATE', count: 203 } }
{ email: '_84@example.com', t: { type: 'DEPLOY', count: 214 } }
{ email: '_84@example.com', t: { type: 'REMOVE', count: 172 } }
{ email: '_84@example.com', t: { type: 'START', count: 204 } }
{ email: '_84@example.com', t: { type: 'STOP', count: 187 } }
{ email: '_95@example.com', t: { type: 'CREATE', count: 189 } }
{ email: '_95@example.com', t: { type: 'DEPLOY', count: 173 } }
{ email: '_95@example.com', t: { type: 'REMOVE', count: 194 } }
{ email: '_95@example.com', t: { type: 'START', count: 213 } }
{ email: '_95@example.com', t: { type: 'STOP', count: 208 } }
...
That is, I get one row per type. But I want results like this:
{ email: foo@bar.com, count1: 203, count2: 214, count3: 172 ...}
{ email: aaa@fff.com, count1: 189, count2: 173, count3: 194 ...}
...
or
{ email: foo@bar.com, CREATE: 203, DEPLOY: 214, ... }
...
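That is, the results should be grouped once more.
I also need to sort the results (not the events) by count, for example to return the top 10 users with the most events.
How can I do this?
A solution
Here is one solution; see the accepted answer for more details.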
FOR a IN (
  FOR event IN events
    COLLECT
      emailAddress = event.metadata.emailAddress,
      type = event.type WITH COUNT INTO count
    COLLECT email = emailAddress INTO perUser KEEP type, count
    RETURN MERGE(PUSH(perUser[* RETURN {[LOWER(CURRENT.type)]: CURRENT.count}], {email}))
)
  SORT a.create DESC
  LIMIT 10
  RETURN a
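You can group by user and event type, then group again by user, keeping only the type and the already computed count for that type. In the second aggregation, it is important to know which group each event falls into in order to build the result. An array inline projection can be used to keep the query short: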
FOR event IN events
  COLLECT
    emailAddress = event.metadata.emailAddress,
    type = event.type WITH COUNT INTO count
  COLLECT email = emailAddress INTO perUser KEEP type, count
  RETURN MERGE(PUSH(perUser[* RETURN {[CURRENT.type]: CURRENT.count}], {email}))
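Another approach is to group by user and keep the event types, then group the types in a subquery. But it was significantly slower in my test (at least without any indexes defined):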
FOR event IN events
  LET type = event.type
  COLLECT
    email = event.metadata.emailAddress INTO groups KEEP type
  LET byType = (
    FOR t IN groups[*].type
      COLLECT t2 = t WITH COUNT INTO count
      RETURN {[t2]: count}
  )
  RETURN MERGE(PUSH(byType, {email}))
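Returning the top 10 users with the most CREATE events is much simpler. Filter by the CREATE event type, then group by user and count the events, sort by this number in descending order, and return the first 10 results: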
FOR event IN events
  FILTER event.type == "CREATE"
  COLLECT email = event.metadata.emailAddress WITH COUNT INTO count
  SORT count DESC
  LIMIT 10
  RETURN {email, count}
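EDIT1: Return one document per user with the event types grouped and counted (as in the first query), but capture the MERGE result, sort by the count of one particular event type (here: CREATE), and return the top 10 users for that type. The result is identical to the solution given in the question. However, it avoids the subquery of the form FOR a IN (FOR event IN events ...) ... RETURN a: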
FOR event IN events
  COLLECT
    emailAddress = event.metadata.emailAddress,
    type = event.type WITH COUNT INTO count
  COLLECT email = emailAddress INTO perUser KEEP type, count
  LET ret = MERGE(PUSH(perUser[* RETURN {[CURRENT.type]: CURRENT.count}], {email}))
  SORT ret.CREATE DESC
  LIMIT 10
  RETURN ret
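The question also asks for the users with the most events, which could mean the total across all types rather than CREATE alone. A minimal sketch of that variant, assuming the same events collection and document shape as above; the variable and attribute name total is introduced here for illustration and is not part of the original answer:

```aql
FOR event IN events
  // first stage: one row per (user, type) with its count
  COLLECT
    emailAddress = event.metadata.emailAddress,
    type = event.type WITH COUNT INTO count
  // second stage: one group per user, keeping only type and count
  COLLECT email = emailAddress INTO perUser KEEP type, count
  // total is an illustrative name: the sum of this user's per-type counts
  LET total = SUM(perUser[*].count)
  SORT total DESC
  LIMIT 10
  RETURN MERGE(PUSH(perUser[* RETURN {[CURRENT.type]: CURRENT.count}], {email, total}))
```

Each returned document should then carry the per-type counts plus email and total. If an event type were itself named email or total, it would collide with those attributes, so this assumes the predefined type names from the question.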
EDIT2: Query to generate example data (requires the collection events to exist):
FOR i IN 1..100
  LET email = CONCAT(RANDOM_TOKEN(RAND()*4+4), "@example.com")
  FOR j IN SPLIT("CREATE,DEPLOY,REMOVE,START,STOP", ",")
    FOR k IN 1..RAND()*150+50
      INSERT {metadata: {emailAddress: email}, type: j} INTO events RETURN NEW