Aggregation function for counting duplicates in one field based on duplicate items in another field
I am using mongoengine as the ORM for a Flask application. The model class is defined as
class MyData(db.Document):
    task_id = db.StringField(max_length=50, required=True)
    url = db.URLField(max_length=500, required=True, unique=True)
    organization = db.StringField(max_length=250, required=True)
    val = db.StringField(max_length=50, required=True)
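The organization field can repeat, and I want to count those duplicates broken down by the values in another field. For example, if the data in MongoDB looks like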
[{"task_id":"as4d2rds5","url":"https:example1.com","organization":"Avengers","val":"null"},
{"task_id":"rfre43fed","url":"https:example1.com","organization":"Avengers","val":"valid"},
{"task_id":"uyje3dsxs","url":"https:example2.com","organization":"Metro","val":"valid"},
{"task_id":"ghs563vt6","url":"https:example1.com","organization":"Avengers","val":"invalid"},
{"task_id":"erf6egy64","url":"https:example2.com","organization":"Metro","val":"null"}]
Then I query all the objects with
data = MyData.objects()
I want a response like this
[{"url":"https:example1.com","Avengers":{"valid":1,"null":1,"invalid":1}},{"url":"https:example2.com","Metro":{"valid":1,"null":1,"invalid":0}}]
I tried
db.collection.aggregate([
  {
    "$group": {
      "_id": "$organization",
      "count": [
        {
          "null": {
            "$sum": 1
          },
          "valid": {
            "$sum": 1
          },
          "invalid": {
            "$sum": 1
          }
        }
      ]
    }
  }
])
But I got an error
The field 'count' must be an accumulator object
Maybe something like this:
db.collection.aggregate([
  {
    "$group": {
      "_id": {
        k: "$organization",
        v: "$val"
      },
      "cnt": {
        $sum: 1
      }
    }
  },
  {
    $project: {
      _id: 0,
      k: "$_id.k",
      o: {
        k: "$_id.v",
        v: "$cnt"
      }
    }
  },
  {
    $group: {
      _id: "$k",
      v: {
        $push: "$o"
      }
    }
  },
  {
    $addFields: {
      v: {
        "$arrayToObject": "$v"
      }
    }
  },
  {
    $project: {
      _id: 0,
      new: [
        {
          k: "$_id",
          v: "$v"
        }
      ]
    }
  },
  {
    "$addFields": {
      "new": {
        "$arrayToObject": "$new"
      }
    }
  },
  {
    "$replaceRoot": {
      "newRoot": "$new"
    }
  }
])
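Explanation:
- Group to count
- Project to prepare the k/v pairs for arrayToObject
- Group to join the values
- arrayToObject once more
- Another project
- arrayToObject to form the final object
- Project once more
- replaceRoot to move the object to the root.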
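To make the steps above easier to follow, here is a minimal pure-Python sketch that mimics what the pipeline computes on the sample data from the question. It does not talk to MongoDB; the function name `count_vals_by_org` and the trimmed-down `docs` list are assumptions made only for illustration.

```python
from collections import Counter, defaultdict

# Sample documents from the question (only the fields the pipeline reads).
docs = [
    {"organization": "Avengers", "val": "null"},
    {"organization": "Avengers", "val": "valid"},
    {"organization": "Metro", "val": "valid"},
    {"organization": "Avengers", "val": "invalid"},
    {"organization": "Metro", "val": "null"},
]


def count_vals_by_org(documents):
    """Hypothetical helper mirroring the aggregation stages in plain Python."""
    # First $group: count each (organization, val) pair.
    pair_counts = Counter((d["organization"], d["val"]) for d in documents)

    # $project + second $group: collect {k, v} pairs per organization.
    pairs_by_org = defaultdict(list)
    for (org, val), cnt in pair_counts.items():
        pairs_by_org[org].append({"k": val, "v": cnt})

    # $arrayToObject stages + $replaceRoot: one {organization: counts} object each.
    return [
        {org: {pair["k"]: pair["v"] for pair in pairs}}
        for org, pairs in pairs_by_org.items()
    ]


result = count_vals_by_org(docs)
print(result)
```

In this sketch Metro ends up without an "invalid" key, because no Metro document has that value.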
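P.S.
Note that this solution will not show a value that never occurs for an organization. If you need those missing values, you have to add an extra mapping / mergeObjects.
Option with missing values (assuming the possible values are fixed to null, valid, invalid):
Just replace the addFields stage that builds v (the first addFields in the pipeline above) with: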
{
  $addFields: {
    v: {
      "$mergeObjects": [
        {
          "null": 0,
          valid: 0,
          invalid: 0
        },
        {
          "$arrayToObject": "$v"
        }
      ]
    }
  }
}
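Since the question uses Python, the same pipeline can also be written as a list of dicts. The following is a sketch only: `build_pipeline` and its `defaults` parameter are hypothetical names, and how the list is executed (for example through pymongo's `Collection.aggregate(pipeline)`, or through mongoengine's queryset `aggregate`, whose calling convention differs between versions) is left as an assumption.

```python
def build_pipeline(defaults=None):
    """Return the aggregation pipeline as a list of stage dicts.

    `defaults` is a hypothetical parameter: a dict of zero counts that is
    merged underneath the real counts so absent values still appear.
    """
    counts = {"$arrayToObject": "$v"}
    if defaults:
        # In $mergeObjects later objects win, so real counts override the zeros.
        counts = {"$mergeObjects": [defaults, counts]}
    return [
        {"$group": {"_id": {"k": "$organization", "v": "$val"}, "cnt": {"$sum": 1}}},
        {"$project": {"_id": 0, "k": "$_id.k", "o": {"k": "$_id.v", "v": "$cnt"}}},
        {"$group": {"_id": "$k", "v": {"$push": "$o"}}},
        {"$addFields": {"v": counts}},
        {"$project": {"_id": 0, "new": [{"k": "$_id", "v": "$v"}]}},
        {"$addFields": {"new": {"$arrayToObject": "$new"}}},
        {"$replaceRoot": {"newRoot": "$new"}},
    ]


# Variant with zero-filled missing values, as in the option above.
pipeline = build_pipeline({"null": 0, "valid": 0, "invalid": 0})
```

Calling `build_pipeline()` with no argument gives the original pipeline without the zero defaults.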