Aggregation function for counting duplicates in a field based on duplicate items in another field

I am using mongoengine as the ORM with a Flask application. The model class is defined as

class MyData(db.Document):
    task_id = db.StringField(max_length=50, required=True)
    url = db.URLField(max_length=500,required=True,unique=True)
    organization = db.StringField(max_length=250,required=True)
    val = db.StringField(max_length=50, required=True)

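The organization field may repeat, and I want to get the count of duplicates with respect to the values in another field. For example, if the data in MongoDB looks like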

[{"task_id":"as4d2rds5","url":"https:example1.com","organization":"Avengers","val":"null"},
 {"task_id":"rfre43fed","url":"https:example1.com","organization":"Avengers","val":"valid"},
 {"task_id":"uyje3dsxs","url":"https:example2.com","organization":"Metro","val":"valid"},
 {"task_id":"ghs563vt6","url":"https:example1.com","organization":"Avengers","val":"invalid"},
 {"task_id":"erf6egy64","url":"https:example2.com","organization":"Metro","val":"null"}]

After querying all the objects using
data = MyData.objects()

I want a response like this

[{"url":"https:example1.com","Avengers":{"valid":1,"null":1,"invalid":1}},
 {"url":"https:example2.com","Metro":{"valid":1,"null":1,"invalid":0}}]

I tried

db.collection.aggregate([
  {
    "$group": {
      "_id": "$organization",
      "count": [
        {
          "null": {
            "$sum": 1
          },
          "valid": {
            "$sum": 1
          },
          "invalid": {
            "$sum": 1
          }
        }
      ]
    }
  }
])

But I got an error

The field 'count' must be an accumulator object
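
The error arises because every field in a $group stage other than _id must be a single accumulator expression such as {"$sum": ...}, not an array of sub-documents. As a minimal sketch, assuming val only takes the three values shown in the sample data, each count can be its own accumulator that adds 1 conditionally. The helper name count_if is hypothetical, and the pipeline is written as Python dicts so it can be passed to pymongo or mongoengine.

```python
def count_if(value):
    """Accumulator that adds 1 when the document's val equals `value`, else 0."""
    return {"$sum": {"$cond": [{"$eq": ["$val", value]}, 1, 0]}}


# One accumulator per expected value of `val` (assumed to be a fixed set).
pipeline = [
    {
        "$group": {
            "_id": "$organization",
            "null": count_if("null"),
            "valid": count_if("valid"),
            "invalid": count_if("invalid"),
        }
    }
]

# Not executed here; with a live connection this could be run as, for example,
# MyData._get_collection().aggregate(pipeline)
```

This should yield one document per organization with the three counts as top-level fields, which still needs reshaping to match the desired response.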

Maybe something like this:

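db.collection.aggregate([
  // Count documents per (organization, val) pair.
  {
    $group: {
      _id: { k: "$organization", v: "$val" },
      cnt: { $sum: 1 }
    }
  },
  // Reshape each count into a { k, v } pair for $arrayToObject.
  {
    $project: {
      _id: 0,
      k: "$_id.k",
      o: { k: "$_id.v", v: "$cnt" }
    }
  },
  // Collect the pairs for each organization.
  {
    $group: {
      _id: "$k",
      v: { $push: "$o" }
    }
  },
  // Turn the pairs into an object such as { valid: 1, null: 1 }.
  {
    $addFields: {
      v: { $arrayToObject: "$v" }
    }
  },
  // Wrap the organization name and its counts in a one-element { k, v } array.
  {
    $project: {
      _id: 0,
      new: [ { k: "$_id", v: "$v" } ]
    }
  },
  // Convert that array into { <organization>: { ...counts } }.
  {
    $addFields: {
      new: { $arrayToObject: "$new" }
    }
  },
  // Promote the object to the document root.
  {
    $replaceRoot: { newRoot: "$new" }
  }
])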

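Explanation:

  1. $group to count each (organization, val) pair
  2. $project to shape the counts as k/v pairs for $arrayToObject
  3. $group to collect the values per organization
  4. $arrayToObject to build the counts object
  5. an additional $project to wrap the organization and its counts as a k/v pair
  6. $arrayToObject once more to form the final object
  7. $replaceRoot to move the object to the root.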
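
To make the data flow easier to follow, the pipeline stages can be traced in plain Python on the sample data from the question. This is only an illustrative sketch of what each stage does to the documents, not how MongoDB executes it; the variable names are made up for the example.

```python
from collections import Counter, defaultdict

# Sample documents from the question (only the fields the pipeline reads).
docs = [
    {"organization": "Avengers", "val": "null"},
    {"organization": "Avengers", "val": "valid"},
    {"organization": "Metro", "val": "valid"},
    {"organization": "Avengers", "val": "invalid"},
    {"organization": "Metro", "val": "null"},
]

# First $group: count documents per (organization, val) pair.
pair_counts = Counter((d["organization"], d["val"]) for d in docs)

# $project + second $group: build {k, v} pairs and push them per organization.
pairs_by_org = defaultdict(list)
for (org, val), cnt in pair_counts.items():
    pairs_by_org[org].append({"k": val, "v": cnt})

# First $arrayToObject: turn each list of pairs into one counts object.
counts_by_org = {
    org: {p["k"]: p["v"] for p in pairs} for org, pairs in pairs_by_org.items()
}

# Final $project, $arrayToObject and $replaceRoot: {organization: counts} at the root.
result = [{org: counts} for org, counts in counts_by_org.items()]
print(result)
```

Note that Metro has no "invalid" key in this result, since no such document exists in the sample data.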

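P.S. Note that this solution does not show values that have no matching documents (for example, an organization with no "invalid" entries gets no "invalid" key). If you need those, you have to add an extra $map / $mergeObjects step.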

playground1

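Option with missing values filled in (assuming the possible values are fixed to null, valid, invalid): just replace the $addFields stage that sets v with: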

{
  $addFields: {
    v: {
      $mergeObjects: [
        { "null": 0, valid: 0, invalid: 0 },
        { $arrayToObject: "$v" }
      ]
    }
  }
}
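
The effect of $mergeObjects here is that later objects override earlier ones key by key, so the zero defaults survive only where no count exists. A rough Python analogue, using Metro's counts from the sample data as an assumed input:

```python
# Defaults for every expected value, as in the first $mergeObjects argument.
defaults = {"null": 0, "valid": 0, "invalid": 0}

# Counts actually found for one organization (Metro in the sample data).
counts = {"valid": 1, "null": 1}

# Later keys win, so real counts replace the zero defaults.
merged = {**defaults, **counts}
print(merged)
```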

playground2

With url added:

playground3
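
For checking results, the response requested in the question (url plus zero-filled counts per organization) can also be computed client-side. The following is a plain-Python sketch and not a reproduction of the linked aggregation; the function name counts_by_url_and_org and the fixed VALUES tuple are assumptions made for the example.

```python
from collections import defaultdict

VALUES = ("valid", "null", "invalid")  # assumed fixed set of val values


def counts_by_url_and_org(docs):
    """Return one {"url": ..., <organization>: {val: count}} dict per (url, organization)."""
    grouped = defaultdict(lambda: dict.fromkeys(VALUES, 0))
    for d in docs:
        counts = grouped[(d["url"], d["organization"])]
        counts[d["val"]] = counts.get(d["val"], 0) + 1
    return [{"url": url, org: counts} for (url, org), counts in grouped.items()]


# Sample data from the question, reduced to the fields that matter here.
sample = [
    {"url": "https:example1.com", "organization": "Avengers", "val": "null"},
    {"url": "https:example1.com", "organization": "Avengers", "val": "valid"},
    {"url": "https:example2.com", "organization": "Metro", "val": "valid"},
    {"url": "https:example1.com", "organization": "Avengers", "val": "invalid"},
    {"url": "https:example2.com", "organization": "Metro", "val": "null"},
]
print(counts_by_url_and_org(sample))
```

In a mongoengine application the input could be built from MyData.objects(), though doing the grouping in the database scales better for large collections.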