Calculate the average of fields in embedded documents/array

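I want to calculate the rating_average field of this object from the rating fields inside the ratings array. Can you help me understand how to use aggregation with $avg?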

{
    "title": "The Hobbit",
    "rating_average": "???",
    "ratings": [
        {
            "title": "best book ever",
            "rating": 5
        },
        {
            "title": "good book",
            "rating": 3.5
        }
    ]
}

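The aggregation framework in MongoDB 3.4 and newer offers the $reduce operator, which efficiently calculates the total without the need for extra pipeline stages. Consider using it as the expression that returns the total of the ratings, and get the number of ratings using $size. Together with $addFields, the average can then be calculated with the arithmetic operator $divide, as in the formula average = total ratings/number of ratings: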

db.collection.aggregate([
    { 
        "$addFields": { 
            "rating_average": {
                "$divide": [
                    { // expression returns total
                        "$reduce": {
                            "input": "$ratings",
                            "initialValue": 0,
                            "in": { "$add": ["$$value", "$$this.rating"] }
                        }
                    },
                    { // expression returns ratings count
                        "$cond": [
                            { "$ne": [ { "$size": "$ratings" }, 0 ] },
                            { "$size": "$ratings" }, 
                            1
                        ]
                    }
                ]
            }
        }
    }           
])

Sample output

{
    "_id" : ObjectId("58ab48556da32ab5198623f4"),
    "title" : "The Hobbit",
    "ratings" : [ 
        {
            "title" : "best book ever",
            "rating" : 5.0
        }, 
        {
            "title" : "good book",
            "rating" : 3.5
        }
    ],
    "rating_average" : 4.25
}
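
The arithmetic behind this pipeline can be checked outside MongoDB. What follows is a minimal plain-JavaScript sketch, not MongoDB code, and the function name `ratingAverage` is made up for illustration. It mirrors the $reduce total, the $size count, and the $cond guard that swaps a zero count for 1 so an empty array yields 0 rather than a division by zero.

```javascript
// Plain JavaScript mirror of the $addFields expression above (illustrative only).
function ratingAverage(doc) {
  // Same role as $reduce: fold the ratings into a running total.
  const total = doc.ratings.reduce((value, current) => value + current.rating, 0);
  // Same role as $cond + $size: use 1 when the array is empty to avoid dividing by zero.
  const count = doc.ratings.length !== 0 ? doc.ratings.length : 1;
  return total / count;
}

const hobbit = {
  title: "The Hobbit",
  ratings: [
    { title: "best book ever", rating: 5 },
    { title: "good book", rating: 3.5 }
  ]
};

console.log(ratingAverage(hobbit)); // 4.25
console.log(ratingAverage({ title: "Unrated", ratings: [] })); // 0
```

The sketch assumes every document has a ratings array; the pipeline above makes the same assumption, since $size raises an error when the field is missing.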

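For older versions, you would need to first apply the $unwind operator on the ratings array field as the initial aggregation pipeline step. This deconstructs the ratings array field from the input documents and outputs one document per element. Each output document replaces the array with a single element value.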
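
To make the effect of $unwind concrete, here is a rough plain-JavaScript stand-in, not MongoDB code. The helper name `unwindRatings` is hypothetical, and the sketch assumes every document carries a ratings array.

```javascript
// Illustrative stand-in for { "$unwind": "$ratings" }; not MongoDB code.
function unwindRatings(docs) {
  // Emit one copy of the document per array element, with the array
  // replaced by that single element.
  return docs.flatMap(doc =>
    doc.ratings.map(element => ({ ...doc, ratings: element }))
  );
}

const inputDocs = [
  {
    _id: 1,
    title: "The Hobbit",
    ratings: [
      { title: "best book ever", rating: 5 },
      { title: "good book", rating: 3.5 }
    ]
  }
];

const unwound = unwindRatings(inputDocs);
console.log(unwound.length);            // 2
console.log(unwound[0].ratings.rating); // 5
console.log(unwound[1].ratings.rating); // 3.5
```

Each resulting document keeps _id and title, which is what lets a later stage group the elements back together.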

The second pipeline stage would be the $group operator, which groups the input documents by an identifier expression made of the _id and title keys and applies the $avg accumulator expression to each group to calculate the average. Another accumulator operator, $push, preserves the original ratings array field by returning an array of all the values that result from applying an expression to each document in the group.

The final pipeline step is the $project operator, which then reshapes each document in the stream, for example by adding the new field ratings_average.

So, for example, if you have this sample document in your collection (the same one as above):

db.collection.insert({
    "title": "The Hobbit",
    "ratings": [
        {
            "title": "best book ever",
            "rating": 5
        },
        {
            "title": "good book",
            "rating": 3.5
        }
    ]
})

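To calculate the average of the ratings array and project the value into another field, ratings_average, you can apply the following aggregation pipeline: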

db.collection.aggregate([
    {
        "$unwind": "$ratings"
    },
    {
        "$group": {
            "_id": {
                "_id": "$_id",
                "title": "$title"
            },
            "ratings":{
                "$push": "$ratings"
            },
            "ratings_average": {
                "$avg": "$ratings.rating"
            }
        }
    },
    {
        "$project": {
            "_id": 0,
            "title": "$_id.title",
            "ratings_average": 1,
            "ratings": 1
        }
    }
])

Result:

/* 1 */
{
    "result" : [ 
        {
            "ratings" : [ 
                {
                    "title" : "best book ever",
                    "rating" : 5
                }, 
                {
                    "title" : "good book",
                    "rating" : 3.5
                }
            ],
            "ratings_average" : 4.25,
            "title" : "The Hobbit"
        }
    ],
    "ok" : 1
}

Since the data you want to average is inside an array, you first need to unwind it. Do that with $unwind in your aggregation pipeline:

{$unwind: "$ratings"}

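Each element of the array is then accessible in the resulting documents as an embedded document under the key ratings. After that you just need to $group by title and calculate the $avg: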

{$group: {_id: "$title", ratings: {$push: "$ratings"}, average: {$avg: "$ratings.rating"}}}

Then just restore your title field:

{$project: {_id: 0, title: "$_id", ratings: 1, average: 1}}

So here is your resulting aggregation pipeline:

db.yourCollection.aggregate([
    {$unwind: "$ratings"},
    {$group: {
        _id: "$title",
        ratings: {$push: "$ratings"},
        average: {$avg: "$ratings.rating"}
    }},
    {$project: {_id: 0, title: "$_id", ratings: 1, average: 1}}
])
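
For anyone who wants to see what the $group and $project stages do to the unwound documents, here is a small plain-JavaScript approximation, not MongoDB code. The function name `groupByTitle` and the hard-coded `unwoundRows` input are assumptions made for the sketch.

```javascript
// Rows as they would look after $unwind: one rating per document.
const unwoundRows = [
  { title: "The Hobbit", ratings: { title: "best book ever", rating: 5 } },
  { title: "The Hobbit", ratings: { title: "good book", rating: 3.5 } }
];

// Illustrative stand-in for the $group and $project stages; not MongoDB code.
function groupByTitle(rows) {
  const groups = new Map();
  for (const row of rows) {
    if (!groups.has(row.title)) {
      groups.set(row.title, { ratings: [], sum: 0 });
    }
    const group = groups.get(row.title);
    group.ratings.push(row.ratings);  // like $push: "$ratings"
    group.sum += row.ratings.rating;  // running total behind $avg
  }
  // Like $project: expose the group key as title and drop the helper sum.
  return [...groups.entries()].map(([title, g]) => ({
    title,
    ratings: g.ratings,
    average: g.sum / g.ratings.length
  }));
}

const grouped = groupByTitle(unwoundRows);
console.log(grouped[0].title);          // The Hobbit
console.log(grouped[0].ratings.length); // 2
console.log(grouped[0].average);        // 4.25
```

Note that grouping on the title alone merges documents that happen to share a title, which is presumably why the earlier pipeline includes _id in the group key.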

This can really be written much shorter, and that was true even at the time of writing. If you want an "average", simply use $avg:

db.collection.aggregate([
  { "$addFields": {
    "rating_average": { "$avg": "$ratings.rating" }
  }}
])

The reason is that as of MongoDB 3.2 the $avg operator gained "two" things:

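  1. The ability to process an "array" of arguments in "expression" form, rather than solely as an accumulator to $group.

  2. It benefits from the MongoDB 3.2 feature that allows a "shorthand" notation for array expressions, either in composition:

    { "array": [ "$fielda", "$fieldb" ] }

    or in notating a single property from the array as an array of the values of that property:

    { "$avg": "$ratings.rating" } // equal to { "$avg": [ 5, 3.5 ] }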

In earlier releases you would have to use $map in order to access the "rating" property inside each array element. Now you don't.
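
As a rough illustration of what the shorthand does, here is a plain-JavaScript sketch, not MongoDB code. The helper `avgOf` is hypothetical, and its handling of non-numeric values (skip them, return null when nothing numeric is left) is my reading of how $avg behaves, so treat that part as an assumption.

```javascript
// Illustrative plain-JavaScript stand-in for { "$avg": "$ratings.rating" }; not MongoDB code.
const book = {
  title: "The Hobbit",
  ratings: [
    { title: "best book ever", rating: 5 },
    { title: "good book", rating: 3.5 }
  ]
};

// The dotted path "$ratings.rating" resolves to one value per array element.
const ratingValues = book.ratings.map(element => element.rating);

// Assumed $avg behaviour: skip non-numeric values, give null when none are numeric.
function avgOf(values) {
  const numbers = values.filter(v => typeof v === "number");
  if (numbers.length === 0) return null;
  return numbers.reduce((sum, n) => sum + n, 0) / numbers.length;
}

console.log(ratingValues);        // [ 5, 3.5 ]
console.log(avgOf(ratingValues)); // 4.25
```

The `map` call plays the part that an explicit $map would have played; the dotted path simply does that step for you.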


For the record, even the $reduce usage can be simplified:

db.collection.aggregate([
  { "$addFields": {
    "rating_average": {
      "$reduce": {
        "input": "$ratings",
        "initialValue": 0,
        "in": {
          "$add": [ 
            "$$value",
            { "$divide": [ 
              "$$this.rating", 
              { "$size": { "$ifNull": [ "$ratings", [] ] } }
            ]}
          ]
        }
      }
    }
  }}
])

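As already stated, this really just re-implements the existing $avg functionality, so now that the operator is available, it is the one that should be used.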
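
The simplified $reduce works because adding rating / n at every step gives the same result as summing first and dividing once. A quick plain-JavaScript check of that identity, not MongoDB code, with variable names chosen only for this sketch:

```javascript
const sampleRatings = [
  { title: "best book ever", rating: 5 },
  { title: "good book", rating: 3.5 }
];
const n = sampleRatings.length;

// Simplified form: divide each rating by the count while accumulating.
const dividedPerStep = sampleRatings.reduce(
  (value, current) => value + current.rating / n,
  0
);

// Original form: accumulate the total, then divide once.
const dividedOnce =
  sampleRatings.reduce((value, current) => value + current.rating, 0) / n;

console.log(dividedPerStep); // 4.25
console.log(dividedOnce);    // 4.25

// With an empty array the callback never runs, so the result is the initial value.
console.log([].reduce((value, current) => value + current.rating / 0, 0)); // 0
```

With other inputs the two forms may differ in the last floating-point digit, since the rounding happens at different points.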