mgo with aggregation and grouping

Question

I'm trying to use golang mgo to run a query that efficiently gets distinct values from a join. I know this may not be the best paradigm to use in Mongo.

Something like this:

pipe := []bson.M{

    {
        "$group": bson.M{
            "_id":  bson.M{"user": "$user"},

        },
    },

    {
        "$match": bson.M{
            "_id":  bson.M{"$exists": 1},
            "user": bson.M{"$exists": 1},
            "date_updated": bson.M{
                "$gt": durationDays,
            },
        },

    },

    {
        "$lookup": bson.M{
            "from":         "users",
            "localField":   "user",
            "foreignField": "_id",
            "as":           "user_details",
        },
    },
    {
        "$lookup": bson.M{
            "from":         "organizations",
            "localField":   "organization",
            "foreignField": "_id",
            "as":           "organization_details",
        },
    },

}

err := d.Pipe(pipe).All(&result)

If I comment out the $group stage, the query returns results as expected.

If I run it as is, I get NULL.

If I move $group to the end of the pipeline, I get an array response containing null values.

Is it possible to use $group in an aggregation (with the goal of emulating DISTINCT)?

Answer 1

You get NULL because your $match filter removes every document that comes out of the $group stage.

After the first $group stage, the documents look only like this:

  {"_id": { "user": "foo"}},
  {"_id": { "user": "bar"}},
  {"_id": { "user": "baz"}}

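They no longer contain the other fields at the top level: user (now only inside _id), date_updated and organization. If you want to keep their values, you can use a Group Accumulator Operator. Depending on your use case, you may also benefit from using Aggregation Expression Variables.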
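To make the failure mode concrete, here is a small in-memory simulation in plain Go (no MongoDB or mgo involved). `Doc`, `groupByUserOnly` and `matchUpdatedAfter` are hypothetical names, and date_updated is assumed to be an int purely so the sketch runs with the standard library; it only mimics the two stages, it is not how the server implements them.

```go
package main

import "fmt"

// Doc is a hypothetical stand-in for a MongoDB document; mgo's bson.M has
// the same underlying type, map[string]interface{}.
type Doc map[string]interface{}

// groupByUserOnly mimics {"$group": {"_id": {"user": "$user"}}}: it emits one
// document per distinct user, and each emitted document holds ONLY _id.
func groupByUserOnly(in []Doc) []Doc {
	seen := map[interface{}]bool{}
	var out []Doc
	for _, d := range in {
		u := d["user"]
		if seen[u] {
			continue
		}
		seen[u] = true
		out = append(out, Doc{"_id": Doc{"user": u}})
	}
	return out
}

// matchUpdatedAfter mimics {"$match": {"date_updated": {"$gt": cutoff}}}.
// A document without date_updated can never satisfy $gt.
func matchUpdatedAfter(in []Doc, cutoff int) []Doc {
	var out []Doc
	for _, d := range in {
		if v, ok := d["date_updated"].(int); ok && v > cutoff {
			out = append(out, d)
		}
	}
	return out
}

func main() {
	docs := []Doc{
		{"user": "foo", "organization": "acme", "date_updated": 20},
		{"user": "bar", "organization": "acme", "date_updated": 30},
		{"user": "foo", "organization": "acme", "date_updated": 25},
	}
	grouped := groupByUserOnly(docs)
	fmt.Println(len(grouped))                        // 2
	fmt.Println(len(matchUpdatedAfter(grouped, 10))) // 0: date_updated is gone after $group
}
```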

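For example, in the mongo shell, let's use the $first operator, which simply picks the first occurrence in each group (so the result depends on the order of the input documents). This may make sense for organization, but not for date_updated; please choose a more appropriate accumulator operator.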

{"$group": { 
          "_id":"$user", 
          "date_updated": {"$first":"$date_updated"}, 
          "organization": {"$first":"$organization"}
         }
}

Note that the above also replaces {"_id":{"user":"$user"}} with the simpler {"_id":"$user"}.
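As an illustration of choosing accumulators per field, the in-memory sketch below keeps organization with $first but uses $max for date_updated, which is one plausible "more appropriate" choice if you want the latest update per user. It runs in plain Go without mgo; `Doc`, `groupWithAccumulators` and the integer dates are assumptions made for the example.

```go
package main

import "fmt"

// Doc is a hypothetical stand-in for a MongoDB document (cf. mgo's bson.M).
type Doc map[string]interface{}

// groupWithAccumulators mimics
//
//	{"$group": {"_id": "$user",
//	            "organization": {"$first": "$organization"},
//	            "date_updated": {"$max": "$date_updated"}}}
//
// $first keeps the value from the first document seen for each user, so the
// result depends on input order; $max keeps the largest date_updated.
func groupWithAccumulators(in []Doc) []Doc {
	index := map[interface{}]int{} // user -> position in out
	var out []Doc
	for _, d := range in {
		u := d["user"]
		i, seen := index[u]
		if !seen {
			index[u] = len(out)
			out = append(out, Doc{"_id": u, "organization": d["organization"], "date_updated": d["date_updated"]})
			continue
		}
		// $max: keep the larger integer date_updated.
		v, vok := d["date_updated"].(int)
		cur, cok := out[i]["date_updated"].(int)
		if vok && (!cok || v > cur) {
			out[i]["date_updated"] = v
		}
	}
	return out
}

func main() {
	docs := []Doc{
		{"user": "foo", "organization": "acme", "date_updated": 20},
		{"user": "foo", "organization": "acme", "date_updated": 25},
		{"user": "bar", "organization": "globex", "date_updated": 30},
	}
	for _, g := range groupWithAccumulators(docs) {
		fmt.Println(g["_id"], g["organization"], g["date_updated"])
	}
	// foo acme 25
	// bar globex 30
}
```

With $first for date_updated, as in the shell example, "foo" would instead get 20, simply because that document happened to come first.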

Next, we add a $project stage to rename the _id field produced by the group operation to user, and carry the other fields through unchanged.

{"$project": {
              "user": "$_id", 
              "date_updated": 1, 
              "organization": 1
             }
 }
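A plain-Go sketch of what that $project does to one grouped document. `Doc` and `projectUser` are hypothetical; the one behaviour worth noticing is that $project keeps _id unless you explicitly set "_id": 0, so the output carries both _id and user.

```go
package main

import "fmt"

// Doc is a hypothetical stand-in for a MongoDB document (cf. mgo's bson.M).
type Doc map[string]interface{}

// projectUser mimics
//
//	{"$project": {"user": "$_id", "date_updated": 1, "organization": 1}}
//
// _id is kept too, because $project includes _id unless you set "_id": 0.
// Included fields that are missing from the input stay missing.
func projectUser(in []Doc) []Doc {
	out := make([]Doc, 0, len(in))
	for _, d := range in {
		p := Doc{"_id": d["_id"], "user": d["_id"]}
		for _, k := range []string{"date_updated", "organization"} {
			if v, ok := d[k]; ok {
				p[k] = v
			}
		}
		out = append(out, p)
	}
	return out
}

func main() {
	grouped := []Doc{{"_id": "foo", "date_updated": 25, "organization": "acme"}}
	fmt.Println(projectUser(grouped)[0])
	// map[_id:foo date_updated:25 organization:acme user:foo]
}
```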

Your $match stage can be simplified to just the date_updated filter. First, we can drop the _id condition, since it is no longer relevant at this point in the pipeline: every document coming out of $group has an _id anyway. Also, if you want to make sure you only process documents that have a user value, you should place a $match on user before the $group. See Aggregation Pipeline Optimization for more.
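One concrete reason to filter on user before grouping: documents that have no user field all collapse into a single group whose _id is null. The plain-Go sketch below mimics that; `Doc`, `matchHasUser` and `groupKeys` are hypothetical helpers, not mgo functions.

```go
package main

import "fmt"

// Doc is a hypothetical stand-in for a MongoDB document (cf. mgo's bson.M).
type Doc map[string]interface{}

// matchHasUser mimics {"$match": {"user": {"$exists": true}}}. Like $exists,
// it keeps documents whose user key is present even if its value is null.
func matchHasUser(in []Doc) []Doc {
	var out []Doc
	for _, d := range in {
		if _, ok := d["user"]; ok {
			out = append(out, d)
		}
	}
	return out
}

// groupKeys returns the distinct _id values {"$group": {"_id": "$user"}} would
// produce, in first-seen order. A document with no user yields nil (_id: null).
func groupKeys(in []Doc) []interface{} {
	seen := map[interface{}]bool{}
	var keys []interface{}
	for _, d := range in {
		u := d["user"]
		if !seen[u] {
			seen[u] = true
			keys = append(keys, u)
		}
	}
	return keys
}

func main() {
	docs := []Doc{
		{"user": "foo", "date_updated": 20},
		{"organization": "acme", "date_updated": 30}, // no user field
	}
	fmt.Println(groupKeys(docs))               // [foo <nil>]
	fmt.Println(groupKeys(matchHasUser(docs))) // [foo]
}
```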

Putting it all together, the pipeline looks like this:

[
 {"$group":{ 
             "_id": "$user", 
             "date_updated": { "$first": "$date_updated"}, 
             "organization": { "$first": "$organization"} 
           }
 },
 {"$project":{ 
               "user": "$_id", 
               "date_updated": 1, 
               "organization": 1
             }
 }, 
 {"$match":{
          "date_updated": {"$gt": durationDays } }
 }, 
 {"$lookup":{
             "from": "users", 
             "localField": "user", 
             "foreignField": "_id", 
             "as": "user_details"
            }
 }, 
 {"$lookup":{
            "from": "organizations", 
            "localField": "organization", 
            "foreignField": "_id", 
            "as": "organization_details"
            }
 }
]

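(I know you're already aware of this.) Finally, given the database schema above with separate users and organizations collections, and depending on your application's use case, you may want to reconsider embedding some of these values. You may find 6 Rules of Thumb for MongoDB Schema Design useful.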

go

aggregation

mgo