Merging multiple objects into a single object in a MongoDB aggregation

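This is the scenario I'm in. An aggregation stage returns an array like the following. For simplicity, all unnecessary props have been removed. The array is already sorted by date, so the objects are in clocked_in_at order (the time each object was created).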

[
        {
            "_id": "618192d4654484639c47fa2d",
            "clocked_out_at": "2021-11-05T10:00:00.000Z",
            "clocked_in_at": "2021-11-05T03:00:00.000Z",
            "visitor_id": "6166c10965959d147c69aa90" // this is here as a string
        },
        {
            "_id": "6182552fde30e84900ba33fd",
            "clocked_out_at": "2021-11-05T11:00:00.000Z",
            "clocked_in_at": "2021-11-05T04:00:00.000Z",
            "visitor_id": "6182e4cea8b52121d01dff1b"
        },
        {
            "_id": "6182552fde30e84900ba33fd",
            "clocked_out_at": "2021-11-05T12:00:00.000Z",
            "clocked_in_at": "2021-11-05T05:00:00.000Z",
            "visitor_id": "6166c10965959d147c69aa90"
        },
        {
            "_id": "6182552fde30e84900ba33fd",
            "clocked_out_at": "2021-11-06T13:00:00.000Z",
            "clocked_in_at": "2021-11-06T06:00:00.000Z",
            "visitor_id": "6166c10965959d147c69aa90"
        }
]

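So you can see that the first, third, and last objects belong to the same visitor, while the second belongs to a different visitor. What I basically need is to merge all objects in the array by visitor_id and date, setting clocked_out_at to the last existing value for that day, if that makes sense. In other words, the objects must be grouped separately by the day of clocked_in_at: if the same visitor clocked in on different days, they should still end up as two separate objects.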

So the expected output looks like this:

[
        {
            "_id": "618192d4654484639c47fa2d",
            "clocked_out_at": "2021-11-05T12:00:00.000Z",
            "clocked_in_at": "2021-11-05T03:00:00.000Z",
            "visitor_id": "6166c10965959d147c69aa90"
        },
        {
            "_id": "6182552fde30e84900ba33fd",
            "clocked_out_at": "2021-11-05T11:00:00.000Z",
            "clocked_in_at": "2021-11-05T04:00:00.000Z",
            "visitor_id": "6182e4cea8b52121d01dff1b"
        },
        {
            "_id": "6182552fde30e84900ba33fd",
            "clocked_out_at": "2021-11-06T13:00:00.000Z",
            "clocked_in_at": "2021-11-06T06:00:00.000Z",
            "visitor_id": "6166c10965959d147c69aa90"
        }
]

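So here you can see that the first and third objects of the original array have been merged, because they have the same visitor_id and the same clocked_in_at date (ignoring the time). The last object also has the same visitor_id, but it was not merged because it is on November 6. The second object was obviously not merged, because it has a completely different visitor_id.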

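Note that the clocked_out_at value of the 3rd object in the original array was carried into the merged object, i.e. it is now the clocked_out_at of the first object in the resulting array.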
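To pin the rule down, here is a plain-JavaScript sketch (not MongoDB code) that merges the sample array the way described above. It assumes the timestamps are ISO-8601 UTC strings, that "same day" means the same UTC date of clocked_in_at, and that "last value of the day" means the latest clocked_out_at (in the sample this is also the last one). The function name mergeByVisitorAndDay is hypothetical.

```javascript
// Plain-JavaScript sketch of the desired merge rule (not MongoDB code).
// Assumptions: timestamps are ISO-8601 UTC strings ("...Z"), "same day" means
// the same UTC date of clocked_in_at, and the input is sorted by clocked_in_at.
function mergeByVisitorAndDay(records) {
  const groups = new Map(); // "visitor_id|YYYY-MM-DD" -> merged object
  for (const r of records) {
    const day = r.clocked_in_at.slice(0, 10); // date part only, time ignored
    const key = `${r.visitor_id}|${day}`;
    const merged = groups.get(key);
    if (merged === undefined) {
      // First object of the group keeps its _id and clocked_in_at.
      groups.set(key, { ...r });
    } else if (r.clocked_out_at > merged.clocked_out_at) {
      // A later clock-out on the same day replaces the earlier one
      // (same-format UTC ISO strings compare chronologically).
      merged.clocked_out_at = r.clocked_out_at;
    }
  }
  // A Map iterates in insertion order, i.e. by first clocked_in_at.
  return [...groups.values()];
}

const input = [
  { _id: "618192d4654484639c47fa2d", clocked_out_at: "2021-11-05T10:00:00.000Z", clocked_in_at: "2021-11-05T03:00:00.000Z", visitor_id: "6166c10965959d147c69aa90" },
  { _id: "6182552fde30e84900ba33fd", clocked_out_at: "2021-11-05T11:00:00.000Z", clocked_in_at: "2021-11-05T04:00:00.000Z", visitor_id: "6182e4cea8b52121d01dff1b" },
  { _id: "6182552fde30e84900ba33fd", clocked_out_at: "2021-11-05T12:00:00.000Z", clocked_in_at: "2021-11-05T05:00:00.000Z", visitor_id: "6166c10965959d147c69aa90" },
  { _id: "6182552fde30e84900ba33fd", clocked_out_at: "2021-11-06T13:00:00.000Z", clocked_in_at: "2021-11-06T06:00:00.000Z", visitor_id: "6166c10965959d147c69aa90" },
];

console.log(mergeByVisitorAndDay(input));
```

Run on the sample input, this yields the three objects of the expected output, so it can serve as a reference when checking what a pipeline returns.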

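I'm not sure whether this is possible, but I'd love to know if there is any solution. I was hoping something like $mergeObjects or $group would do it. I tried them without success.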

Thanks for your time!

Use $group:

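db.collection.aggregate([
  {
    "$group": {
      // Group key: visitor plus the calendar day (UTC) of clocked_in_at.
      // $dateTrunc requires MongoDB 5.0 or later.
      "_id": {
        "clocked_in_at": {
          $dateTrunc: {
            date: { "$toDate": "$clocked_in_at" },
            unit: "day"
          }
        },
        "visitor_id": "$visitor_id"
      },
      // Latest clock-out and earliest clock-in of that visitor on that day.
      "max": { "$max": "$clocked_out_at" },
      "min": { "$min": "$clocked_in_at" },
      // _id of the first document in the group (input is sorted by clocked_in_at).
      "id": { "$first": "$_id" }
    }
  },
  {
    "$project": {
      "_id": "$id",
      "visitor_id": "$_id.visitor_id",
      "clocked_out_at": "$max",
      "clocked_in_at": "$min"
    }
  },
  // $group does not preserve document order, so restore chronological order.
  { "$sort": { "clocked_in_at": 1 } }
])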

mongoplayground
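The pipeline above applies $min and $max directly to the clocked_in_at and clocked_out_at strings and only converts them with $toDate for the day key. This relies on the values being ISO-8601 UTC strings in one fixed format, because for such strings lexicographic order equals chronological order. The plain-JavaScript sketch below (not MongoDB code; the first two values come from the question, the counter-example values are hypothetical) illustrates this and shows how mixed formats break string ordering.

```javascript
// Sketch: why $min/$max can work directly on these clocked_* strings.
// Assumption: every value uses the same format "YYYY-MM-DDTHH:mm:ss.sssZ";
// then comparing the strings gives the same order as comparing the dates.
const outs = ["2021-11-05T10:00:00.000Z", "2021-11-05T12:00:00.000Z"];

// String comparison (what $max does on string values).
const maxAsString = outs.reduce((a, b) => (b > a ? b : a));

// Date comparison (what $max would do after $toDate).
const maxAsDate = new Date(Math.max(...outs.map((s) => Date.parse(s)))).toISOString();

console.log(maxAsString === maxAsDate); // true

// Hypothetical counter-example: mixed formats break string ordering.
const noMillis = "2021-11-05T10:00:00Z"; // no milliseconds
const withMillis = "2021-11-05T10:00:00.500Z"; // 500 ms later
console.log(noMillis > withMillis); // true: wrong order as strings
console.log(Date.parse(noMillis) > Date.parse(withMillis)); // false: correct as dates
```

If the stored format is not guaranteed to be that uniform, converting the fields with $toDate before the $group stage is the safer choice.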

If clocked_in_at and clocked_out_at are strings, we can first convert them to dates with $toDate (this makes sorting easier). If they are already dates, this step can be skipped.

Then we can $project each clock-in and clock-out time into an array of objects that contain the day and the value. $dateTrunc truncates clocked_in_at and clocked_out_at to their days; then we $unwind the newly created datetime field. Now we can $group by the day ("$datetime.day") and visitor_id, keeping the $min in-time and the $max out-time per day. Finally, we $project again to clean up the object structure:

db.collection.aggregate([
  // (Assuming strings not dates) Convert to DateTime
  {
    "$addFields": {
      "clocked_in_at": {
        "$toDate": "$clocked_in_at"
      },
      "clocked_out_at": {
        "$toDate": "$clocked_out_at"
      }
    }
  },
  {
    "$project": {
      "s_id": "$s_id",
      "visitor_id": "$visitor_id",
      "datetime": [
        {
          "day": {
            "$dateTrunc": {
              "date": "$clocked_in_at",
              "unit": "day"
            }
          },
          "in": "$clocked_in_at"
        },
        {
          "day": {
            "$dateTrunc": {
              "date": "$clocked_out_at",
              "unit": "day"
            }
          },
          "out": "$clocked_out_at"
        }
      ]
    }
  },
  {
    "$unwind": "$datetime"
  },
  {
    "$group": {
      "_id": {
        "visitor_id": "$visitor_id",
        "day": "$datetime.day"
      },
      "s_id": {
        "$first": "$s_id"
      },
      "clocked_in_at": {
        "$min": "$datetime.in"
      },
      "clocked_out_at": {
        "$max": "$datetime.out"
      }
    }
  },
  {
    "$project": {
      "_id": "$s_id",
      "clocked_out_at": "$clocked_out_at",
      "clocked_in_at": "$clocked_in_at",
      "visitor_id": "$_id.visitor_id"
    }
  }
])

Result:

[
  {
    "_id": "6182552fde30e84900ba33fd",
    "clocked_in_at": ISODate("2021-11-06T06:00:00Z"),
    "clocked_out_at": ISODate("2021-11-06T13:00:00Z"),
    "visitor_id": "6166c10965959d147c69aa90"
  },
  {
    "_id": "618192d4654484639c47fa2d",
    "clocked_in_at": ISODate("2021-11-05T03:00:00Z"),
    "clocked_out_at": ISODate("2021-11-05T12:00:00Z"),
    "visitor_id": "6166c10965959d147c69aa90"
  },
  {
    "_id": "6182552fde30e84900ba33fd",
    "clocked_in_at": ISODate("2021-11-05T04:00:00Z"),
    "clocked_out_at": ISODate("2021-11-05T11:00:00Z"),
    "visitor_id": "6182e4cea8b52121d01dff1b"
  }
]

mongoplayground
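The $project, $unwind and $group steps can be traced in plain JavaScript. The sketch below is not MongoDB code and only mirrors the pipeline, assuming days are UTC days (which matches $dateTrunc when no timezone is given) and using the s_id version of the sample data. The names truncDay and clockPerDay are hypothetical. Like the real $group, it keeps the first s_id, the earliest in-time and the latest out-time per (visitor_id, day).

```javascript
// Plain-JavaScript sketch of the pipeline above (not MongoDB code).
// Assumption: days are UTC calendar days, like $dateTrunc without a timezone.
const truncDay = (d) =>
  new Date(Date.UTC(d.getUTCFullYear(), d.getUTCMonth(), d.getUTCDate()));

function clockPerDay(records) {
  // $toDate + $project + $unwind: every document yields one "in" row and
  // one "out" row, each tagged with the day of its own timestamp.
  const rows = records.flatMap((doc) => {
    const inAt = new Date(doc.clocked_in_at);
    const outAt = new Date(doc.clocked_out_at);
    return [
      { s_id: doc.s_id, visitor_id: doc.visitor_id, day: truncDay(inAt), in: inAt },
      { s_id: doc.s_id, visitor_id: doc.visitor_id, day: truncDay(outAt), out: outAt },
    ];
  });

  // $group by { visitor_id, day }: $first s_id, $min in-time, $max out-time.
  const groups = new Map();
  for (const row of rows) {
    const key = `${row.visitor_id}|${row.day.toISOString()}`;
    if (!groups.has(key)) {
      groups.set(key, { _id: row.s_id, visitor_id: row.visitor_id, clocked_in_at: null, clocked_out_at: null });
    }
    const g = groups.get(key);
    // Rows without the field are skipped, as $min/$max ignore missing values.
    if (row.in && (g.clocked_in_at === null || row.in < g.clocked_in_at)) g.clocked_in_at = row.in;
    if (row.out && (g.clocked_out_at === null || row.out > g.clocked_out_at)) g.clocked_out_at = row.out;
  }
  return [...groups.values()];
}

const docs = [
  { s_id: "618192d4654484639c47fa2d", clocked_out_at: "2021-11-05T10:00:00.000Z", clocked_in_at: "2021-11-05T03:00:00.000Z", visitor_id: "6166c10965959d147c69aa90" },
  { s_id: "6182552fde30e84900ba33fd", clocked_out_at: "2021-11-05T11:00:00.000Z", clocked_in_at: "2021-11-05T04:00:00.000Z", visitor_id: "6182e4cea8b52121d01dff1b" },
  { s_id: "6182552fde30e84900ba33fd", clocked_out_at: "2021-11-05T12:00:00.000Z", clocked_in_at: "2021-11-05T05:00:00.000Z", visitor_id: "6166c10965959d147c69aa90" },
  { s_id: "6182552fde30e84900ba33fd", clocked_out_at: "2021-11-06T13:00:00.000Z", clocked_in_at: "2021-11-06T06:00:00.000Z", visitor_id: "6166c10965959d147c69aa90" },
];

console.log(clockPerDay(docs));
```

One consequence of this design: because each out-time is grouped by its own day, a shift that crosses midnight (UTC) ends up split into two results, one with clocked_out_at null and one with clocked_in_at null. The first pipeline groups only by the clock-in day, so it attributes such a clock-out to the clock-in day instead.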

Note that _id is not unique in the provided sample, so that field was renamed to s_id:

[
  {
    "s_id": "618192d4654484639c47fa2d",
    "clocked_out_at": "2021-11-05T10:00:00.000Z",
    "clocked_in_at": "2021-11-05T03:00:00.000Z",
    "visitor_id": "6166c10965959d147c69aa90"
  },
  {
    "s_id": "6182552fde30e84900ba33fd",
    "clocked_out_at": "2021-11-05T11:00:00.000Z",
    "clocked_in_at": "2021-11-05T04:00:00.000Z",
    "visitor_id": "6182e4cea8b52121d01dff1b"
  },
  {
    "s_id": "6182552fde30e84900ba33fd",
    "clocked_out_at": "2021-11-05T12:00:00.000Z",
    "clocked_in_at": "2021-11-05T05:00:00.000Z",
    "visitor_id": "6166c10965959d147c69aa90"
  },
  {
    "s_id": "6182552fde30e84900ba33fd",
    "clocked_out_at": "2021-11-06T13:00:00.000Z",
    "clocked_in_at": "2021-11-06T06:00:00.000Z",
    "visitor_id": "6166c10965959d147c69aa90"
  }
]

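This means that, when running against the original documents (which use _id), the initial $project needs to be updated to: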

  {
    "$project": {
      "s_id": "$_id", // <- grab `_id` instead of `s_id`