Merging multiple objects into a single object in a MongoDB aggregation
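Here is my scenario. I have the following kind of array, returned by an aggregation stage. All unnecessary props have been removed for simplicity. The array is already sorted by date, meaning the objects are ordered by clocked_in_at, i.e. by when each object was created.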
[
{
"_id": "618192d4654484639c47fa2d",
"clocked_out_at": "2021-11-05T10:00:00.000Z",
"clocked_in_at": "2021-11-05T03:00:00.000Z",
"visitor_id": "6166c10965959d147c69aa90" // this is here as a string
},
{
"_id": "6182552fde30e84900ba33fd",
"clocked_out_at": "2021-11-05T11:00:00.000Z",
"clocked_in_at": "2021-11-05T04:00:00.000Z",
"visitor_id": "6182e4cea8b52121d01dff1b"
},
{
"_id": "6182552fde30e84900ba33fd",
"clocked_out_at": "2021-11-05T12:00:00.000Z",
"clocked_in_at": "2021-11-05T05:00:00.000Z",
"visitor_id": "6166c10965959d147c69aa90"
},
{
"_id": "6182552fde30e84900ba33fd",
"clocked_out_at": "2021-11-06T13:00:00.000Z",
"clocked_in_at": "2021-11-06T06:00:00.000Z",
"visitor_id": "6166c10965959d147c69aa90"
}
]
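As you can see, the first, third, and last objects belong to the same visitor, while the second belongs to a different one. What I need is to merge the objects in the array by visitor_id and date, setting clocked_out_at to the last value present in the array for that day, if that makes sense. In other words, the objects need to be grouped by the date of clocked_in_at. If the same visitor has entries on different clocked_in_at dates, they should remain two separate objects.
So the expected output looks like this: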
[
{
"_id": "618192d4654484639c47fa2d",
"clocked_out_at": "2021-11-05T12:00:00.000Z",
"clocked_in_at": "2021-11-05T03:00:00.000Z",
"visitor_id": "6166c10965959d147c69aa90"
},
{
"_id": "6182552fde30e84900ba33fd",
"clocked_out_at": "2021-11-05T11:00:00.000Z",
"clocked_in_at": "2021-11-05T04:00:00.000Z",
"visitor_id": "6182e4cea8b52121d01dff1b"
},
{
"_id": "6182552fde30e84900ba33fd",
"clocked_out_at": "2021-11-06T13:00:00.000Z",
"clocked_in_at": "2021-11-06T06:00:00.000Z",
"visitor_id": "6166c10965959d147c69aa90"
}
]
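Here you can see that the first and third objects of the original array have been merged, because they share the same visitor_id and the same clocked_in_at date (ignoring the time). The last object has the same visitor_id but was not merged because it falls on November 6th, and the second object was obviously not merged because it has a completely different visitor_id.
Note that the clocked_out_at value of the third object in the original array has been carried into the merged object: it is the clocked_out_at of the first object in the result array.
I'm not sure whether this is possible, but I'd love to know if there is a solution. I was hoping for something like $mergeObjects or $group; I tried them without success.
Thanks for your time!
Use $group: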
db.collection.aggregate([
{
"$group": {
"_id": {
"clocked_in_at": {
$dateTrunc: {
date: {"$toDate": "$clocked_in_at" },
unit: "day"
}
},
"visitor_id": "$visitor_id"
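},
"max": { "$max": "$clocked_out_at" }, // latest clock-out of the day
"min": { "$min": "$clocked_in_at" }, // earliest clock-in of the day
"id": { "$first": "$_id" } // the source documents store their id in `_id`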
}
},
{
"$project": {
_id: "$id",
"visitor_id": "$_id.visitor_id",
"clocked_out_at": "$max",
"clocked_in_at": "$min"
}
}
])
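$dateTrunc was added in MongoDB 5.0. On an older server, one possible variation (a sketch, not part of the answer above) is to build the day key with $dateToString instead. It assumes the field names from the question and a server that supports $toDate (4.0 or later); the names `pipeline` and `first_id` are only illustrative.

```javascript
// Sketch for servers without $dateTrunc: group on a "YYYY-MM-DD" string
// built from clocked_in_at (UTC by default), together with visitor_id.
const pipeline = [
  {
    $group: {
      _id: {
        day: {
          $dateToString: {
            format: "%Y-%m-%d",
            date: { $toDate: "$clocked_in_at" }
          }
        },
        visitor_id: "$visitor_id"
      },
      // ISO-8601 strings that share one timezone sort chronologically,
      // so $min/$max can be applied to them without conversion.
      clocked_in_at: { $min: "$clocked_in_at" },
      clocked_out_at: { $max: "$clocked_out_at" },
      // Relies on the input order (the question states it is sorted).
      first_id: { $first: "$_id" }
    }
  },
  {
    // Restore the flat shape shown in the expected output.
    $project: {
      _id: "$first_id",
      visitor_id: "$_id.visitor_id",
      clocked_in_at: 1,
      clocked_out_at: 1
    }
  }
];

// In mongosh this would be run as: db.collection.aggregate(pipeline)
```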
If clocked_in_at and clocked_out_at are strings, we can first convert them to dates with $toDate (this makes sorting easier). If they are already dates, this step can be skipped.
We can then $project each clocked-in and clocked-out time into an array of objects, each containing the day and the value. $dateTrunc is used to truncate clocked_in_at and clocked_out_at to days; then we $unwind the newly created datetime field. Now we can $group by the day ("$datetime.day") and visitor_id, keeping the $min in time and the $max out time per day. A final $project cleans up the object structure:
db.collection.aggregate([
// (Assuming strings not dates) Convert to DateTime
{
"$addFields": {
"clocked_in_at": {
"$toDate": "$clocked_in_at"
},
"clocked_out_at": {
"$toDate": "$clocked_out_at"
}
}
},
{
"$project": {
"s_id": "$s_id",
"visitor_id": "$visitor_id",
"datetime": [
{
"day": {
"$dateTrunc": {
"date": "$clocked_in_at",
"unit": "day"
}
},
"in": "$clocked_in_at"
},
{
"day": {
"$dateTrunc": {
"date": "$clocked_out_at",
"unit": "day"
}
},
"out": "$clocked_out_at"
}
]
}
},
{
"$unwind": "$datetime"
},
{
"$group": {
"_id": {
"visitor_id": "$visitor_id",
"day": "$datetime.day"
},
"s_id": {
"$first": "$s_id"
},
"clocked_in_at": {
"$min": "$datetime.in"
},
"clocked_out_at": {
"$max": "$datetime.out"
}
}
},
{
"$project": {
"_id": "$s_id",
"clocked_out_at": "$clocked_out_at",
"clocked_in_at": "$clocked_in_at",
"visitor_id": "$_id.visitor_id"
}
}
])
[
{
"_id": "6182552fde30e84900ba33fd",
"clocked_in_at": ISODate("2021-11-06T06:00:00Z"),
"clocked_out_at": ISODate("2021-11-06T13:00:00Z"),
"visitor_id": "6166c10965959d147c69aa90"
},
{
"_id": "618192d4654484639c47fa2d",
"clocked_in_at": ISODate("2021-11-05T03:00:00Z"),
"clocked_out_at": ISODate("2021-11-05T12:00:00Z"),
"visitor_id": "6166c10965959d147c69aa90"
},
{
"_id": "6182552fde30e84900ba33fd",
"clocked_in_at": ISODate("2021-11-05T04:00:00Z"),
"clocked_out_at": ISODate("2021-11-05T11:00:00Z"),
"visitor_id": "6182e4cea8b52121d01dff1b"
}
]
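The result above comes back in a different order from the expected output because $group does not guarantee the order of its output documents; appending a $sort stage on clocked_in_at would restore it. To sanity-check the merge rule itself outside MongoDB, here is a minimal plain JavaScript sketch that applies the same grouping to the sample data from the question. The function name `mergeByVisitorAndDay` is hypothetical, and the day is taken in UTC.

```javascript
// Plain JavaScript re-implementation of the grouping rule, for checking only.
const docs = [
  { _id: "618192d4654484639c47fa2d", clocked_out_at: "2021-11-05T10:00:00.000Z",
    clocked_in_at: "2021-11-05T03:00:00.000Z", visitor_id: "6166c10965959d147c69aa90" },
  { _id: "6182552fde30e84900ba33fd", clocked_out_at: "2021-11-05T11:00:00.000Z",
    clocked_in_at: "2021-11-05T04:00:00.000Z", visitor_id: "6182e4cea8b52121d01dff1b" },
  { _id: "6182552fde30e84900ba33fd", clocked_out_at: "2021-11-05T12:00:00.000Z",
    clocked_in_at: "2021-11-05T05:00:00.000Z", visitor_id: "6166c10965959d147c69aa90" },
  { _id: "6182552fde30e84900ba33fd", clocked_out_at: "2021-11-06T13:00:00.000Z",
    clocked_in_at: "2021-11-06T06:00:00.000Z", visitor_id: "6166c10965959d147c69aa90" }
];

function mergeByVisitorAndDay(input) {
  const groups = new Map();
  for (const doc of input) {
    // UTC calendar day of clocked_in_at, e.g. "2021-11-05".
    const day = new Date(doc.clocked_in_at).toISOString().slice(0, 10);
    const key = `${doc.visitor_id}|${day}`;
    const current = groups.get(key);
    if (!current) {
      // The first document of a group supplies _id (like $first).
      groups.set(key, { ...doc });
      continue;
    }
    // Keep the earliest clock-in and the latest clock-out (like $min/$max).
    if (doc.clocked_in_at < current.clocked_in_at) {
      current.clocked_in_at = doc.clocked_in_at;
    }
    if (doc.clocked_out_at > current.clocked_out_at) {
      current.clocked_out_at = doc.clocked_out_at;
    }
  }
  // Sort explicitly, mirroring what a trailing $sort stage would do.
  return [...groups.values()].sort((a, b) =>
    a.clocked_in_at < b.clocked_in_at ? -1 : a.clocked_in_at > b.clocked_in_at ? 1 : 0
  );
}

const merged = mergeByVisitorAndDay(docs);
console.log(merged);
```

With the sample data this yields three objects, in the same order as the expected output in the question.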
Note that _id is not unique in the provided example, so the field was renamed to s_id in the sample data used here:
[
{
"s_id": "618192d4654484639c47fa2d",
"clocked_out_at": "2021-11-05T10:00:00.000Z",
"clocked_in_at": "2021-11-05T03:00:00.000Z",
"visitor_id": "6166c10965959d147c69aa90"
},
{
"s_id": "6182552fde30e84900ba33fd",
"clocked_out_at": "2021-11-05T11:00:00.000Z",
"clocked_in_at": "2021-11-05T04:00:00.000Z",
"visitor_id": "6182e4cea8b52121d01dff1b"
},
{
"s_id": "6182552fde30e84900ba33fd",
"clocked_out_at": "2021-11-05T12:00:00.000Z",
"clocked_in_at": "2021-11-05T05:00:00.000Z",
"visitor_id": "6166c10965959d147c69aa90"
},
{
"s_id": "6182552fde30e84900ba33fd",
"clocked_out_at": "2021-11-06T13:00:00.000Z",
"clocked_in_at": "2021-11-06T06:00:00.000Z",
"visitor_id": "6166c10965959d147c69aa90"
}
]
This means that, to run against the original data, the initial $project needs to be updated to:
{
"$project": {
"s_id": "$_id", // <- grab `_id` instead of `s_id`