How do I summarize tags by category in MongoDB?
I have a collection shaped like this:
[
{
_id: ObjectId("5d8e8c9b8f8b9b7b7a8b4567"),
tags: {
language: [ 'en' ],
industries: [ 'agency', 'travel' ],
countries: [ 'ca', 'us' ],
regions: [ 'north-america' ],
}
},
{
_id: ObjectId("5d8e8c9b8f8b9b7b7a8b4568"),
tags: {
language: [ 'en', 'fr' ],
industries: [ 'travel' ],
countries: [ 'ca' ]
}
},
{
_id: ObjectId("5d8e8c9b8f8b9b7b7a8b4569"),
tags: {
language: [ 'en' ],
industries: [ 'agency', 'travel' ],
countries: [ 'ca', 'us' ],
regions: [ 'south-america' ]
}
},
]
I want to generate this result:
{
//* count of all documents
"count": 3,
//* count of all documents that contain any slug within the given category
"countWithCategorySlug": {
"language": 3,
"industries": 3,
"countries": 3,
"regions": 2
},
//* per category: count of documents that contain that slug in the given category
"language": {
"en": 3,
"fr": 1
},
"industries": {
"agency": 2,
"travel": 3
},
"countries": {
"ca": 3,
"us": 2
},
"regions": {
"north-america": 1,
"south-america": 1
}
}
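I'm super stuck, so any help would be greatly appreciated. :)
The number of categories is unknown. I have a code-based solution that queries the list of distinct categories and slugs and then generates a $group stage for each one... The resulting query is far too large and I need a better approach... The problem is that I have no idea how to optimize it...
Query
- The first part, before the $facet, splits the tags apart and produces one document per value, like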
[{
"type": "language",
"value": "en",
"_id": ObjectId("5d8e8c9b8f8b9b7b7a8b4567")
},
{
"type": "industries",
"value": "agency",
"_id": ObjectId("5d8e8c9b8f8b9b7b7a8b4567")
},
{
"type": "industries",
"value": "travel",
"_id": ObjectId("5d8e8c9b8f8b9b7b7a8b4567")
},
{
"type": "countries",
"value": "ca",
"_id": ObjectId("5d8e8c9b8f8b9b7b7a8b4567")
}]
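- Then a $facet with 3 fields groups the documents and counts them
- A final transformation turns the data into keys, giving output similar to the expected one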
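As a rough picture of the first step in the list above, here is a plain JavaScript sketch (run in Node, not in MongoDB) that flattens one document into one row per slug. The function name flattenTags and the string id standing in for the ObjectId are assumptions made for the illustration only.

```javascript
// Hypothetical in-memory copy of the first sample document
// (a plain string replaces the ObjectId).
const doc = {
  _id: "5d8e8c9b8f8b9b7b7a8b4567",
  tags: {
    language: ["en"],
    industries: ["agency", "travel"],
    countries: ["ca", "us"],
    regions: ["north-america"],
  },
};

// Emit one { type, value, _id } row per (category, slug) pair.
function flattenTags(document) {
  return Object.entries(document.tags || {}).flatMap(([type, slugs]) =>
    slugs.map((value) => ({ type, value, _id: document._id }))
  );
}

const rows = flattenTags(doc);
console.log(rows.length); // 6 rows: 1 language + 2 industries + 2 countries + 1 region
```

The aggregation does the same thing server-side with $objectToArray, $map and two $unwind stages.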
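// "collection" is a placeholder for the actual collection name
db.collection.aggregate([
  // turn the tags object into an array of { k: category, v: [slugs] } pairs
  {"$set": {"tags": {"$objectToArray": "$tags"}}},
  // rename the pairs to { type, value } and carry the document _id along
  {"$set": {
    "tags": {
      "$map": {
        "input": "$tags",
        "in": {"type": "$$this.k", "value": "$$this.v", "_id": "$_id"}
      }
    }
  }},
  // one document per category, then one document per slug
  {"$unwind": "$tags"},
  {"$replaceRoot": {"newRoot": "$tags"}},
  {"$unwind": "$value"},
  {"$facet": {
    // distinct documents overall
    "count": [
      {"$group": {"_id": null, "count": {"$addToSet": "$_id"}}},
      {"$set": {"count": {"$size": "$count"}}}
    ],
    // distinct documents per category
    "category": [
      {"$group": {"_id": "$type", "count": {"$addToSet": "$_id"}}},
      {"$set": {"count": {"$size": "$count"}}}
    ],
    // distinct documents per (category, slug), regrouped by category;
    // grouping on both fields keeps a slug used in two categories separate
    "values": [
      {"$group": {
        "_id": {"type": "$type", "value": "$value"},
        "values": {"$addToSet": "$_id"}
      }},
      {"$set": {"values": {"$size": "$values"}}},
      {"$group": {
        "_id": "$_id.type",
        "values": {
          "$push": {"type": "$_id.type", "value": "$_id.value", "count": "$values"}
        }
      }}
    ]
  }},
  // reshape the facet arrays into keyed objects
  {"$set": {
    "count": {
      "$getField": {"field": "count", "input": {"$arrayElemAt": ["$count", 0]}}
    },
    "category": {
      "$arrayToObject": [
        {"$map": {
          "input": "$category",
          "in": {"k": "$$this._id", "v": "$$this.count"}
        }}
      ]
    },
    "values": {
      "$arrayToObject": [
        {"$map": {
          "input": "$values",
          "in": {
            "k": "$$this._id",
            "v": {
              "$arrayToObject": [
                {"$map": {
                  "input": "$$this.values",
                  "in": {"k": "$$this.value", "v": "$$this.count"}
                }}
              ]
            }
          }
        }}
      ]
    }
  }}
])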
Output
[{
"count": 3,
"category": {
"countries": 3,
"industries": 3,
"regions": 2,
"language": 3
},
"values": {
"regions": {
"south-america": 1,
"north-america": 1
},
"countries": {
"us": 2,
"ca": 3
},
"language": {
"fr": 1,
"en": 3
},
"industries": {
"agency": 2,
"travel": 3
}
}
}]
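If it helps to sanity-check the numbers, the same summary can be computed in application code on a small sample. This is only a sketch in plain JavaScript under some assumptions: summarizeTags is a hypothetical helper, the ids are shortened strings instead of ObjectIds, and count here is simply the number of input documents, whereas the pipeline only counts documents that still exist after the $unwind stages (those with at least one slug).

```javascript
// Sample documents from the question, with short string ids for readability.
const docs = [
  { _id: "a", tags: { language: ["en"], industries: ["agency", "travel"], countries: ["ca", "us"], regions: ["north-america"] } },
  { _id: "b", tags: { language: ["en", "fr"], industries: ["travel"], countries: ["ca"] } },
  { _id: "c", tags: { language: ["en"], industries: ["agency", "travel"], countries: ["ca", "us"], regions: ["south-america"] } },
];

function summarizeTags(documents) {
  const category = new Map(); // category -> number of documents using it
  const values = new Map();   // category -> Map(slug -> number of documents)
  for (const doc of documents) {
    for (const [type, slugs] of Object.entries(doc.tags || {})) {
      const unique = new Set(slugs); // count each slug once per document
      if (unique.size === 0) continue; // an empty category does not count
      category.set(type, (category.get(type) || 0) + 1);
      if (!values.has(type)) values.set(type, new Map());
      const perSlug = values.get(type);
      for (const slug of unique) {
        perSlug.set(slug, (perSlug.get(slug) || 0) + 1);
      }
    }
  }
  return {
    count: documents.length,
    category: Object.fromEntries(category),
    values: Object.fromEntries(
      [...values].map(([type, perSlug]) => [type, Object.fromEntries(perSlug)])
    ),
  };
}

const summary = summarizeTags(docs);
console.log(JSON.stringify(summary, null, 2));
```

For the three sample documents this yields the same counts as the aggregation output above, apart from key order.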