Mongo Aggregation: Use $count of found documents for query

Question

I have a Mongo collection series in which every document contains a list of dataPoints. All series with the same testStepId contain the same number of dataPoints:

{
  "seriesId": {
    "seriesId": "77678ca1-31db-4cec-a042-68a3053b92c6"
  },
  "testStepId": {
    "testStepId": "c152415b-2392-4c2b-af74-51a4973bd257"
  },
  "measurement": {
    "startTime": {
      "$date": "2020-07-07T12:40:49.782Z"
    },
    "endTime": {
      "$date": "2020-07-07T12:42:19.782Z"
    }
  },
  "dataPoints": [
    {
      "timeStamp": {
        "$date": "2020-07-07T12:41:09.782Z"
      },
      "value": "Value_1_1"
    },
    {
      "timeStamp": {
        "$date": "2020-07-07T12:41:29.782Z"
      },
      "value": "Value_1_2"
    },
    {
      "timeStamp": {
        "$date": "2020-07-07T12:41:39.782Z"
      },
      "value": "Value_1_3"
    },
    ...
    {
      "timeStamp": {
        "$date": "2020-07-07T12:42:19.782Z"
      },
      "value": "Value_2_11"
    }
  ]
}

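Now I want to query all series documents matching a specific testStepId (no problem). However, I don't want to load all dataPoints of all found series; I only want to load 1000 dataPoints in total. So if 10 series are found, I only need to load 100 dataPoints per series: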

-> load every (dataPoints.size() / 100)-th dataPoint

-> this means I have to take into account the number of found series documents and the number of dataPoints in a series

-> load every X-th dataPoint, where

X = 1000 / <count of documents> / <count of dataPoints>

I am struggling to get this working with an aggregation in MongoDB Compass. I am still not able to count the found documents and unwind this value...

For simplicity, I just tried to get every 2nd dataPoint:

{
    project: {
        dataPoints: {
            $map: {
                input: { $range: [ 0, {"$size": "$dataPoints"}, 2 ] },
                as: "index",
                in: { $arrayElemAt: [ "$dataPoints", "$$index" ] }
            }
        }
    }
}

-> works fine

Now I want to get every x-th dataPoint, where x depends on the number of found documents. For this I tried several different approaches, none of which worked...

Attempt: use $count instead of a fixed number:

{
    project: {
        dataPoints: {
            $map: {
                input: { $range: [ 0, {"$size": "$dataPoints"}, $count ] },
                as: "index",
                in: { $arrayElemAt: [ "$dataPoints", "$$index" ] }
            }
        }
    }
}

-> "Project specification must be an object"

Attempt: define count as a variable:

{
    project: {
        dataPoints: {
            $let: {
                vars: { 
                    total: "$count",
                },
                in: { 
                    $map: {
                        input: { $range: [ 0, {"$size": "$dataPoints"}, "$$total"] },
                        as: "index",
                        in: { $arrayElemAt: [ "$dataPoints", "$$index" ] }
                    }
                }
            }
        }
    }   
}

-> "$range requires a numeric step value, found value of type: missing"

Obviously my approach is wrong. Can anyone give me a hint on how to get this working?

Answer 1

I think the formula for X should be X = <count of dataPoints> * <count of documents> / 1000
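As a quick sanity check of that formula, here is a minimal sketch in plain JavaScript (in-memory only, not a MongoDB call). The names sampleStep and everyNth and the budget parameter are hypothetical, chosen just for illustration, and the Math.max guard is an added assumption so the step can never be 0:

```javascript
// Step between sampled indexes: <count of dataPoints> * <count of documents> / budget,
// rounded up in the same way $ceil would round it.
// Assumption: clamp to at least 1 so an empty input never yields a step of 0.
function sampleStep(pointsPerSeries, seriesCount, budget = 1000) {
  return Math.max(1, Math.ceil((pointsPerSeries * seriesCount) / budget));
}

// Keep every step-th element, starting at index 0 (like $range + $arrayElemAt).
function everyNth(points, step) {
  const out = [];
  for (let i = 0; i < points.length; i += step) {
    out.push(points[i]);
  }
  return out;
}

// 10 series with 1000 dataPoints each: step 10, so 100 dataPoints per series.
const step = sampleStep(1000, 10);
const sampled = everyNth(Array.from({ length: 1000 }, (_, i) => i), step);
console.log(step, sampled.length); // 10 100
```

With 10 series this keeps 10 * 100 = 1000 dataPoints overall, which matches the budget from the question.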

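You cannot directly access the number of documents (count) at a given aggregation pipeline stage. However, you can group all documents into a single document, count them there, and then expand them back into separate documents. You can use $group or $facet to achieve this.

I will show an example using $group: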

[
  {
    $group: {
      _id: null,
      count: { $sum: 1 },
      all: { $push: "$$ROOT" }
    }
  },
  {
    $unwind: "$all"
  },
  {
    $replaceWith: { // $replaceWith is available from v4.2, for earlier version use { $replaceRoot: { newRoot: <doc> } }
      $mergeObjects: [
        "$all",
        {
          dataPoints: {
            $map: {
              input: {
                $range: [
                  0,
                  { $size: "$all.dataPoints" },
                  {
                    $ceil: {
                      $divide: [
                        {
                          $multiply: [
                            { "$size": "$all.dataPoints" },
                            "$count"
                          ]
                        },
                        1000
                      ]
                    }
                  }
                ]
              },
              as: "index",
              in: { $arrayElemAt: ["$all.dataPoints", "$$index"] }
            }
          }
        }
      ]
    }
  }
]

Mongo Playground
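To make the data flow of the pipeline above easier to follow, here is a rough in-memory imitation in plain JavaScript. It is only a sketch under assumptions: sampleSeries and budget are hypothetical names, nothing here talks to MongoDB, and the Math.max guard is an addition that the pipeline itself does not have.

```javascript
// In-memory imitation of the $group -> $unwind -> $replaceWith stages.
function sampleSeries(docs, budget = 1000) {
  // $group with _id: null: a single bucket holding the count and every document.
  const grouped = { count: docs.length, all: docs };

  // $unwind "$all" followed by $replaceWith: one output document per series again.
  return grouped.all.map((doc) => {
    const size = doc.dataPoints.length;
    // Same step as in the pipeline; clamped to 1 (assumption) to avoid a step of 0.
    const step = Math.max(1, Math.ceil((size * grouped.count) / budget));

    const dataPoints = [];
    for (let i = 0; i < size; i += step) {
      dataPoints.push(doc.dataPoints[i]);
    }
    // Like $mergeObjects: the thinned dataPoints overrides the original field.
    return { ...doc, dataPoints };
  });
}

// 4 series with 600 dataPoints each: step = ceil(2400 / 1000) = 3.
const docs = Array.from({ length: 4 }, (_, s) => ({
  testStepId: "step-1",
  seriesId: "series-" + s,
  dataPoints: Array.from({ length: 600 }, (_, i) => i),
}));
const result = sampleSeries(docs);
console.log(result.map((d) => d.dataPoints.length)); // [ 200, 200, 200, 200 ]
```

Because the step is rounded up, the total (800 here) stays at or below the budget rather than hitting it exactly.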

Answer 2

Found a very nice solution with the support of a Mongo expert:

[{
    //
    // Group the series
    //
    '$group': {
        '_id': {
            'seriesName': '$series.seriesName'
        }, 
        'dataPoints': {
            '$push': '$dataPoints'
        }, 
        'series': {
            '$addToSet': '$series'
        }
    }
}, 
{
    //
    // Concat the dataPoints for each series into one array
    //
    '$addFields': {
        'dataPoints': {
            '$reduce': {
                'input': '$dataPoints', 
                'initialValue': [], 
                'in': {
                    '$concatArrays': [
                        '$$value', '$$this'
                    ]
                }
            }
        }
    }
}, 
{
    //
    // Calculate 'x' for 'find every x-th dataPoint' (called index here)
    // 
    '$replaceWith': {
        'dataPoints': {
            '$map': {
                'input': {
                    '$range': [
                        0, {
                            '$size': '$dataPoints'
                        }, {
                            '$ceil': {
                                '$divide': [
                                    {
                                        '$size': '$dataPoints'
                                    }, 100
                                ]
                            }
                        }
                    ]
                }, 
                'as': 'index', 
                'in': {
                    '$arrayElemAt': [
                        '$dataPoints', '$$index'
                    ]
                }
            }
        }
    }
}]

Hint: this does not return the exact number of dataPoints, only an approximation. But that is exactly what I need...

MongoPlayground
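The approximation mentioned in the hint comes from rounding the step up with $ceil. A small plain JavaScript sketch (sampledCount and target are hypothetical names, and the early return for an empty array is an added assumption) shows how many dataPoints survive for a few array sizes:

```javascript
// Number of elements produced by $range [0, size, ceil(size / target)].
function sampledCount(size, target = 100) {
  if (size === 0) return 0; // assumption: treat an empty array as "nothing to sample"
  const step = Math.ceil(size / target);
  return Math.ceil(size / step);
}

console.log(sampledCount(1000)); // 100 (step 10)
console.log(sampledCount(250));  // 84  (step 3)
console.log(sampledCount(101));  // 51  (step 2)
console.log(sampledCount(99));   // 99  (step 1)
```

So the result never exceeds the target of 100, but it can fall well below it when the array size is just above a multiple of the target.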
