MongoDB: print the count of unique values from multiple fields

I have a collection (let's call it myCollection) containing the following documents:

{
    "_id": {
        "$oid": "601a75a0c9a338f09f238816"
    },
    "Sample": "lie50",
    "Chromosome": "chr10",
    "Position": {
        "$numberLong": "47663"
    },
    "Reference": "C",
    "Mutation": "T",
    "Run": "Run_test",
    "SYMBOL": "TUBB8"
},
{
    "_id": {
        "$oid": "601a75a0c9a338f09f238817"
    },
    "Sample": "lie50",
    "Chromosome": "chr10",
    "Position": {
        "$numberLong": "47876"
    },
    "Reference": "T",
    "Mutation": "C",
    "Run": "Run_test",
    "SYMBOL": "TUBB8"
},
{
    "_id": {
        "$oid": "601a75a0c9a338f09f238818"
    },
    "Sample": "lie50",
    "Chromosome": "chr10",
    "Position": {
        "$numberLong": "48005"
    },
    "Reference": "G",
    "Mutation": "A",
    "Run": "Run_test",
    "SYMBOL": "TUBB8"
},
{
    "_id": {
        "$oid": "601a75a0c9a338f09f238819"
    },
    "Sample": "lie12",
    "Chromosome": "chr10",
    "Position": {
        "$numberLong": "48005"
    },
    "Reference": "G",
    "Mutation": "A",
    "Run": "Run_test",
    "SYMBOL": "TUBB8"
}

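I want to print the number of distinct value combinations across the fields Chromosome, Position, Reference and Mutation. That means counting the unique entries among the following: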

"Chromosome": "chr10", "Position": 47663, "Reference": "C", "Mutation": "T"
"Chromosome": "chr10", "Position": 47876, "Reference": "T", "Mutation": "C"
"Chromosome": "chr10", "Position": 48005, "Reference": "G", "Mutation": "A"
"Chromosome": "chr10", "Position": 48005, "Reference": "G", "Mutation": "A"

The result should be 3 distinct rows, since the last two entries are identical.

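I have looked at several related questions, such as one on how to print the distinct values for a single field and one that uses $unwind/$project.

For the latter, I thought: why not concatenate the 4 fields and then print the number of results using length?

I got this far: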

db.myCollection.aggregate(
[
  {
    $group:
    {
      _id: null,
      newfield: {
        $addToSet:
        {
          $concat:
          [
            "$Chromosome",
            "_",
            {"$toString":"$Position"},
            "_",
            "$Reference",
            "_",
            "$Mutation"
          ]
        }
      }
    }
  },
  {
    $unwind: "$newfield"
  },
  { 
    $project: { _id: 0 }
  }
]).length

However, adding .length to this query returns nothing, whereas without it the query returns:

{ "newfield" : "chr10_47663_C_T" }
{ "newfield" : "chr10_47876_T_C" }
{ "newfield" : "chr10_48005_G_A" }

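For reference, my actual data contains 2 billion documents.

The fields should be passed as the _id of the $group stage, so that each distinct combination forms one group. A $count stage then returns the total number of groups instead of returning every document: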

db.myCollection.aggregate([
  {
    $group: {
      _id: {
        Chromosome: "$Chromosome",
        Position: "$Position",
        Reference: "$Reference",
        Mutation: "$Mutation"
      }
    }
  },
  { $count: "count" }
])
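To see what the pipeline above computes, here is a minimal in-memory sketch in plain JavaScript, with no MongoDB involved. It groups the four sample documents by the same four fields and counts the groups. The `docs` array and the `countDistinct` helper are hypothetical names introduced only for illustration; on a real collection the server does this work inside `$group` and `$count`.

```javascript
// Sample documents from the question, reduced to the four fields of interest.
const docs = [
  { Chromosome: "chr10", Position: 47663, Reference: "C", Mutation: "T" },
  { Chromosome: "chr10", Position: 47876, Reference: "T", Mutation: "C" },
  { Chromosome: "chr10", Position: 48005, Reference: "G", Mutation: "A" },
  { Chromosome: "chr10", Position: 48005, Reference: "G", Mutation: "A" },
];

// Hypothetical helper: mimics $group on a compound _id followed by $count.
function countDistinct(documents, fields) {
  const groups = new Set();
  for (const doc of documents) {
    // The JSON-encoded list of field values acts as the compound group key.
    groups.add(JSON.stringify(fields.map((f) => doc[f])));
  }
  return { count: groups.size };
}

const result = countDistinct(docs, ["Chromosome", "Position", "Reference", "Mutation"]);
console.log(result); // { count: 3 }
```

Grouping on the raw field values, rather than on a concatenated string as in the question, keeps the Position value numeric and avoids the extra $toString and $concat work per document.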

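As for why .length printed nothing in the question: in the mongo shell, aggregate() returns a cursor rather than an array, and the most likely explanation is that the cursor has no length property, so the expression evaluates to undefined. Calling .toArray().length or .itcount() on the cursor would count the results, but that still sends every result document to the client, which the $count stage avoids. The sketch below imitates the situation in plain JavaScript with a generator; `fakeAggregate` is a hypothetical stand-in for the cursor, not a MongoDB API.

```javascript
// Hypothetical stand-in for an aggregation cursor: results are produced lazily.
function* fakeAggregate() {
  yield { newfield: "chr10_47663_C_T" };
  yield { newfield: "chr10_47876_T_C" };
  yield { newfield: "chr10_48005_G_A" };
}

// A lazy, cursor-like object has no length property.
const viaLength = fakeAggregate().length;

// Materializing the results into an array makes them countable.
const viaArray = Array.from(fakeAggregate()).length;

console.log(viaLength); // undefined
console.log(viaArray);  // 3
```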