MongoDB: print count of unique values from multiple fields
I have a collection (let's call it myCollection) with the following documents:
{
  "_id": {
    "$oid": "601a75a0c9a338f09f238816"
  },
  "Sample": "lie50",
  "Chromosome": "chr10",
  "Position": {
    "$numberLong": "47663"
  },
  "Reference": "C",
  "Mutation": "T",
  "Run": "Run_test",
  "SYMBOL": "TUBB8"
},
{
  "_id": {
    "$oid": "601a75a0c9a338f09f238817"
  },
  "Sample": "lie50",
  "Chromosome": "chr10",
  "Position": {
    "$numberLong": "47876"
  },
  "Reference": "T",
  "Mutation": "C",
  "Run": "Run_test",
  "SYMBOL": "TUBB8"
},
{
  "_id": {
    "$oid": "601a75a0c9a338f09f238818"
  },
  "Sample": "lie50",
  "Chromosome": "chr10",
  "Position": {
    "$numberLong": "48005"
  },
  "Reference": "G",
  "Mutation": "A",
  "Run": "Run_test",
  "SYMBOL": "TUBB8"
},
{
  "_id": {
    "$oid": "601a75a0c9a338f09f238819"
  },
  "Sample": "lie12",
  "Chromosome": "chr10",
  "Position": {
    "$numberLong": "48005"
  },
  "Reference": "G",
  "Mutation": "A",
  "Run": "Run_test",
  "SYMBOL": "TUBB8"
}
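I am interested in printing the count of distinct combinations of values across the fields Chromosome, Position, Reference and Mutation. That means counting the unique combinations among the following entries: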
"Chromosome": "chr10", "Position": 47663, "Reference": "C", "Mutation": "T"
"Chromosome": "chr10", "Position": 47876, "Reference": "T", "Mutation": "C"
"Chromosome": "chr10", "Position": 48005, "Reference": "G", "Mutation": "A"
"Chromosome": "chr10", "Position": 48005, "Reference": "G", "Mutation": "A"
This should give 3 distinct rows.
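I have checked several questions, such as this one on how to print the distinct values for one field, and others that use $unwind/$project.
For the latter, I thought: why not concatenate the 4 fields and then use length together with $unwind/$project to print the number?
I managed to get this far: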
db.myCollection.aggregate([
  {
    $group: {
      _id: null,
      newfield: {
        $addToSet: {
          $concat: [
            "$Chromosome",
            "_",
            { $toString: "$Position" },
            "_",
            "$Reference",
            "_",
            "$Mutation"
          ]
        }
      }
    }
  },
  { $unwind: "$newfield" },
  { $project: { _id: 0 } }
]).length
However, appending .length to this query does not return a number. Without it, the query returns:
{ "newfield" : "chr10_47663_C_T" }
{ "newfield" : "chr10_47876_T_C" }
{ "newfield" : "chr10_48005_G_A" }
For reference, my actual data contains 2 billion documents.
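The fields should be passed as the _id of the $group stage, so that each distinct combination becomes one group. Then use a $count stage to get the total number of groups instead of returning every document: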
db.myCollection.aggregate([
  {
    $group: {
      _id: {
        Chromosome: "$Chromosome",
        Position: "$Position",
        Reference: "$Reference",
        Mutation: "$Mutation"
      }
    }
  },
  { $count: "count" }
])
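The original attempt produced no number because aggregate() returns a cursor rather than an array, so .length has nothing to report. To make the logic of $group followed by $count concrete, here is a minimal in-memory sketch in plain JavaScript that runs without MongoDB. The helper name countDistinct and the trimmed sample documents (Position written as a plain number) are assumptions made for illustration; this is not how the server implements the stages.

```javascript
// In-memory sketch of what $group (on a compound _id) + $count computes.
// Plain JavaScript, no MongoDB required. Sample data mirrors the question.
const docs = [
  { Sample: "lie50", Chromosome: "chr10", Position: 47663, Reference: "C", Mutation: "T" },
  { Sample: "lie50", Chromosome: "chr10", Position: 47876, Reference: "T", Mutation: "C" },
  { Sample: "lie50", Chromosome: "chr10", Position: 48005, Reference: "G", Mutation: "A" },
  { Sample: "lie12", Chromosome: "chr10", Position: 48005, Reference: "G", Mutation: "A" }
];

// Hypothetical helper: counts distinct combinations of the given fields.
function countDistinct(documents, fields) {
  const seen = new Set();
  for (const doc of documents) {
    // Serialising the values as a JSON array gives one unambiguous key
    // per combination, playing the role of the compound _id in $group.
    seen.add(JSON.stringify(fields.map((f) => doc[f])));
  }
  // The number of keys corresponds to the number of groups, i.e. $count.
  return seen.size;
}

const distinctCount = countDistinct(docs, ["Chromosome", "Position", "Reference", "Mutation"]);
console.log(distinctCount); // 3
```

On a collection as large as the one described, the $group stage may exceed the per-stage memory limit. Passing { allowDiskUse: true } as the second argument to aggregate() is one option worth trying in that case.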