Map-reduce to get a unique count
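I want a map-reduce function that produces the output below from the following input collection, subject to the conditions listed after it.
Input collection: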
[{
a:1,
b:'test',
indices:[1,2,4,5]
}, {
a:2,
b:'test',
indices:[2, 3, 5]
}, {
a:2,
b:'test',
indices:[1, 2, 4]
}, {
a:3,
b:'apple',
indices:[1, 2]
}, {
a:4,
b:'apple',
indices:[1, 3, 5]
}, {
a:5,
b:'orange',
indices:[232]
}, {
a:5,
b:'dummy',
indices:[2]
}, {
a:6,
b:'dummy',
indices:[11, 2, 4]
}, {
a:6,
b:'dummy',
indices:[11, 3, 2]
}, {
a:6,
b:'dummy',
indices:[1, 2, 3, 4, 5]
}]
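The conditions are:
- Select only documents whose indices array contains 2. This can be passed as the query option, i.e. query: {indices: {$in: [2]}}
- Group by b.
- Duplicate values of a should be counted once. For example, two documents with a=2 satisfy the condition that indices contains 2, but together they should count as 1.
- My input collection always satisfies this condition: if a value of a is present under "test", it will not be present under dummy/apple/etc. However, a can be repeated.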
Here is what I tried:
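db.x.mapReduce(function(){
    // Emits 1 per matching document, so repeated values of a are counted again
    emit(this.b, 1);
}, function(key, reducable){
    return Array.sum(reducable);
}, {
    out: {inline: 1},
    query: {
        // $in requires an array argument
        'indices': {$in: [2]}
    }
});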
Output:
[
{
    "_id" : "test",
    "value" : {
        "count" : 3 // should be 2
    }
},{
    "_id" : "apple",
    "value" : {
        "count" : 2
    }
},{
    "_id" : "dummy",
    "value" : {
        "count" : 4 // should be 2
    }
}]
Expected output:
[{
    "_id" : "test",
    "value" : {
        "count" : 2
    }
},{
    "_id" : "apple",
    "value" : {
        "count" : 2
    }
},{
    "_id" : "dummy",
    "value" : {
        "count" : 2
    }
}]
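You don't need map/reduce for this. Use the aggregation framework: match the documents whose indices contain 2, group on the (b, a) pair to drop duplicates, then group on b and count.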
> db.crawler_status.aggregate([
{ "$match" : { "indices" : 2 } },
{ "$group" : { "_id" : { "b" : "$b", "a" : "$a" } } },
{ "$group" : { "_id" : "$_id.b", "count" : { "$sum" : 1 } } }
])
{ "_id" : "test", "count" : 2 }
{ "_id" : "apple", "count" : 1 } // your sample output was mistaken
{ "_id" : "dummy", "count" : 2 }
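As a rough sanity check that needs no running MongoDB server, the same three steps (match, deduplicate on the (b, a) pair, count per b) can be replayed in plain JavaScript over the sample documents from the question. This is only a sketch of the logic, not how the server executes the pipeline; `countUniqueA` is a hypothetical helper name invented for illustration.

```javascript
// Sample documents copied from the question.
const docs = [
  { a: 1, b: 'test', indices: [1, 2, 4, 5] },
  { a: 2, b: 'test', indices: [2, 3, 5] },
  { a: 2, b: 'test', indices: [1, 2, 4] },
  { a: 3, b: 'apple', indices: [1, 2] },
  { a: 4, b: 'apple', indices: [1, 3, 5] },
  { a: 5, b: 'orange', indices: [232] },
  { a: 5, b: 'dummy', indices: [2] },
  { a: 6, b: 'dummy', indices: [11, 2, 4] },
  { a: 6, b: 'dummy', indices: [11, 3, 2] },
  { a: 6, b: 'dummy', indices: [1, 2, 3, 4, 5] }
];

// Hypothetical helper (not a MongoDB API): counts distinct values of `a`
// per `b` among documents whose `indices` array contains `wanted`.
function countUniqueA(documents, wanted) {
  const seen = new Map(); // b -> Set of distinct a values
  for (const doc of documents) {
    // Plays the role of { "$match": { "indices": wanted } }
    if (!doc.indices.includes(wanted)) continue;
    // Plays the role of the first $group: one entry per (b, a) pair
    if (!seen.has(doc.b)) seen.set(doc.b, new Set());
    seen.get(doc.b).add(doc.a);
  }
  // Plays the role of the second $group: count the distinct a values per b
  return Array.from(seen, ([b, values]) => ({ _id: b, count: values.size }));
}

const result = countUniqueA(docs, 2);
console.log(result);
```

On the sample data this gives 2 for test, 1 for apple, and 2 for dummy. The document with a=4 under apple has indices [1, 3, 5], so it never matches, which is why apple comes out as 1 rather than the 2 shown in the question.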