Multiplication and group by in MongoDB
I have a collection in MongoDB whose documents look like this:
{
    "_id" : ObjectId("54901212f315dce7077204af"),
    "Date" : ISODate("2014-10-20T04:00:00.000Z"),
    "Type" : "Twitter",
    "Entities" : [
        {
            "ID" : 2,
            "Name" : "test1",
            "Sentiment" : {
                "Value" : 20,
                "Neutral" : 1
            }
        },
        {
            "ID" : 1,
            "Name" : "test1",
            "Sentiment" : {
                "Value" : 1,
                "Neutral" : 1
            }
        },
        {
            "ID" : 3,
            "Name" : "test1",
            "Sentiment" : {
                "Value" : 2,
                "Neutral" : 1
            }
        }
    ]
}
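I have several of these documents. For example, on 2014-10-20 you might find 5 tweets, each with different sentiment values. What I want to do is group by date, and then get, for each date, the sum of the sentiment values multiplied by the number of documents for that date. For example, suppose that on 2014-10-20 we have 2 documents: one with sentiment values 20, 1 and 2, like the document shown above, and another with a single value of 5. Then the value for 2014-10-20 is (20+1+2+5) * 3 (because this tweet is repeated for 3 entities) * 2 (because we have 2 tweet documents for this date) = 168. If I do not take the frequency into account, my code works fine, as shown below: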
collectionG = db.getCollection("GraphDataCollection");
List<DBObject> stages = new ArrayList<DBObject>();

// "$unwind" converts object with array into many duplicate objects, each with one from array
DBObject unwind = new BasicDBObject("$unwind", "$Entities");
stages.add(unwind);

// Sum the sentiment values of all unwound entities per date
DBObject groupFields = new BasicDBObject("_id", "$Date");
groupFields.put("value", new BasicDBObject("$sum", "$Entities.Sentiment.Value"));
DBObject groupBy = new BasicDBObject("$group", groupFields);
stages.add(groupBy);

// Expose the group key as "Date" and keep the summed value
DBObject project = new BasicDBObject("_id", 0);
project.put("Date", "$_id");
project.put("value", 1);
stages.add(new BasicDBObject("$project", project));

// Sort by date, ascending
DBObject sort = new BasicDBObject("$sort", new BasicDBObject("Date", 1));
stages.add(sort);

AggregationOutput output = collectionG.aggregate(stages);
Currently the result for 2014-10-20, for example, is 28, but I want 168.
Can anyone help?
Update: the latest version of the code I am using is as follows:
DBCollection collectionG;
collectionG = db.getCollection("GraphDataCollection");
List<DBObject> stages = new ArrayList<DBObject>();
ArrayList<DBObject> andArray = null;

// First group: one result per document (_id), with its summed value and its date
DBObject groupFields = new BasicDBObject("_id", "$_id");
groupFields.put("value", new BasicDBObject("$sum", "$Entities.Sentiment.Value"));
groupFields.put("date", new BasicDBObject("$first", "$Date"));
DBObject groupBy = new BasicDBObject("$group", groupFields);
stages.add(groupBy);

// Second group: per date, the sum of the document values and the document count
DBObject groupByDate = new BasicDBObject("_id", "$date");
groupByDate.put("value", new BasicDBObject("$sum", "$value"));
groupByDate.put("count", new BasicDBObject("$sum", 1));
DBObject dtGrp = new BasicDBObject("$group", groupByDate);
stages.add(dtGrp);

// Project: value * count
DBObject project = new BasicDBObject("_id", 1);
project.put("value", new BasicDBObject("$multiply",
        new Object[]{"$value", "$count"}));
stages.add(new BasicDBObject("$project", project));

AggregationOutput output = collectionG.aggregate(stages);
System.out.println(output.results());
Unwind the entities:
DBObject unwind = new BasicDBObject("$unwind", "$Entities");
stages.add(unwind);
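To make the effect of this stage concrete, here is a minimal plain-Java sketch of what `$unwind` does to a document. It runs in memory and does not touch the driver; the `Tweet` and `Row` types and the id `"A"` are hypothetical stand-ins for the documents in the collection.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical in-memory illustration of $unwind; not driver code.
final class UnwindSketch {
    // One tweet document: an id, a date, and the Sentiment.Value of each entity.
    static final class Tweet {
        final String id;
        final String date;
        final List<Integer> entityValues;

        Tweet(String id, String date, Integer... entityValues) {
            this.id = id;
            this.date = date;
            this.entityValues = Arrays.asList(entityValues);
        }
    }

    // One unwound row: the parent's id and date plus a single entity value.
    static final class Row {
        final String id;
        final String date;
        final int value;

        Row(String id, String date, int value) {
            this.id = id;
            this.date = date;
            this.value = value;
        }
    }

    // Emits one row per array element, copying the parent fields into each row.
    static List<Row> unwind(List<Tweet> tweets) {
        List<Row> rows = new ArrayList<>();
        for (Tweet t : tweets) {
            for (int v : t.entityValues) {
                rows.add(new Row(t.id, t.date, v));
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Tweet> tweets = Arrays.asList(new Tweet("A", "2014-10-20", 20, 1, 2));
        for (Row r : unwind(tweets)) {
            System.out.println(r.id + " " + r.date + " " + r.value);
        }
    }
}
```

A document with three entities becomes three rows that all carry the same `_id` and `Date`, which is why the next stage can group on `_id` again.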
Group by _id to find the sum of all entity sentiment values for each document.
DBObject groupFields = new BasicDBObject( "_id", "$_id");
groupFields.put("value", new BasicDBObject( "$sum", "$Entities.Sentiment.Value"));
groupFields.put("date", new BasicDBObject( "$first", "$Date"));
DBObject groupBy = new BasicDBObject("$group", groupFields );
stages.add(groupBy);
Group by date. Now get the sum of the per-document totals, and the count of documents in each group.
DBObject groupByDate = new BasicDBObject( "_id", "$date");
groupByDate.put("value",new BasicDBObject("$sum","$value"));
groupByDate.put("count",new BasicDBObject("$sum",1));
DBObject dtGrp = new BasicDBObject("$group", groupByDate );
stages.add(dtGrp);
Project the value as the product of count and value for each group.
DBObject project = new BasicDBObject("_id",1);
project.put("value",new BasicDBObject("$multiply",
new Object[]{"$value","$count"}));
stages.add(new BasicDBObject("$project",project));
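As a sanity check on the arithmetic, the stages above can be simulated in plain Java on the sample data from the question (one tweet with values 20, 1, 2 and one with the single value 5, both on the same date). This is only an in-memory sketch: the class name, method name and the document ids `"A"` and `"B"` are hypothetical, and no driver is involved. Under these assumptions the value * count product comes out as 28 * 2 = 56; the 168 in the question additionally multiplies by the 3 entities, which the stages shown here do not do.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory walk-through of the stages; not driver code.
final class PipelineSketch {
    // Input: rows of {docId, date, value} as they would look after $unwind.
    // Output: date -> (sum of per-document totals) * (number of documents).
    static Map<String, Long> valueTimesCount(String[][] rows) {
        // "Group by _id": per-document total, remembering the document's date.
        Map<String, Long> totalByDoc = new LinkedHashMap<>();
        Map<String, String> dateByDoc = new LinkedHashMap<>();
        for (String[] r : rows) {
            totalByDoc.merge(r[0], Long.parseLong(r[2]), Long::sum);
            dateByDoc.putIfAbsent(r[0], r[1]); // plays the role of $first
        }

        // "Group by date": sum of the totals and number of documents per date.
        Map<String, Long> valueByDate = new LinkedHashMap<>();
        Map<String, Long> countByDate = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : totalByDoc.entrySet()) {
            String date = dateByDoc.get(e.getKey());
            valueByDate.merge(date, e.getValue(), Long::sum);
            countByDate.merge(date, 1L, Long::sum); // plays the role of {$sum: 1}
        }

        // "Project": value * count for each date.
        Map<String, Long> result = new LinkedHashMap<>();
        for (String date : valueByDate.keySet()) {
            result.put(date, valueByDate.get(date) * countByDate.get(date));
        }
        return result;
    }

    public static void main(String[] args) {
        String[][] rows = {
            {"A", "2014-10-20", "20"},
            {"A", "2014-10-20", "1"},
            {"A", "2014-10-20", "2"},
            {"B", "2014-10-20", "5"},
        };
        System.out.println(valueTimesCount(rows)); // prints {2014-10-20=56}
    }
}
```

Grouping by `_id` first is what makes the count in the second group a count of documents rather than a count of unwound entities.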
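If your dates differ in milliseconds, you need to group by the day, month and year of the date together in the second group stage, and add a sort stage if necessary.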
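A possible shape for that second group stage is sketched below, using the `$year`, `$month` and `$dayOfMonth` date operators as a compound `_id`. It assumes the first group stage exposes the document date as `date`, as in the code above. The sketch builds the stage from plain `java.util` maps so that it runs without the driver; the `doc` helper and the class name are hypothetical, and in the style used above each map would be a `BasicDBObject` instead. Note that these operators evaluate the date in UTC, so a day boundary may not match your local one.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: the second $group stage as plain maps, keyed by calendar day.
final class GroupByDaySketch {
    // Hypothetical helper: a one-entry document such as {"$year": "$date"}.
    static Map<String, Object> doc(String key, Object value) {
        Map<String, Object> m = new LinkedHashMap<>();
        m.put(key, value);
        return m;
    }

    static Map<String, Object> groupByDayStage() {
        // Compound _id: documents with the same year, month and day share a group,
        // regardless of their time-of-day component.
        Map<String, Object> id = new LinkedHashMap<>();
        id.put("year", doc("$year", "$date"));
        id.put("month", doc("$month", "$date"));
        id.put("day", doc("$dayOfMonth", "$date"));

        // Same accumulators as before: summed value and document count.
        Map<String, Object> fields = new LinkedHashMap<>();
        fields.put("_id", id);
        fields.put("value", doc("$sum", "$value"));
        fields.put("count", doc("$sum", 1));
        return doc("$group", fields);
    }

    public static void main(String[] args) {
        System.out.println(groupByDayStage());
    }
}
```

With this key, the `_id` of each result is a sub-document with `year`, `month` and `day` fields, so a later sort stage would sort on `_id.year`, `_id.month` and `_id.day`.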