Aggregation in mongodb for nested documents
I have a document in the following format:
"summary":{
    "HUL":{
        "hr_0":{
            "ts":None,
            "Insights":{
                "sentiments":{
                    "pos":37,
                    "neg":3,
                    "neu":27
                },
                "topics":[
                    "Basketball",
                    "Football"
                ],
                "geo":{
                    "locations":{
                        "Delhi":34,
                        "Kolkata":56,
                        "Pune":79,
                        "Bangalore":92,
                        "Mumbai":54
                    },
                    "mst_act":{
                        "loc":"Bangalore",
                        "lat_long":None
                    }
                }
            }
        },
        "hr_1":{....},
        "hr_2":{....},
        .
        .
        "hr_23":{....}
    }
}
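I want to run an aggregation in pymongo that sums up the positive, negative and neutral sentiments over all hours of the day, "hr_0" to "hr_23".
I'm having trouble building the pipeline command because the fields I'm interested in are inside nested dictionaries. Any suggestions would be much appreciated.
Thanks!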
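It's hard to come up with an aggregation pipeline that gives you the aggregates you want, because your document schema has dynamic keys (hr_0 to hr_23) that you cannot use as an identifier expression in a $group stage.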
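That said, on MongoDB 3.4.4 or later the dynamic keys can be turned into data with the $objectToArray operator, which converts summary.HUL into an array of {k, v} pairs that $unwind and $group can then handle. The following is a minimal sketch under that version assumption; the helper name sentiment_pipeline is hypothetical, and db.collection is assumed to be reachable as in the rest of this answer.

```python
def sentiment_pipeline(hours_field="summary.HUL"):
    """Build a pipeline that sums pos/neg/neu over every dynamic hr_* key.

    Assumes MongoDB 3.4.4+, where $objectToArray is available.
    """
    return [
        # {"hr_0": {...}, "hr_1": {...}} -> [{"k": "hr_0", "v": {...}}, ...]
        {"$project": {"hours": {"$objectToArray": "$" + hours_field}}},
        # One document per hour entry
        {"$unwind": "$hours"},
        # _id None folds every hour of every document into a single total
        {
            "$group": {
                "_id": None,
                "pos_total": {"$sum": "$hours.v.Insights.sentiments.pos"},
                "neg_total": {"$sum": "$hours.v.Insights.sentiments.neg"},
                "neu_total": {"$sum": "$hours.v.Insights.sentiments.neu"},
            }
        },
    ]


pipeline = sentiment_pipeline()
# With a live pymongo handle (not executed here):
# totals = next(db.collection.aggregate(pipeline), None)
```

The values live under "v" after $objectToArray, which is why the $sum paths start with $hours.v rather than $hours.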
With the current schema, one workaround is to iterate over the find() cursor and add up the values in a loop, something like this:
# db is assumed to be an existing pymongo Database handle
pos_total = 0
neg_total = 0
neu_total = 0
cursor = db.collection.find()
for doc in cursor:
    for i in range(24):
        # Each hour is stored under a dynamic key: hr_0 through hr_23
        sentiments = doc["summary"]["HUL"]["hr_" + str(i)]["Insights"]["sentiments"]
        pos_total += sentiments["pos"]
        neg_total += sentiments["neg"]
        neu_total += sentiments["neu"]
print(pos_total)
print(neg_total)
print(neu_total)
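The loop above assumes every document has all 24 hour keys and raises KeyError otherwise. A slightly more defensive sketch iterates over whichever hour entries actually exist under summary.HUL and counts missing values as 0. The function name sum_sentiments is hypothetical; it accepts any iterable of documents, so db.collection.find() can be passed in directly.

```python
def sum_sentiments(docs):
    """Total pos/neg/neu over every hour entry of every document."""
    totals = {"pos": 0, "neg": 0, "neu": 0}
    for doc in docs:
        hours = doc.get("summary", {}).get("HUL", {})
        # Iterate over whichever hr_* keys are present instead of range(24)
        for hour in hours.values():
            sentiments = hour.get("Insights", {}).get("sentiments", {})
            for key in totals:
                totals[key] += sentiments.get(key, 0)
    return totals


sample = [
    {"summary": {"HUL": {
        "hr_0": {"Insights": {"sentiments": {"pos": 37, "neg": 3, "neu": 27}}},
        "hr_1": {"Insights": {"sentiments": {"pos": 5, "neg": 1, "neu": 2}}},
    }}},
]
print(sum_sentiments(sample))  # {'pos': 42, 'neg': 4, 'neu': 29}
```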
If you can change the schema, the following one would be ideal for use with the aggregation framework:
{
    "summary": {
        "HUL": [
            {
                "_id": "hr_0",
                "ts": None,
                "Insights":{
                    "sentiments":{
                        "pos":37,
                        "neg":3,
                        "neu":27
                    },
                    "topics":[
                        "Basketball",
                        "Football"
                    ],
                    "geo":{
                        "locations":{
                            "Delhi":34,
                            "Kolkata":56,
                            "Pune":79,
                            "Bangalore":92,
                            "Mumbai":54
                        },
                        "mst_act":{
                            "loc":"Bangalore",
                            "lat_long":None
                        }
                    }
                }
            },
            {
                "_id": "hr_1",
                "ts": None,
                "Insights":{
                    "sentiments":{
                        "pos":37,
                        "neg":3,
                        "neu":27
                    },
                    ...
                }
            },
            ...
            {
                "_id": "hr_23",
                "ts": None,
                "Insights":{
                    "sentiments":{
                        "pos":37,
                        "neg":3,
                        "neu":27
                    },
                    ...
                }
            }
        ]
    }
}
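If you do change the schema, existing documents have to be converted. A minimal sketch of that conversion for a single document follows, assuming every hour key follows the hr_<n> pattern. The function name to_array_schema is hypothetical, and writing the converted document back (for example with pymongo's replace_one) is left out.

```python
import copy


def to_array_schema(doc):
    """Return a copy of doc with summary.HUL turned from a dict of hr_* keys
    into a list of hour objects, each carrying its old key as "_id"."""
    new_doc = copy.deepcopy(doc)
    hours = new_doc["summary"]["HUL"]
    # Sort numerically so hr_10 comes after hr_9, not right after hr_1
    ordered = sorted(hours.items(), key=lambda kv: int(kv[0].split("_")[1]))
    new_doc["summary"]["HUL"] = [{"_id": key, **value} for key, value in ordered]
    return new_doc


old = {"summary": {"HUL": {
    "hr_10": {"ts": None, "Insights": {"sentiments": {"pos": 1, "neg": 0, "neu": 0}}},
    "hr_2": {"ts": None, "Insights": {"sentiments": {"pos": 37, "neg": 3, "neu": 27}}},
}}}
new = to_array_schema(old)
print([hour["_id"] for hour in new["summary"]["HUL"]])  # ['hr_2', 'hr_10']
```

The deep copy keeps the original document untouched, which makes it easy to compare before and after while testing a migration.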
The aggregation pipeline that gives you the required totals is:
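# db is assumed to be an existing pymongo Database handle
pipeline = [
    # One document per hourly entry in the summary.HUL array
    {"$unwind": "$summary.HUL"},
    {
        "$group": {
            # None (null) folds all hours into one total;
            # use "$summary.HUL._id" instead to get per-hour totals
            "_id": None,
            "pos_total": {"$sum": "$summary.HUL.Insights.sentiments.pos"},
            "neg_total": {"$sum": "$summary.HUL.Insights.sentiments.neg"},
            "neu_total": {"$sum": "$summary.HUL.Insights.sentiments.neu"},
        }
    },
]
result = list(db.collection.aggregate(pipeline))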
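With the array schema, the $unwind stage can also be skipped, assuming MongoDB 3.2 or later: inside $project, a field path through the HUL array resolves to an array of the hourly numbers, and $sum with a single array argument adds up its numeric elements. A sketch of that variant; the variable name no_unwind_pipeline is hypothetical.

```python
no_unwind_pipeline = [
    # Per document: "$summary.HUL.Insights.sentiments.pos" resolves to the
    # array of all hourly pos values, and $sum adds them up
    {
        "$project": {
            "pos": {"$sum": "$summary.HUL.Insights.sentiments.pos"},
            "neg": {"$sum": "$summary.HUL.Insights.sentiments.neg"},
            "neu": {"$sum": "$summary.HUL.Insights.sentiments.neu"},
        }
    },
    # Across documents: fold the per-document sums into one grand total
    {
        "$group": {
            "_id": None,
            "pos_total": {"$sum": "$pos"},
            "neg_total": {"$sum": "$neg"},
            "neu_total": {"$sum": "$neu"},
        }
    },
]
# With a live pymongo handle (not executed here):
# totals = next(db.collection.aggregate(no_unwind_pipeline), None)
```

Avoiding $unwind means no intermediate document is created per hour, which may help on large collections.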