MongoDB: querying for aggregation with counts of multiple values
I am using Mongoid to talk to MongoDB in one of my Rails applications:
class Tracking
  include Mongoid::Document
  include Mongoid::Timestamps

  field :article_id,   type: String
  field :action,       type: String # like | comment
  field :actor_gender, type: String # male | female | unknown
  field :city,         type: String
  field :state,        type: String
  field :country,      type: String
end
I want to fetch records in this tabular format:
article_id | state | male_like_count | female_like_count | unknown_gender_like_count | date
juhkwu2367 | California | 21 | 7 | 1 | 11-20-2015
juhkwu2367 | New York | 62 | 23 | 3 | 11-20-2015
juhkwu2367 | Vermont | 48 | 27 | 3 | 11-20-2015
juhkwu2367 | California | 21 | 7 | 1 | 11-21-2015
juhkwu2367 | New York | 62 | 23 | 3 | 11-21-2015
juhkwu2367 | Vermont | 48 | 27 | 3 | 11-21-2015
The inputs to the query are:
article_id
country
date range (from and to)
action (is `like` in this scenario)
sort_by [ date | state | male_like_count | female_like_count ]
This is what I am trying, based on the example at https://docs.mongodb.org/v3.0/reference/operator/aggregation/group/ :
db.trackings.aggregate(
    [
        {
            $group : {
                _id : { month: { $month: "$created_at" }, day: { $dayOfMonth: "$created_at" }, year: { $year: "$created_at" }, article_id: "$article_id", state: "$state", country: "$country"},
                article_id: "$article_id",
                country: ??,
                state: "$state",
                male_like_count: { $sum: ?? },
                female_like_count: { $sum: ?? },
                unknown_gender_like_count: { $sum: ?? },
                date: ??
            }
        }
    ]
)
So what should I put in place of the ?? to get the per-gender counts, and how do I add a clause for the sorting option?
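You are mainly looking for the `$cond` operator, which evaluates a condition and returns whether a particular counter should be incremented or not. You are also missing some other aggregation concepts here: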
db.trackings.aggregate([
    // Filter first: date range, country, and only "like" actions
    { "$match": {
        "created_at": { "$gte": startDate, "$lt": endDate },
        "country": "US",
        "action": "like"
    }},
    // Group per day, article and state; count each gender conditionally
    { "$group": {
        "_id": {
            "date": {
                "month": { "$month": "$created_at" },
                "day": { "$dayOfMonth": "$created_at" },
                "year": { "$year": "$created_at" }
            },
            "article_id": "$article_id",
            "state": "$state"
        },
        // The model stores gender in the actor_gender field
        "male_like_count": {
            "$sum": {
                "$cond": [
                    { "$eq": [ "$actor_gender", "male" ] },
                    1,
                    0
                ]
            }
        },
        "female_like_count": {
            "$sum": {
                "$cond": [
                    { "$eq": [ "$actor_gender", "female" ] },
                    1,
                    0
                ]
            }
        },
        "unknown_like_count": {
            "$sum": {
                "$cond": [
                    { "$eq": [ "$actor_gender", "unknown" ] },
                    1,
                    0
                ]
            }
        }
    }},
    // Sort as a pipeline stage; keep only the keys you actually need
    { "$sort": {
        "_id.date.year": 1,
        "_id.date.month": 1,
        "_id.date.day": 1,
        "_id.article_id": 1,
        "_id.state": 1,
        "male_like_count": 1,
        "female_like_count": 1
    }}
])
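Firstly, you basically want `$match`, which is how you supply "query" conditions to an aggregation pipeline. It can appear at basically any pipeline stage, but when used first it filters the input considered by the operations that follow. In this case that means the required date range and country, plus removing anything that is not a "like", since you are not interested in those counts.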
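As a rough illustration of how the query inputs could feed that first stage, here is a minimal Ruby sketch. The helper name `build_match_stage` is hypothetical, and the `article_id` condition is an assumption taken from the question's list of inputs; the pipeline above does not filter on it.

```ruby
# Hypothetical helper: builds the $match stage from the query inputs.
# Filtering on article_id is an assumption based on the question's input list.
def build_match_stage(article_id:, country:, from:, to:, action: "like")
  {
    "$match" => {
      "article_id" => article_id,
      "country"    => country,
      "action"     => action,
      # Half-open range: includes `from`, excludes `to`.
      "created_at" => { "$gte" => from, "$lt" => to }
    }
  }
end

stage = build_match_stage(
  article_id: "juhkwu2367",
  country:    "US",
  from:       Time.utc(2015, 11, 20),
  to:         Time.utc(2015, 11, 22)
)
```

The resulting hash can be placed as the first element of the array passed to `aggregate`.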
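Then all items are grouped by the respective "key" in `_id`. This can be, and here is, a compound field: all of these field values together make up the grouping key, and nesting them also keeps the output a little better organized.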
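To make the grouping and the conditional sums concrete, the following plain-Ruby sketch emulates what the `$group` stage does with a handful of made-up documents. It does not talk to MongoDB; the sample data and the `cond_increment` helper are purely illustrative.

```ruby
# Plain-Ruby emulation of the $group stage, for illustration only.
# Each hash stands in for one already-matched tracking document (made-up data).
docs = [
  { state: "California", actor_gender: "male",    day: "2015-11-20" },
  { state: "California", actor_gender: "female",  day: "2015-11-20" },
  { state: "California", actor_gender: "male",    day: "2015-11-20" },
  { state: "Vermont",    actor_gender: "unknown", day: "2015-11-20" }
]

# Mirrors { "$cond" => [ { "$eq" => [field, value] }, 1, 0 ] }.
def cond_increment(actual, expected)
  actual == expected ? 1 : 0
end

# A new group starts with all counters at zero, like $sum over an empty set.
counts = Hash.new do |hash, key|
  hash[key] = { male_like_count: 0, female_like_count: 0, unknown_like_count: 0 }
end

docs.each do |doc|
  key = [doc[:day], doc[:state]] # compound grouping key, like _id
  row = counts[key]
  row[:male_like_count]    += cond_increment(doc[:actor_gender], "male")
  row[:female_like_count]  += cond_increment(doc[:actor_gender], "female")
  row[:unknown_like_count] += cond_increment(doc[:actor_gender], "unknown")
end
```

Every document contributes 1 to exactly one counter and 0 to the others, which is why a single pass produces all three counts per group.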
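You also seem to be asking for "distinct fields" in the output, outside of `_id` itself. Don't do that. The data is already there, so there is no point in copying it. You can produce the same values outside of `_id` with `$first` as an aggregation operator, or you could even add a `$project` stage at the end of the pipeline to rename the fields. But it is really best to drop the habit of thinking you need this, since it only costs time and/or space in getting a response.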
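If a flat row shape really is required, one possible sketch looks like this. The `$project` stage shown is one way to lift the key fields out of `_id` on the server; `flatten_row` is a hypothetical client-side alternative that avoids the extra stage. The field names follow the listings in this answer.

```ruby
# Server-side option: a trailing $project stage that renames the key fields.
project_stage = {
  "$project" => {
    "_id"                => 0,
    "date"               => "$_id.date",
    "article_id"         => "$_id.article_id",
    "state"              => "$_id.state",
    "male_like_count"    => 1,
    "female_like_count"  => 1,
    "unknown_like_count" => 1
  }
}

# Client-side option (hypothetical helper): merge the _id sub-document
# into the top level of each result row.
def flatten_row(doc)
  doc["_id"].merge(doc.reject { |key, _| key == "_id" })
end

row = flatten_row(
  "_id" => { "article_id" => "juhkwu2367", "state" => "Vermont" },
  "male_like_count" => 48
)
```

Flattening on the client keeps the pipeline shorter, which is in line with the advice above not to copy data that is already present.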
If anything, what you seem to want most is a "pretty date". I personally prefer "date math" for most manipulation, so an altered listing suited to Mongoid would be:
Tracking.collection.aggregate([
  { "$match" => {
    "created_at" => { "$gte" => startDate, "$lt" => endDate },
    "country" => "US",
    "action" => "like"
  }},
  { "$group" => {
    "_id" => {
      # Round created_at down to the start of its day using epoch math
      "date" => {
        "$add" => [
          { "$subtract" => [
            { "$subtract" => [ "$created_at", Time.at(0).utc.to_datetime ] },
            { "$mod" => [
              { "$subtract" => [ "$created_at", Time.at(0).utc.to_datetime ] },
              1000 * 60 * 60 * 24
            ]}
          ]},
          Time.at(0).utc.to_datetime
        ]
      },
      "article_id" => "$article_id",
      "state" => "$state"
    },
    # The model stores gender in the actor_gender field
    "male_like_count" => {
      "$sum" => {
        "$cond" => [
          { "$eq" => [ "$actor_gender", "male" ] },
          1,
          0
        ]
      }
    },
    "female_like_count" => {
      "$sum" => {
        "$cond" => [
          { "$eq" => [ "$actor_gender", "female" ] },
          1,
          0
        ]
      }
    },
    "unknown_like_count" => {
      "$sum" => {
        "$cond" => [
          { "$eq" => [ "$actor_gender", "unknown" ] },
          1,
          0
        ]
      }
    }
  }},
  { "$sort" => {
    "_id.date" => 1,
    "_id.article_id" => 1,
    "_id.state" => 1,
    "male_like_count" => 1,
    "female_like_count" => 1
  }}
])
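It really just comes down to getting a DateTime object, suitable as a driver argument, that corresponds to the epoch date, and then applying the various operations. Using `$subtract` with one BSON Date and another produces a numeric value (the difference in milliseconds), which can then be rounded down to the start of the current day using the applied math. Then of course when using `$add` with a numeric timestamp value and a BSON Date (again representing epoch), the result is again a BSON Date object, now with the adjusted and rounded value.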
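The arithmetic can be checked outside the database. This plain-Ruby sketch mirrors the `$subtract` / `$mod` / `$add` chain step by step; the method name `truncate_to_utc_day` and the sample timestamp are illustrative only.

```ruby
MS_PER_DAY = 1000 * 60 * 60 * 24

# Emulates the pipeline's date math in plain Ruby.
def truncate_to_utc_day(time)
  epoch = Time.at(0).utc
  # $subtract: date - date => milliseconds since epoch
  ms_since_epoch = ((time - epoch) * 1000).round
  # $subtract of $mod: drop the milliseconds elapsed within the day
  day_start_ms = ms_since_epoch - (ms_since_epoch % MS_PER_DAY)
  # $add: epoch date + milliseconds => date again
  epoch + day_start_ms / 1000
end

created_at = Time.utc(2015, 11, 20, 17, 45, 12)
day = truncate_to_utc_day(created_at)
```

Note that this rounds to day boundaries in UTC. If days need to follow a local time zone, an offset would have to be applied before the modulo, which this sketch does not do.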
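Then it is all just a matter of applying `$sort` as an aggregation pipeline stage, as opposed to an external modifier. Much like the `$match` principle, an aggregation pipeline can sort anywhere, but a `$sort` placed at the end always deals with the final result.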
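For the `sort_by` input from the question, one option is to translate the requested option into a single-key `$sort` stage. This is only a sketch: `SORT_FIELDS` and `build_sort_stage` are hypothetical names, and the `"_id.date"` path assumes the date-math grouping key from the Mongoid listing.

```ruby
# Hypothetical mapping from the question's sort_by values to pipeline fields.
SORT_FIELDS = {
  "date"              => "_id.date",
  "state"             => "_id.state",
  "male_like_count"   => "male_like_count",
  "female_like_count" => "female_like_count"
}.freeze

# direction: 1 for ascending, -1 for descending (MongoDB's convention).
def build_sort_stage(sort_by, direction = 1)
  field = SORT_FIELDS.fetch(sort_by) do
    raise ArgumentError, "unsupported sort_by: #{sort_by.inspect}"
  end
  { "$sort" => { field => direction } }
end

sort_stage = build_sort_stage("male_like_count", -1)
```

Restricting the input to a fixed set of keys also prevents arbitrary field names from a request ending up in the pipeline.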