MongoDB aggregation $graphLookup - find common "connections" in a collection of follow relationships
I have a collection of "who is following whom" (like Instagram):
db.users.insertMany([
{ _id: 1, name: "Arnold Schwarzenegger" },
{ _id: 2, name: "James Earl Jones" },
{ _id: 3, name: "Harrison Ford" },
{ _id: 4, name: "Jennifer Lawrence" }
]);
db.follows.insertMany([
{ _id: 12, follower: 1, following: 2 },
{ _id: 13, follower: 1, following: 3 },
{ _id: 24, follower: 2, following: 4 },
{ _id: 23, follower: 2, following: 3 }
]);
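I'm trying to suggest other users that a given user could follow, i.e. which other people they could be following: suggested follows, ordered by the number of existing common connections.
In this example: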
+--------+--------------+----------+
| A | is following | B |
+--------+--------------+----------+
| Arnold | -> | James |
| Arnold | -> | Harrison |
| James | -> | Jennifer |
| James | -> | Harrison |
+--------+--------------+----------+
Between Arnold and James, who can Arnold follow? (excluding existing connections)
The answer should be: Jennifer
Here is a poor attempt:
db.users.aggregate([
{
$match: { _id: 1 } // Arnold
},
{
$graphLookup: {
from: "follows",
startWith: "$_id",
connectFromField: "following",
connectToField: "follower",
maxDepth: 1,
as: "connections",
}
}
]);
This results in:
{
"_id": 1,
"name": "Arnold Schwarzenegger",
"connections": [
{
"_id": 24,
"follower": 2,
"following": 4
},
{
"_id": 13,
"follower": 1,
"following": 3
},
{
"_id": 23,
"follower": 2,
"following": 3
},
{
"_id": 12,
"follower": 1,
"following": 2
}
]
}
I think I need to do some $unwind'ing, but I'm kind of stuck now.
Here are two possible approaches. (I haven't tested either with a larger data set, so your mileage may vary!)
The first builds on your $graphLookup stage:
db.users.aggregate([
{ $match: { _id: 1 }},
{ $graphLookup: {
from: 'follows',
startWith: '$_id',
connectFromField: 'following',
connectToField: 'follower',
maxDepth: 1,
as: 'connections'
}},
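// One document per edge found by $graphLookup
{ $unwind: { path: '$connections' }},
// For each follower, collect the set of users they follow
{ $group: {
_id: '$connections.follower',
follows: {
$addToSet: '$connections.following'
}
}},
{ $unwind: { path: '$follows' }},
// Invert the relation: for each followed user, collect who follows them
{ $group: {
_id: '$follows',
isFollowedBy: {
$addToSet: '$_id'
}
}},
// Drop anyone user 1 (Arnold) already follows
{ $match: { isFollowedBy: { $not: { $in: [1] }} }},
// Merge the remaining user ids into a single array
{ $group: {
_id: null,
newConnections: {
$addToSet: '$_id'
}
}},
{ $project: { _id: 0 }}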
])
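To make the stages above easier to follow, here is a minimal plain JavaScript sketch that traces the same steps on in-memory arrays. It is only a model of the pipeline's logic, not how MongoDB runs it; `reachableEdges` and `newConnections` are hypothetical helper names introduced for illustration, and the data is the sample from the question.

```javascript
// Sample edges copied from the question's follows collection.
const follows = [
  { _id: 12, follower: 1, following: 2 },
  { _id: 13, follower: 1, following: 3 },
  { _id: 24, follower: 2, following: 4 },
  { _id: 23, follower: 2, following: 3 },
];

// Rough stand-in for the $graphLookup stage (hypothetical helper, not a
// MongoDB API): collect every edge whose follower is startId (depth 0),
// then repeat from the users those edges point at, up to maxDepth.
function reachableEdges(edges, startId, maxDepth) {
  const found = new Map(); // edge _id -> edge, so each edge is kept once
  let frontier = [startId];
  for (let depth = 0; depth <= maxDepth; depth++) {
    const next = [];
    for (const edge of edges) {
      if (frontier.includes(edge.follower) && !found.has(edge._id)) {
        found.set(edge._id, edge);
        next.push(edge.following);
      }
    }
    frontier = next;
  }
  return [...found.values()];
}

// Mirrors the $unwind / $group / $match stages: invert the edges into
// "followed user -> set of followers", then drop users startId follows.
function newConnections(edges, startId) {
  const isFollowedBy = new Map();
  for (const { follower, following } of reachableEdges(edges, startId, 1)) {
    if (!isFollowedBy.has(following)) isFollowedBy.set(following, new Set());
    isFollowedBy.get(following).add(follower);
  }
  return [...isFollowedBy]
    .filter(([, followers]) => !followers.has(startId))
    .map(([userId]) => userId);
}

console.log(newConnections(follows, 1)); // [ 4 ]
```

One thing the trace makes visible: if someone Arnold follows also followed Arnold back, Arnold's own id would pass the final filter, so a real pipeline would probably want to exclude the starting user as well.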
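Note that this pipeline ends up rebuilding the relationships in the follows collection partway through, so an alternative approach is to start from that collection directly: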
db.follows.aggregate([
{ $lookup: {
from: 'follows',
localField: 'following',
foreignField: 'follower',
as: 'potentialSet'
}},
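// One document per (edge, second-hop edge) pair; edges with no second hop are kept
{ $unwind: {
path: "$potentialSet",
preserveNullAndEmptyArrays: true
}},
// Per follower: who they follow now, and who those users follow in turn
{ $group: {
_id: "$follower",
"alreadyFollowing": {
$addToSet: "$following"
},
"potentialConnections": {
"$addToSet": "$potentialSet.following"
}
}},
// Candidates are the second-hop users who are not already followed
{ $project: {
newConnections: { $setDifference: [ "$potentialConnections", "$alreadyFollowing" ] }
}},
// Keep only user 1 (Arnold); on a large collection, filtering on follower
// before the $lookup would likely reduce the work
{ $match: { _id: 1 }},
{ $project: { _id: 0 }}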
])
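Neither pipeline orders the suggestions by the number of common connections, which the question also asked for. As a rough sketch of that ranking logic, here it is in plain JavaScript on in-memory arrays. This is an illustration rather than an aggregation pipeline; `suggestFollows` and the `mutualCount` field are hypothetical names, and it assumes numeric ids and at most one edge per (follower, following) pair.

```javascript
// Sample edges copied from the question's follows collection.
const follows = [
  { _id: 12, follower: 1, following: 2 },
  { _id: 13, follower: 1, following: 3 },
  { _id: 24, follower: 2, following: 4 },
  { _id: 23, follower: 2, following: 3 },
];

// Hypothetical helper: rank users that userId does not follow yet by how
// many of userId's existing follows already follow them.
// Assumes each (follower, following) pair appears at most once.
function suggestFollows(edges, userId) {
  const alreadyFollowing = new Set(
    edges.filter((e) => e.follower === userId).map((e) => e.following)
  );
  const mutualCount = new Map(); // candidate id -> number of shared connections
  for (const { follower, following } of edges) {
    const viaExistingFollow = alreadyFollowing.has(follower);
    const isNew = following !== userId && !alreadyFollowing.has(following);
    if (viaExistingFollow && isNew) {
      mutualCount.set(following, (mutualCount.get(following) || 0) + 1);
    }
  }
  // Highest count first; ties broken by id (assumes numeric ids).
  return [...mutualCount]
    .map(([candidate, count]) => ({ _id: candidate, mutualCount: count }))
    .sort((a, b) => b.mutualCount - a.mutualCount || a._id - b._id);
}

console.log(suggestFollows(follows, 1)); // [ { _id: 4, mutualCount: 1 } ]
```

With the sample data only Jennifer (4) qualifies, through James. Unlike the two pipelines, this sketch also skips the starting user, so a follow-back edge cannot make Arnold a suggestion for himself.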
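If it helps, I used MongoDB Compass Community Edition to build these pipelines. It's pretty cool because it lets you iterate quickly and see the output of each stage, which is very useful when you're trying to debug a pipeline.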