How to compare documents in the same collection?
I'm new to Mongo; I've only worked with Oracle databases before.
I have a Mongo database that stores Bitbucket data in columns like so:
_id | _class | collectorItemId| firstEverCommit | scmUrl | scmBranch | scmAuthor | scmCommitTimestamp
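To save time, I've left out some of the columns. The scmBranch column is populated with one of two strings: "master" or "develop".
Here's a sample of the data, shown as the document view of one of the rows: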
{
"_id" : ObjectId("5e39d6a0330c130006a042c6"),
"collectorItemId" : ObjectId("5e33a6b9887ef5000620a0c0"),
"firstEverCommit" : false,
"scmUrl" : "sampleRepo1",
"scmBranch" : "master",
"scmRevisionNumber" : "a2ad6842468eb55bffcbe7d700b6addd3eb11629",
"scmAuthor" : "son123",
"scmCommitTimestamp" : NumberLong(1580841662000)
}
I'm now trying to write a Mongo query that gets the following data:
1. For each scmUrl, if max(scmCommitTimestamp) where scmBranch = "develop" > max(scmCommitTimestamp) where scmBranch = "master", then count the number of rows (i.e. commits) where scmBranch = "develop" AND scmCommitTimestamp > max(scmCommitTimestamp) where scmBranch = "master".
2. For the results found in #1, find the oldest commit and the newest commit.
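Before writing the aggregation, it can help to have an in-memory reference to check its output against. The sketch below is plain JavaScript (Node.js), not a MongoDB query. It assumes documents shaped like the sample above, using only `scmUrl`, `scmBranch`, `scmRevisionNumber` and `scmCommitTimestamp` (epoch milliseconds); the `developAheadOfMaster` name and the sample data are hypothetical. The "if" in #1 is implied: a repo has at least one develop commit newer than master's newest commit exactly when its newest develop commit is newer than its newest master commit. Repos with no master commits are skipped, since max over master is undefined for them.

```javascript
// Hypothetical in-memory reference for requirements #1 and #2 (not a MongoDB query).
// Assumes scmCommitTimestamp is epoch milliseconds, as in the sample document.
function developAheadOfMaster(docs) {
  // Newest master commit timestamp per repo.
  const masterMax = new Map();
  for (const d of docs) {
    if (d.scmBranch !== "master") continue;
    const prev = masterMax.get(d.scmUrl);
    if (prev === undefined || d.scmCommitTimestamp > prev) {
      masterMax.set(d.scmUrl, d.scmCommitTimestamp);
    }
  }

  // Develop commits strictly newer than that timestamp, grouped by repo.
  // Repos with no master commits are skipped.
  const ahead = new Map();
  for (const d of docs) {
    if (d.scmBranch !== "develop" || !masterMax.has(d.scmUrl)) continue;
    if (d.scmCommitTimestamp > masterMax.get(d.scmUrl)) {
      if (!ahead.has(d.scmUrl)) ahead.set(d.scmUrl, []);
      ahead.get(d.scmUrl).push(d);
    }
  }

  // One row per repo: count plus oldest and newest qualifying commit.
  const rows = [];
  for (const [scmUrl, commits] of ahead) {
    commits.sort((a, b) => a.scmCommitTimestamp - b.scmCommitTimestamp);
    rows.push({
      scmUrl,
      commits: commits.length,
      oldest: commits[0].scmRevisionNumber,
      newest: commits[commits.length - 1].scmRevisionNumber,
    });
  }
  return rows;
}

// Hypothetical sample data: sampleRepo1's develop has two commits newer than
// master; sampleRepo2's develop is behind master.
const sample = [
  { scmUrl: "sampleRepo1", scmBranch: "master",  scmRevisionNumber: "m1", scmCommitTimestamp: 1000 },
  { scmUrl: "sampleRepo1", scmBranch: "develop", scmRevisionNumber: "d1", scmCommitTimestamp: 900 },
  { scmUrl: "sampleRepo1", scmBranch: "develop", scmRevisionNumber: "d2", scmCommitTimestamp: 1100 },
  { scmUrl: "sampleRepo1", scmBranch: "develop", scmRevisionNumber: "d3", scmCommitTimestamp: 1200 },
  { scmUrl: "sampleRepo2", scmBranch: "master",  scmRevisionNumber: "m2", scmCommitTimestamp: 2000 },
  { scmUrl: "sampleRepo2", scmBranch: "develop", scmRevisionNumber: "d4", scmCommitTimestamp: 1500 },
];

// One row expected: sampleRepo1 with 2 commits, oldest d2, newest d3.
console.log(developAheadOfMaster(sample));
```

Any aggregation you write should return the same repos, counts and oldest/newest commits as this function on the same data.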
The best Mongo query I've been able to come up with so far is:
db.bitbucket.aggregate([{
"$group": {
"_id": {
"scmUrl": "$scmUrl",
"scmBranch": "$scmBranch"
},
"MostRecentCommit": {
"$max": {"$toDate":"$scmCommitTimestamp"}
}
}
},{
"$project": {
"RepoName": {"$substr": ["$_id.scmUrl",39,-1]},
"Branch": "$_id.scmBranch",
"MostRecentCommit": "$MostRecentCommit"
}
},{
"$sort":{
"RepoName":1,
"Branch":1
}
}
])
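Two expressions in this query are worth decoding. `$toDate` turns the epoch-milliseconds `NumberLong` stored in `scmCommitTimestamp` into a date, and `$substr` with a length of `-1` returns everything from index 39 to the end of the string. `$substr` counts bytes, so it matches character slicing only for ASCII URLs. Below is a minimal plain-JavaScript sketch of both conversions, useful for sanity-checking values outside the database. The `toDate`/`repoName` helpers and the URL `https://bitbucket.example.com/scm/proj/sampleRepo1` (whose prefix happens to be 39 characters) are hypothetical.

```javascript
// Plain-JavaScript equivalents of two expressions used in the query above
// (a sketch for checking values outside MongoDB, not MongoDB code).

// {"$toDate": "$scmCommitTimestamp"}: epoch milliseconds -> Date
function toDate(epochMillis) {
  return new Date(epochMillis);
}

// {"$substr": ["$_id.scmUrl", 39, -1]}: a negative length means "to the end",
// so this drops the first 39 characters. MongoDB counts bytes, so the two
// agree only for ASCII strings.
function repoName(scmUrl, start = 39) {
  return scmUrl.slice(start);
}

console.log(toDate(1580841662000).toISOString()); // 2020-02-04T18:41:02.000Z
console.log(repoName("https://bitbucket.example.com/scm/proj/sampleRepo1")); // sampleRepo1
```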
But this only returns the most recent commit on the develop branch and on the master branch for each scmUrl (i.e. repo).
Ideally, I'd like to return a table of results with the following columns:
scmUrl/RepoName | Number of commits on develop branch that are not on master branch | Oldest commit in develop branch that's not in master branch | Newest commit in develop branch that's not in master branch
How can I modify my Mongo query to pull the data I want?
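You can try something like this.
The query below gets the most recent commit date on master for each repo. Once it has that date, it joins back to the same collection with $lookup to pull every commit on the develop branch of that repo that is newer than master's most recent commit.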
db.bitbucket.aggregate([
{"$match":{"scmBranch":"master"}},
{"$group":{"_id":"$scmUrl","recentcommitdate":{"$max":"$scmCommitTimestamp"}}},
{"$lookup":{
"from":"bitbucket",
"let":{"scmUrl":"$_id","recentcommitdate":"$recentcommitdate"},
"pipeline":[
{"$match":{"$expr":
{"$and":[
{"$eq":["$scmBranch","develop"]},
{"$eq":["$scmUrl","$$scmUrl"]},
{"$gt":["$scmCommitTimestamp", "$$recentcommitdate"]}
]}
}},
{"$sort":{"scmCommitTimestamp":-1}}
],
"as":"commits"
}},
{"$match":{"commits.0":{"$exists":true}}},
{"$project":{
"commits":{"$size":"$commits"},
"lastcommit":{"$arrayElemAt":["$commits",0]},
"firstcommit":{"$arrayElemAt":["$commits",-1]}
}}
])
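Each document this pipeline returns has `_id` (the `scmUrl` from the `$group` stage), `commits` (the number of matching develop commits), `lastcommit` (element `0` of the newest-first array, so the newest commit) and `firstcommit` (element `-1`, so the oldest). Below is a minimal plain-JavaScript sketch of turning one such result into the row layout the question asked for. The `toRow` helper, its output field names and the sample result are hypothetical.

```javascript
// Hypothetical helper: reshape one result document from the pipeline above
// into the columns requested in the question.
function toRow(result) {
  return {
    RepoName: result._id,               // _id is the scmUrl from the $group stage
    commitsNotOnMaster: result.commits, // {"$size": "$commits"}
    // firstcommit is {"$arrayElemAt": ["$commits", -1]}: the last element of a
    // newest-first array, i.e. the oldest commit
    oldestCommit: new Date(result.firstcommit.scmCommitTimestamp).toISOString(),
    // lastcommit is {"$arrayElemAt": ["$commits", 0]}: the newest commit
    newestCommit: new Date(result.lastcommit.scmCommitTimestamp).toISOString(),
  };
}

// Hypothetical result shaped like the pipeline's output.
const result = {
  _id: "sampleRepo1",
  commits: 2,
  lastcommit: { scmRevisionNumber: "d3", scmCommitTimestamp: 1580841662000 },
  firstcommit: { scmRevisionNumber: "d2", scmCommitTimestamp: 1580774400000 },
};

console.log(toRow(result));
```

The same reshaping could also be done inside the pipeline with a final `$project` stage that uses `$toDate`, which avoids post-processing on the client.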
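Example: https://mongoplayground.net/p/wLnFY0H_nJz
Update: comparing revision IDs. Instead of comparing timestamps, this version collects the scmRevisionNumber values on master for each repo and keeps the develop commits whose revision number is not among them: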
db.bitbucket.aggregate([
{"$match":{"scmBranch":"master"}},
{"$group":{"_id":"$scmUrl","revisionIds":{"$push":"$scmRevisionNumber"}}},
{"$lookup":{
"from":"bitbucket",
"let":{"scmUrl":"$_id","revisionIds":"$revisionIds"},
"pipeline":[
{"$match":{"$expr":
{"$and":[
{"$eq":["$scmBranch","develop"]},
{"$eq":["$scmUrl","$$scmUrl"]},
{"$not":[{"$in":["$scmRevisionNumber","$$revisionIds"]}]}
]}
}},
{"$sort":{"scmCommitTimestamp":-1}}
],
"as":"commits"
}},
{"$match":{"commits.0":{"$exists":true}}},
{"$project":{
"commits":{"$size":"$commits"},
"lastcommit":{"$arrayElemAt":["$commits",0]},
"firstcommit":{"$arrayElemAt":["$commits",-1]}
}}
])
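The two versions can disagree. A develop commit that is older than master's newest commit but was never merged into master is skipped by the timestamp version. The revision version still counts it, because it only asks whether the commit's `scmRevisionNumber` appears on master. Below is a plain-JavaScript sketch of the revision-based logic on hypothetical data. The `developNotOnMaster` name and the sample documents are illustrative only, and this is not MongoDB code.

```javascript
// Hypothetical in-memory version of the revision-based pipeline above
// (plain JavaScript, not MongoDB code).
function developNotOnMaster(docs) {
  // Like $match + $group/$push: master revision numbers per repo.
  const masterRevs = new Map();
  for (const d of docs) {
    if (d.scmBranch !== "master") continue;
    if (!masterRevs.has(d.scmUrl)) masterRevs.set(d.scmUrl, new Set());
    masterRevs.get(d.scmUrl).add(d.scmRevisionNumber);
  }

  const results = [];
  for (const [scmUrl, revs] of masterRevs) {
    // Like the $lookup sub-pipeline: develop commits whose revision is not on
    // master, sorted newest first.
    const commits = docs
      .filter((d) => d.scmUrl === scmUrl && d.scmBranch === "develop" && !revs.has(d.scmRevisionNumber))
      .sort((a, b) => b.scmCommitTimestamp - a.scmCommitTimestamp);
    if (commits.length === 0) continue; // {"commits.0": {"$exists": true}}
    results.push({
      _id: scmUrl,
      commits: commits.length,
      lastcommit: commits[0],                   // newest
      firstcommit: commits[commits.length - 1], // oldest
    });
  }
  return results;
}

// Hypothetical data: m1 was merged (it appears on both branches), d1 is an old
// develop commit that never reached master, d2 is newer than master.
const docs = [
  { scmUrl: "sampleRepo1", scmBranch: "master",  scmRevisionNumber: "m1", scmCommitTimestamp: 1000 },
  { scmUrl: "sampleRepo1", scmBranch: "develop", scmRevisionNumber: "m1", scmCommitTimestamp: 1000 },
  { scmUrl: "sampleRepo1", scmBranch: "develop", scmRevisionNumber: "d1", scmCommitTimestamp: 900 },
  { scmUrl: "sampleRepo1", scmBranch: "develop", scmRevisionNumber: "d2", scmCommitTimestamp: 1100 },
];

const [row] = developNotOnMaster(docs);
// Prints: 2 d1 d2 (a strictly-newer-than-master timestamp check would report only d2)
console.log(row.commits, row.firstcommit.scmRevisionNumber, row.lastcommit.scmRevisionNumber);
```

Which version is right depends on what "not on master" should mean for your data: the revision version follows actual commit identity, while the timestamp version assumes that everything on develop older than master's newest commit has already been merged.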