How to use the GitHub V3 API to get the commit count for a repo?
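I am trying to use the API to count commits for many large GitHub repos, so I would like to avoid fetching the entire list of commits (this way, as an example: api.github.com/repos/jasonrudolph/keyboard/commits) and counting them.
If I had the hash of the first (initial) commit, I could use this technique to compare the first commit to the latest, and it happily reports the total_commits in between (so I'd need to add one). Unfortunately, I cannot see how to elegantly get the first commit using the API.
The base repo URL does give me created_at (this URL is an example: api.github.com/repos/jasonrudolph/keyboard), so I could get a reduced commit set by limiting the commits to those up to the creation date (this URL is an example: api.github.com/repos/jasonrudolph/keyboard/commits?until=2013-03-30T16:01:43Z) and using the earliest one (always listed last?), or maybe the one with an empty parent (I'm not sure whether forked projects have initial parent commits).
Is there a better way to get the first commit hash for a repo?
Better yet, this whole thing seems convoluted for a simple statistic, and I wonder if I'm missing something. Any better ideas for using the API to get the repo commit count?
Edit: This somewhat similar question is trying to filter by certain files ("and within them to specific files."), so it has a different answer.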
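If you're looking for the total number of commits in the default branch, you might consider a different approach.
Use the Repo Contributors API to fetch a list of all contributors:
https://developer.github.com/v3/repos/#list-contributors
Each item in the list contains a contributions field, which tells you how many commits the user authored in the default branch. Sum those fields across all contributors and you should get the total number of commits in the default branch.
The list of contributors is often much shorter than the list of commits, so it should take fewer requests to compute the total number of commits in the default branch.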
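A minimal Python sketch of the contributors approach, using only the standard library. It assumes the endpoint `GET /repos/{owner}/{repo}/contributors` and follows the `rel="next"` entry of the `Link` header for pagination. The helper names `sum_contributions` and `fetch_contributor_pages` are made up for illustration, and the sum may differ from the true commit count if some contributors are not listed.

```python
import json
import re
import urllib.request


def sum_contributions(pages):
    """Add up the `contributions` field over an iterable of contributor pages."""
    return sum(item["contributions"] for page in pages for item in page)


def fetch_contributor_pages(owner, repo, token=None):
    """Yield each page of the contributors list (hypothetical helper)."""
    url = f"https://api.github.com/repos/{owner}/{repo}/contributors?per_page=100"
    while url:
        req = urllib.request.Request(url)
        if token:
            req.add_header("Authorization", f"token {token}")
        with urllib.request.urlopen(req) as resp:
            page = json.load(resp)
            link = resp.headers.get("Link", "")
        yield page
        # Follow the rel="next" link until there is none left.
        match = re.search(r'<([^>]+)>;\s*rel="next"', link)
        url = match.group(1) if match else None
```

Calling `sum_contributions(fetch_contributor_pages("jasonrudolph", "keyboard"))` would then make one request per 100 contributors. Rate limiting and error handling are left out of this sketch.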
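I just made a little script to do this.
It may not work with large repositories, because it does not handle GitHub's rate limits. It also requires the Python requests package.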
#!/usr/bin/env python3
import requests

GITHUB_API_BRANCHES = 'https://api.github.com/repos/%(namespace)s/%(repository)s/branches'
GITHUB_API_COMMITS = 'https://api.github.com/repos/%(namespace)s/%(repository)s/commits'


def github_commit_counter(namespace, repository, access_token=''):
    # send the token as a header only when one is given
    headers = {'Authorization': 'token ' + access_token} if access_token else {}
    repo = {'namespace': namespace, 'repository': repository}
    commit_store = set()  # unique commit hashes across all branches

    resp = requests.get(GITHUB_API_BRANCHES % repo, headers=headers)
    resp.raise_for_status()
    branches = resp.json()

    print('Branch'.ljust(47), 'Commits')
    print('-' * 55)

    for branch in branches:
        page = 1
        branch_commits = 0
        while True:
            resp = requests.get(GITHUB_API_COMMITS % repo, headers=headers,
                                params={'sha': branch['name'], 'page': page})
            resp.raise_for_status()
            commits = resp.json()
            # an empty page means we have walked past the last commit
            if not commits:
                break
            for commit in commits:
                commit_store.add(commit['sha'])
            branch_commits += len(commits)
            page += 1
        print(branch['name'].ljust(45), str(branch_commits).rjust(9))

    print('-' * 55)
    print('Total'.ljust(42), str(len(commit_store)).rjust(12))


# for private repositories, get your own token from
# https://github.com/settings/tokens
# github_commit_counter('github', 'gitignore', access_token='xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx')
github_commit_counter('github', 'gitignore')
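Simple solution: look at the page number. GitHub paginates for you, so you can easily calculate the number of commits: take the last page number from the Link header, subtract one (you'll add the last page manually), multiply by the page size, then grab the last page of results, get the size of that array, and add the two numbers together. It's at most two API calls!
Here is my implementation for grabbing the total number of commits for an entire organization, using the octokit gem in Ruby: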
@github = Octokit::Client.new access_token: key, auto_traversal: true, per_page: 100
Octokit.auto_paginate = true
repos = @github.org_repos('my_company', per_page: 100)

# * take the pagination number
# * get the last page
# * see how many items are on it
# * multiply the number of pages - 1 by the page size
# * and add the two together. Boom. Commit count in 2 api calls
def calc_total_commits(repos)
  total_sum_commits = 0
  repos.each do |e|
    repo = Octokit::Repository.from_url(e.url)
    number_of_commits_in_first_page = @github.commits(repo).size
    links = @github.last_response.rels
    repo_sum = 0
    if links[:last]
      last_page_url = links[:last].href
      /.*page=(?<page_num>\d+)/ =~ last_page_url
      repo_sum += (page_num.to_i - 1) * 100 # we add the last page manually
      repo_sum += links[:last].get.data.size
    else
      # no "last" link: everything fit in the first response
      repo_sum += number_of_commits_in_first_page
    end
    puts "Commits for #{e.name} : #{repo_sum}"
    total_sum_commits += repo_sum
  end
  puts "TOTAL COMMITS #{total_sum_commits}"
end
Yes, I know the code is dirty; it was just thrown together in a few minutes.
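The arithmetic behind this answer can be sketched in a few lines of Python. The function name and its parameters are hypothetical, chosen only to illustrate the calculation; the last page number is assumed to come from the `rel="last"` entry of the Link header.

```python
def total_from_pagination(last_page, per_page, items_on_last_page):
    """Count items from the last page number, the page size, and the last page's length."""
    if last_page < 1:
        return 0
    # Every page before the last one is full; the last page is counted as-is.
    return (last_page - 1) * per_page + items_on_last_page


# Example: 7 pages of 100 commits each, with 42 commits on the last page.
print(total_from_pagination(7, 100, 42))  # 642
```

The first request supplies the Link header, and the second fetches the last page to measure its length, which is where the "two API calls" figure comes from.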
You can consider using the GraphQL API v4 to perform commit counts for multiple repositories at the same time using aliases. The following fetches the commit count for all branches of 3 distinct repositories (up to 100 branches per repo):
{
  gson: repository(owner: "google", name: "gson") {
    ...RepoFragment
  }
  martian: repository(owner: "google", name: "martian") {
    ...RepoFragment
  }
  keyboard: repository(owner: "jasonrudolph", name: "keyboard") {
    ...RepoFragment
  }
}

fragment RepoFragment on Repository {
  name
  refs(first: 100, refPrefix: "refs/heads/") {
    edges {
      node {
        name
        target {
          ... on Commit {
            id
            history(first: 0) {
              totalCount
            }
          }
        }
      }
    }
  }
}
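RepoFragment is a fragment, which helps avoid duplicating the query fields for each repo.
If you only need the commit count on the default branch, it's more straightforward: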
{
  gson: repository(owner: "google", name: "gson") {
    ...RepoFragment
  }
  martian: repository(owner: "google", name: "martian") {
    ...RepoFragment
  }
  keyboard: repository(owner: "jasonrudolph", name: "keyboard") {
    ...RepoFragment
  }
}

fragment RepoFragment on Repository {
  name
  defaultBranchRef {
    name
    target {
      ... on Commit {
        id
        history(first: 0) {
          totalCount
        }
      }
    }
  }
}
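When the list of repositories is not known in advance, the aliased query can be generated instead of written by hand. The Python sketch below builds the default-branch query for any list of (owner, name) pairs. The function name `build_commit_count_query` and the `r<index>_` alias scheme are assumptions made for this example, not part of the GitHub API.

```python
import json
import re

REPO_FRAGMENT = """
fragment RepoFragment on Repository {
  name
  defaultBranchRef {
    name
    target {
      ... on Commit {
        history(first: 0) {
          totalCount
        }
      }
    }
  }
}
"""


def build_commit_count_query(repos):
    """Build one GraphQL query with an alias per (owner, name) pair."""
    fields = []
    for index, (owner, name) in enumerate(repos):
        # Aliases may only contain letters, digits, and underscores,
        # so replace anything else and prefix with a unique index.
        alias = "r%d_%s" % (index, re.sub(r"[^0-9A-Za-z_]", "_", name))
        # json.dumps produces a safely quoted string literal.
        fields.append(
            "  %s: repository(owner: %s, name: %s) {\n    ...RepoFragment\n  }"
            % (alias, json.dumps(owner), json.dumps(name))
        )
    return "{\n" + "\n".join(fields) + "\n}\n" + REPO_FRAGMENT


print(build_commit_count_query([("google", "gson"), ("jasonrudolph", "keyboard")]))
```

The resulting string would be sent as the `query` field of a JSON body in an authenticated POST to https://api.github.com/graphql; that step is omitted here.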
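I used Python to create a generator that returns a list of contributors, sums up the total commit count, and then checks it against a limit. It returns True if there are fewer commits than the limit, and False if there are the same number or more. The only thing you need to fill in is the requests session that uses your credentials. Here's what I wrote for you: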
from requests import session


def login():
    sess = session()
    # log in here and return a session with valid credentials
    return sess


def generateList(link):
    # you need to log in before you do anything
    sess = login()

    # requests exposes pagination through response.links, so start with an
    # object that imitates a response whose "next" link is the first URL.
    # This lets a single while-loop walk all of GitHub's pages.
    class response_immitator:
        links = {'next': {'url': link}}

    response = response_immitator()
    while 'next' in response.links:
        response = sess.get(response.links['next']['url'])
        for repo in response.json():
            yield repo


def check_commit_count(baseurl, user_name, repo_name, max_commit_count=None):
    # without a limit there is nothing to check against
    if max_commit_count is None:
        return None
    totalcommits = 0
    # construct the URL to paginate
    url = baseurl + "repos/" + user_name + '/' + repo_name + "/stats/contributors"
    for stats in generateList(url):
        totalcommits += stats['total']
    # True if the repo has fewer commits than the limit, False otherwise
    return totalcommits < max_commit_count


def main():
    # which user do you want to check for commits
    user_name = "arcsector"
    # which repo do you want to check for commits
    repo_name = "EyeWitness"
    # GitHub's base API URL
    baseurl = "https://api.github.com/"
    # call the function and show the result
    print(check_commit_count(baseurl, user_name, repo_name, 30))


if __name__ == "__main__":
    main()
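If you're starting a new project, using the GraphQL API v4 is probably the way to handle this, but if you're still using REST API v3 you can get around the pagination issue by limiting the request to just 1 result per page. With that limit set, the number of pages reported in the last link equals the total count.
For example, using python3 and the requests library: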
import os
import urllib.parse

import requests


def commit_count(project, sha='master', token=None):
    """
    Return the number of commits to a project
    """
    token = token or os.environ.get('GITHUB_API_TOKEN')
    url = f'https://api.github.com/repos/{project}/commits'
    headers = {
        'Accept': 'application/json',
        'Content-Type': 'application/json',
    }
    # only authenticate when a token is available
    if token:
        headers['Authorization'] = f'token {token}'
    params = {
        'sha': sha,
        'per_page': 1,
    }
    resp = requests.request('GET', url, params=params, headers=headers)
    if (resp.status_code // 100) != 2:
        raise Exception(f'invalid github response: {resp.content}')
    # check the resp count, just in case there are 0 commits
    commit_count = len(resp.json())
    last_page = resp.links.get('last')
    # if there are no more pages, the count must be 0 or 1
    if last_page:
        # extract the query string from the last page url
        qs = urllib.parse.urlparse(last_page['url']).query
        # extract the page number from the query string
        commit_count = int(dict(urllib.parse.parse_qsl(qs))['page'])
    return commit_count
Here is a JavaScript example using Fetch, based on snowe's approach.
Fetch example
/**
 * @param {string} owner Owner of repo
 * @param {string} repo Name of repo
 * @returns {Promise<number>} Total number of commits on the repo's default branch
 */
export const getTotalCommits = (owner, repo) => {
  const url = `https://api.github.com/repos/${owner}/${repo}/commits?per_page=100`;
  const options = { headers: { Accept: "application/vnd.github.v3+json" } };
  let pages = 1;

  return fetch(url, options)
    .then((response) => {
      const link = response.headers.get("link");
      // No Link header: all commits fit on the first page.
      if (!link) {
        return response.json();
      }
      // Read the page number from the rel="last" entry of the Link header.
      const last = link.split(",").find((part) => part.includes('rel="last"'));
      pages = Number(last.match(/[?&]page=(?<page_num>\d+)/).groups.page_num);
      return fetch(`${url}&page=${pages}`, options).then((data) => data.json());
    })
    .then((data) => {
      // Full pages before the last one, plus the size of the last page.
      return data.length + (pages - 1) * 100;
    })
    .catch((err) => {
      console.log(`ERROR: calling: ${url}`);
      console.log("See below for more info:");
      console.log(err);
    });
};
Usage
getTotalCommits('facebook', 'react').then(commits => {
  console.log(commits);
});
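Make a request to https://api.github.com/repos/{username}/{repo}/commits?sha={branch}&per_page=1&page=1
Now just take the Link parameter of the response header and grab the page number that appears right before rel="last".
This page count equals the total number of commits in that branch!
The trick is to use &per_page=1&page=1. It places 1 commit on each page, so the total number of commits equals the total number of pages.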
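For clients that do not parse the Link header for you, extracting that page number could look like the Python sketch below. The function name `last_page_from_link` and the repository ID in the sample header are made up for illustration; the header format `<url>; rel="..."` follows what GitHub returns for paginated responses.

```python
import re
from urllib.parse import parse_qs, urlparse


def last_page_from_link(link_header):
    """Return the page number of the rel="last" entry, or None if there is none."""
    entries = re.findall(r'<([^>]+)>\s*;\s*rel="([^"]+)"', link_header or "")
    for url, rel in entries:
        if rel == "last":
            # The page number lives in the URL's query string.
            return int(parse_qs(urlparse(url).query)["page"][0])
    return None


# Sample header with a made-up repository ID.
sample = (
    '<https://api.github.com/repositories/1234/commits?sha=master&per_page=1&page=2>; rel="next", '
    '<https://api.github.com/repositories/1234/commits?sha=master&per_page=1&page=411>; rel="last"'
)
print(last_page_from_link(sample))  # 411
```

When the function returns None, the response had no further pages, so with per_page=1 the branch holds either 0 or 1 commits and the length of the response body gives the count.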