How to determine which forks on GitHub are ahead?

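Sometimes the original GitHub repository of a piece of software I'm using, such as linkchecker, sees little or no development, while a lot of forks have been created (in this case: 142, at the time of writing).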

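For each fork, I'd like to know which of its branches have commits ahead of the original repository's master branch and, for each such branch, how many commits it is ahead of the original and how many it is behind.

GitHub's web interface can compare a fork with the original, but I don't want to do this manually for each fork; I just want a CSV file with the results for all forks. How can this be scripted? The GitHub API can list the forks, but I can't see how to compare forks with it. Cloning each fork in turn and comparing locally seems a bit crude.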
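For reference, here is a minimal sketch of the API side, assuming GitHub's documented compare endpoint (GET /repos/{owner}/{repo}/compare/{base}...{head}, where head may be written as user:branch to refer to a branch in another repository of the same fork network). Its JSON response includes ahead_by and behind_by, which is what the API-based scripts further down read. The function names and CSV column names are hypothetical, and the HTTP requests are left out so the helpers stay testable.

```python
import csv
import io

API = "https://api.github.com"


def compare_url(owner, repo, base_branch, fork_owner, fork_branch):
    # "<user>:<branch>" as the head compares against a branch of another
    # repository in the same fork network
    return f"{API}/repos/{owner}/{repo}/compare/{base_branch}...{fork_owner}:{fork_branch}"


def rows_to_csv(rows):
    # rows: iterable of dicts with the (hypothetical) keys below,
    # e.g. filled from the ahead_by/behind_by fields of each compare response
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf,
        fieldnames=["fork", "branch", "ahead_by", "behind_by"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

You would call compare_url once per fork (and branch), fetch it with your HTTP client of choice, and pass the collected rows to rows_to_csv.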

Had exactly the same itch and wrote a scraper that extracts the information printed in the rendered HTML for each fork: https://github.com/hbbio/forkizard

Definitely not perfect, just a stopgap solution.

active-forks doesn't do exactly what I want, but it comes close and is very easy to use.

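Late to the party, but I think this is the second time I've ended up on this SO post, so I'll share my JS-based solution (in the end I just fetched each fork's HTML page and searched it). You can turn it into a bookmarklet, or simply paste the whole thing into the console. Works in Chromium and Firefox: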

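EDIT: if there are more than 10 or so forks on the page, you may get locked out for scraping too fast (429 Too Many Requests in the Network tab). Use async / await instead: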

javascript:(async () => {
  /* while on the forks page, collect all the hrefs and pop off the first one (original repo) */
  const forks = [...document.querySelectorAll('div.repo a:last-of-type')].map(x => x.href).slice(1);

  for (const fork of forks) {
    /* fetch the forked repo as html, search for the "This branch is [n commits ahead,] [m commits behind]", print it to console */
    await fetch(fork)
      .then(x => x.text())
      .then(html => console.log(`${fork}: ${html.match(/This branch is.*/).pop().replace('This branch is ', '')}`))
      .catch(console.error);
  }
})();

Or you can do it in batches, but it's easy to get locked out that way:

javascript:(async () => {
  /* while on the forks page, collect all the hrefs and pop off the first one (original repo) */
  const forks = [...document.querySelectorAll('div.repo a:last-of-type')].map(x => x.href).slice(1);

  const getfork = (fork) => {
    return fetch(fork)
      .then(x => x.text())
      .then(html => console.log(`${fork}: ${html.match(/This branch is.*/).pop().replace('This branch is ', '')}`))
      .catch(console.error);
  }

  while (forks.length) {
    await Promise.all(forks.splice(0, 2).map(getfork));
  }
})();
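The batching idea above is not specific to JavaScript. As a sketch in Python (a hypothetical helper, not part of the answer), process batch_size items concurrently, keep the results in input order like Promise.all does, and pause between batches. The default batch_size=2 mirrors splice(0, 2) above; delay_s is an arbitrary guess, not a documented GitHub limit.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_in_batches(
    items: Iterable[T],
    fn: Callable[[T], R],
    batch_size: int = 2,
    delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[R]:
    """Apply fn to items, batch_size at a time, pausing between batches."""
    item_list = list(items)
    results: List[R] = []
    with ThreadPoolExecutor(max_workers=batch_size) as pool:
        for start in range(0, len(item_list), batch_size):
            batch = item_list[start:start + batch_size]
            # run one batch concurrently; pool.map preserves input order
            results.extend(pool.map(fn, batch))
            if start + batch_size < len(item_list):
                # back off before the next batch to reduce the risk of a 429
                sleep(delay_s)
    return results
```

fn would be your own function that downloads and parses a single fork page; the sleep parameter only exists so the pause can be replaced in tests.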

Original version (this fires all requests at once and may get you locked out if the requests per second exceed what GitHub allows):

javascript:(() => {
  /* while on the forks page, collect all the hrefs and pop off the first one (original repo) */
  const forks = [...document.querySelectorAll('div.repo a:last-of-type')].map(x => x.href).slice(1);

  for (const fork of forks) {
    /* fetch the forked repo as html, search for the "This branch is [n commits ahead,] [m commits behind]", print it to console */
    fetch(fork)
      .then(x => x.text())
      .then(html => console.log(`${fork}: ${html.match(/This branch is.*/).pop().replace('This branch is ', '')}`))
      .catch(console.error);
  }
})();

This prints something like the following to the console:

https://github.com/user1/repo: 289 commits behind original:master.
https://github.com/user2/repo: 489 commits behind original:master.
https://github.com/user2/repo: 1 commit ahead, 501 commits behind original:master.
...

Edit: replaced the comments with block comments so the code can be pasted.
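If you want the CSV the question asks for rather than console text, the printed lines can be parsed afterwards. A minimal sketch in Python, assuming the "N commits ahead, M commits behind" wording shown above (GitHub may word this differently on newer page designs); parse_branch_status and parse_console_line are hypothetical names:

```python
import re


def parse_branch_status(text):
    """Return (ahead, behind) from text such as
    '1 commit ahead, 501 commits behind original:master.'"""
    ahead = re.search(r"(\d+) commits? ahead", text)
    behind = re.search(r"(\d+) commits? behind", text)
    return (
        int(ahead.group(1)) if ahead else 0,
        int(behind.group(1)) if behind else 0,
    )


def parse_console_line(line):
    # "https://github.com/user/repo: <status>"; the URL's own "://" has no
    # space after the colon, so partitioning on ": " splits at the right place
    url, _, status = line.partition(": ")
    return (url, *parse_branch_status(status))
```

Each resulting tuple can be written as one CSV row with the standard csv module.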

After clicking "Insights" at the top and then "Forks" on the left, the following bookmarklet prints the information directly onto the web page.

Code to add as a bookmarklet (or paste into the console):

javascript:(async () => {
  /* while on the forks page, collect all the hrefs and pop off the first one (original repo) */
  const aTags = [...document.querySelectorAll('div.repo a:last-of-type')].slice(1);

  for (const aTag of aTags) {
    /* fetch the forked repo as html, search for the "This branch is [n commits ahead,] [m commits behind]", print it directly onto the web page */
    await fetch(aTag.href)
      .then(x => x.text())
      .then(html => aTag.outerHTML += `${html.match(/This branch is.*/).pop().replace('This branch is', '').replace(/([0-9]+ commits? ahead)/, '<font color="#0c0">$1</font>').replace(/([0-9]+ commits? behind)/, '<font color="red">$1</font>')}`)
      .catch(console.error);
  }
})();

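You can also paste the code into the address bar, but note that some browsers strip the leading javascript: when pasting, so you have to type javascript: yourself. Alternatively, copy everything except the leading j, type j, and then paste the rest.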

Modified from


Bonus

The following bookmarklet also prints links to the ZIP files:

Code to add as a bookmarklet (or paste into the console):

javascript:(async () => {
  /* while on the forks page, collect all the hrefs and pop off the first one (original repo) */
  const aTags = [...document.querySelectorAll('div.repo a:last-of-type')].slice(1);

  for (const aTag of aTags) {
    /* fetch the forked repo as html, search for the "This branch is [n commits ahead,] [m commits behind]", print it directly onto the web page */
    await fetch(aTag.href)
      .then(x => x.text())
      .then(html => aTag.outerHTML += `${html.match(/This branch is.*/).pop().replace('This branch is', '').replace(/([0-9]+ commits? ahead)/, '<font color="#0c0">$1</font>').replace(/([0-9]+ commits? behind)/, '<font color="red">$1</font>')}` + " <a " + `${html.match(/href="[^"]*\.zip">/).pop() + "Download ZIP</a>"}`)
      .catch(console.error);
  }
})();
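As an alternative to scraping the Download ZIP link out of the HTML, the archive URL can be built directly. This sketch assumes GitHub keeps serving branch archives at https://github.com/{owner}/{repo}/archive/refs/heads/{branch}.zip and that you already know the branch name (for example from the default_branch field of the API's repository object); zip_url is a hypothetical helper:

```python
from urllib.parse import quote, urlparse


def zip_url(repo_url, branch):
    """Build a branch archive URL from a github.com repository URL
    (URL pattern is an assumption, see above)."""
    owner, repo = urlparse(repo_url).path.strip("/").split("/")[:2]
    # quote() keeps "/" as-is, so branch names like feature/x stay readable
    return f"https://github.com/{owner}/{repo}/archive/refs/heads/{quote(branch)}.zip"
```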

Here is a Python script to list and clone all forks that are ahead.

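It doesn't use the API, so it isn't subject to the API rate limit and doesn't need authentication. However, it may need adjusting if the GitHub website design changes.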

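Unlike the bookmarklet in the other answer, which shows links to ZIP files, this script also keeps the commit history, because it uses git clone, and it saves a commits.htm file with an overview of the commits.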

import os, re, subprocess, sys, time

import requests

def content_from_url(url):
    # TODO handle internet being off and stuff
    # .text (str) rather than .content (bytes), so the str regexes below work in Python 3
    return requests.get(url).text

def clone_ahead_forks(forklist_url):
    forklist_htm = content_from_url(forklist_url)
    with open("forklist.htm", "w", encoding="utf-8") as text_file:
        text_file.write(forklist_htm)

    is_root = True
    # not working if there are no forks: '<a class="(Link--secondary)?" href="(/([^/"]*)/[^/"]*)">'
    for match in re.finditer('<a (class=""|data-pjax="#js-repo-pjax-container") href="(/([^/"]*)/[^/"]*)">', forklist_htm):
        fork_url = 'https://github.com'+match.group(2)
        fork_owner_login = match.group(3)
        fork_htm = content_from_url(fork_url)

        match2 = re.search('<div class="d-flex flex-auto">[^<]*?([0-9]+ commits? ahead(, [0-9]+ commits? behind)?)', fork_htm)
        # TODO if website design changes, fallback onto checking whether 'ahead'/'behind'/'even with' appear only once on the entire page - in that case they are not part of the username etc.

        sys.stdout.write('.')
        if match2 or is_root:
            if match2:
                aheadness = match2.group(1) # for example '1 commit ahead, 2 commits behind'
            else:
                aheadness = 'root repo'
                is_root = False # for subsequent iterations

            dir_name = fork_owner_login+' ('+aheadness+')'
            print(dir_name)

            os.mkdir(dir_name)
            os.chdir(dir_name)

            # save commits.htm
            commits_htm = content_from_url(fork_url+'/commits')
            with open("commits.htm", "w", encoding="utf-8") as text_file:
                text_file.write(commits_htm)

            # git clone (argument list instead of a shell string, so no quoting issues)
            subprocess.run(['git', 'clone', fork_url+'.git'])
            print()

            # no need to recurse into forks of forks because they are all listed on the initial page and being traversed already

            os.chdir('..')


base_path = os.getcwd()
# drive letter such as C:\ (the backslash must be escaped, otherwise the regex does not compile)
match_disk_letter = re.search(r'^([a-zA-Z]:\\)', base_path)


with open('repo_urls.txt') as url_file:
    for url in url_file:
        url = url.strip()
        match = re.search('github.com/([^/]*)/([^/]*)$', url)
        if match:
            user_name = match.group(1)
            repo_name = match.group(2)
            print(repo_name)
            dirname_for_forks = repo_name+' ('+user_name+')'
            if not os.path.exists(dirname_for_forks):
                url += "/network/members" # page that lists the forks

                TMP_DIR = 'tmp_'+time.strftime("%Y%m%d-%H%M%S")
                if match_disk_letter: # if Windows, i.e. if path starts with A:\ or so, run git in A:\tmp_... instead of .\tmp_..., in order to prevent "filename too long" errors
                    TMP_DIR = match_disk_letter.group(1)+TMP_DIR
                print(TMP_DIR)

                os.mkdir(TMP_DIR)
                os.chdir(TMP_DIR)
                clone_ahead_forks(url)
                print()
                os.chdir(base_path)
                os.rename(TMP_DIR, dirname_for_forks)
            else:
                print(dirname_for_forks+' already exists, skipping.')

If you create a file repo_urls.txt with the following content (you can list several URLs, one per line):

https://github.com/cifkao/tonnetz-viz

then you get the following directories, each containing the corresponding cloned repository:

tonnetz-viz (cifkao)
  bakaiadam (2 commits ahead)
  chumo (2 commits ahead, 4 commits behind)
  cifkao (root repo)
  codedot (76 commits ahead, 27 commits behind)
  k-hatano (41 commits ahead)
  shimafuri (11 commits ahead, 8 commits behind)

If it doesn't work, try the earlier versions.

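Here is a Python script to list and clone the forks that are ahead. This script partly uses the API, so it runs into the rate limit. You can raise the rate limit (though not remove it) by adding GitHub API authentication to the script; please edit this answer or post your own if you do.

Initially I tried to use the API exclusively, but that hit the rate limit too quickly, so now I use is_fork_ahead_HTML instead of is_fork_ahead_API. This may need adjusting if the GitHub website design changes.

Because of the rate limit, I prefer the other answer I posted here.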

import json, os, re, subprocess

import requests

def obj_from_json_from_url(url):
    # TODO handle internet being off and stuff
    text = requests.get(url).text
    obj = json.loads(text)
    return obj, text

def is_fork_ahead_API(user, repo, fork, default_branch_of_parent):
    """ Use the GitHub API to check whether `fork` is ahead of user/repo.
     This triggers the rate limit, so prefer the non-API version below instead.
    """
    # Compare default branch of original repo with default branch of fork.
    comparison, comparison_json = obj_from_json_from_url('https://api.github.com/repos/'+user+'/'+repo+'/compare/'+default_branch_of_parent+'...'+fork['owner']['login']+':'+fork['default_branch'])
    if comparison['ahead_by'] > 0:
        return comparison_json
    else:
        return False

def is_fork_ahead_HTML(fork):
    """ Use the GitHub website to check whether `fork` is ahead.
    """
    # .text (str) so the str regex below works in Python 3
    htm = requests.get(fork['html_url']).text
    match = re.search('<div class="d-flex flex-auto">[^<]*?([0-9]+ commits? ahead(, [0-9]+ commits? behind)?)', htm)
    # TODO if website design changes, fallback onto checking whether 'ahead'/'behind'/'even with' appear only once on the entire page - in that case they are not part of the username etc.
    if match:
        return match.group(1) # for example '1 commit ahead, 114 commits behind'
    else:
        return False

def clone_ahead_forks(user, repo):
    obj, _ = obj_from_json_from_url('https://api.github.com/repos/'+user+'/'+repo)
    default_branch_of_parent = obj["default_branch"] # only needed by is_fork_ahead_API

    page = 0
    while True:
        page += 1
        forks, _ = obj_from_json_from_url('https://api.github.com/repos/'+user+'/'+repo+'/forks?per_page=100&page='+str(page))
        if isinstance(forks, dict): # an error object instead of a list, e.g. when the rate limit is hit
            raise RuntimeError(forks.get('message', forks))
        if not forks: # an empty page means all forks have been listed
            break

        for fork in forks:
            aheadness = is_fork_ahead_HTML(fork)
            if aheadness:
                #dir = fork['owner']['login']+' ('+str(comparison['ahead_by'])+' commits ahead, '+str(comparison['behind_by'])+'commits behind)'
                dir_name = fork['owner']['login']+' ('+aheadness+')'
                print(dir_name)
                os.mkdir(dir_name)
                os.chdir(dir_name)
                subprocess.run(['git', 'clone', fork['clone_url']])
                print()

                # recurse into forks of forks
                if fork['forks_count'] > 0:
                    clone_ahead_forks(fork['owner']['login'], fork['name'])

                os.chdir('..')

user = 'cifkao'
repo = 'tonnetz-viz'

clone_ahead_forks(user, repo)
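If you switch back to is_fork_ahead_API (for example after adding authentication), its JSON result can be turned into the same kind of text that is_fork_ahead_HTML returns, so the directory names stay the same. A hypothetical helper, assuming only the ahead_by and behind_by fields of the compare response:

```python
def aheadness_from_comparison(comparison):
    """Turn a compare-API result (dict with ahead_by/behind_by) into text like
    '1 commit ahead, 114 commits behind', or None if the fork is not ahead."""
    ahead = comparison["ahead_by"]
    behind = comparison["behind_by"]
    if ahead <= 0:
        return None

    def plural(n, word):
        # "1 commit" but "2 commits", matching GitHub's wording
        return f"{n} {word}" + ("" if n == 1 else "s")

    text = plural(ahead, "commit") + " ahead"
    if behind > 0:
        text += ", " + plural(behind, "commit") + " behind"
    return text
```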

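Here is a Python script that uses the GitHub API. I wanted to include the date and the last commit message. You need to include a Personal Access Token (PAT) if you need to raise the limit to 5k requests/hr.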

Usage: python3 list-forks.py https://github.com/itinance/react-native-fs

Sample output:

https://github.com/itinance/react-native-fs root 2021-11-04 "Merge pull request #1016 from mjgallag/make-react-native-windows-peer-dependency-optional  make react-native-windows peer dependency optional"
https://github.com/AnimoApps/react-native-fs diverged +2 -160 [+1m 10d] "Improved comments to align with new PNG support in copyAssetsFileIOS"
https://github.com/twinedo/react-native-fs ahead +1 [+26d] "clear warn yellow new NativeEventEmitter()"
https://github.com/synonymdev/react-native-fs ahead +2 [+23d] "Merge pull request #1 from synonymdev/event-emitter-fix  Event Emitter Fix"
https://github.com/kongyes/react-native-fs ahead +2 [+10d] "aa"
https://github.com/kamiky/react-native-fs diverged +1 -2 [-6d] "add copyCurrentAssetsVideoIOS function to retrieve current modified videos"
https://github.com/nikola166/react-native-fs diverged +1 -2 [-7d] "version"
https://github.com/morph3ux/react-native-fs diverged +1 -4 [-30d] "Update package.json"
https://github.com/broganm/react-native-fs diverged +2 -4 [-1m 7d] "Update RNFSManager.m"
https://github.com/k1mmm/react-native-fs diverged +1 -4 [-1m 14d] "Invalidate upload session  Prevent memory leaks"
https://github.com/TickKleiner/react-native-fs diverged +1 -4 [-1m 24d] "addListener and removeListeners methods wass added to pass warning"
https://github.com/nerdyfactory/react-native-fs diverged +1 -8 [-2m 14d] "fix: applying change from https://github.com/itinance/react-native-fs/pull/944"
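
import os, sys, datetime

import requests
from dateutil.relativedelta import relativedelta
from urllib.parse import urlparse

# Read the Personal Access Token from the GITHUB_PAT environment variable
# instead of hard-coding a secret in the script
GITHUB_PAT = os.environ.get('GITHUB_PAT')

def json_from_url(url):
    # authenticate only if a token is set (raises the rate limit to 5k requests/hr)
    headers = {'Authorization': 'token {}'.format(GITHUB_PAT)} if GITHUB_PAT else {}
    response = requests.get(url, headers=headers)
    return response.json()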

def date_delta_to_text(date1, date2):
    ret = []
    date_delta = relativedelta(date2, date1)
    sign = '+' if date1 < date2 else '-'

    if date_delta.years != 0:
        ret.append('{}y'.format(abs(date_delta.years)))

    if date_delta.months != 0:
        ret.append('{}m'.format(abs(date_delta.months)))
        
    if date_delta.days != 0:
        ret.append('{}d'.format(abs(date_delta.days)))
    
    return '{}{}'.format(sign, ' '.join(ret))

def iso8601_date_to_date(date):
    return datetime.datetime.strptime(date, '%Y-%m-%dT%H:%M:%SZ')

def date_to_text(date):
    return date.strftime('%Y-%m-%d')

def process_repo(repo_author, repo_name, fork_of_fork):
    page = 1

    while 1:
        forks_url = 'https://api.github.com/repos/{}/{}/forks?per_page=100&page={}'.format(repo_author, repo_name, page)
        forks_json = json_from_url(forks_url)

        if not forks_json:
            break

        for fork_info in forks_json:
            fork_author = fork_info['owner']['login']
            fork_name = fork_info['name']
            forks_count = fork_info['forks_count']
            fork_url = 'https://github.com/{}/{}'.format(fork_author, fork_name)

            # compare against the parent repo's name (repo_name), which may differ from a renamed fork's name
            compare_url = 'https://api.github.com/repos/{}/{}/compare/master...{}:master'.format(repo_author, repo_name, fork_author)
            compare_json = json_from_url(compare_url)

            if 'status' in compare_json:
                items = []

                status = compare_json['status']
                ahead_by = compare_json['ahead_by']
                behind_by = compare_json['behind_by']
                total_commits = compare_json['total_commits']
                commits = compare_json['commits']

                if fork_of_fork:
                    items.append('   ')

                items.append(fork_url)
                items.append(status)

                if ahead_by != 0:
                    items.append('+{}'.format(ahead_by))

                if behind_by != 0:
                    items.append('-{}'.format(behind_by))

                if commits:
                    last_commit = commits[-1]  # the commits list is capped (250 entries), so index from the end rather than by total_commits
                    commit = last_commit['commit']
                    author = commit['author']
                    date = iso8601_date_to_date(author['date'])
                    items.append('[{}]'.format(date_delta_to_text(root_date, date)))
                    items.append('"{}"'.format(commit['message'].replace('\n', ' ')))

                if ahead_by > 0:
                    print(' '.join(items))

            if forks_count > 0:
                process_repo(fork_author, fork_name, True)

        page += 1

url_parsed = urlparse(sys.argv[1].strip())
path_array = url_parsed.path.split('/')
root_author = path_array[1]
root_name = path_array[2]

root_url = 'https://github.com/{}/{}'.format(root_author, root_name)
commits_url = 'https://api.github.com/repos/{}/{}/commits/master'.format(root_author, root_name)
commits_json = json_from_url(commits_url)
commit = commits_json['commit']
author = commit['author']
root_date = iso8601_date_to_date(author['date'])
print('{} root {} "{}"'.format(root_url, date_to_text(root_date), commit['message'].replace('\n', ' ')))

process_repo(root_author, root_name, False)
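The script above compares master with master, so forks of repositories whose default branch has another name (main, for instance) get a 404 from the compare endpoint and are silently skipped. A sketch of building the compare URL from each repository's default_branch instead; parent and fork are assumed to be repository objects as returned by the GitHub REST API, and compare_url_for is a hypothetical name:

```python
def compare_url_for(parent, fork):
    """parent, fork: GitHub REST API repository objects; only
    owner.login, name and default_branch are read."""
    return "https://api.github.com/repos/{}/{}/compare/{}...{}:{}".format(
        parent["owner"]["login"],
        parent["name"],
        parent["default_branch"],
        fork["owner"]["login"],
        fork["default_branch"],
    )
```

The parent object could come from one extra request to https://api.github.com/repos/{owner}/{repo}, and each entry of the forks list already carries default_branch.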

useful-forks

useful-forks is an online tool that filters all forks based on whether they are ahead. I think it covers your needs quite well. :)

For the repo in your question, you can use: https://useful-forks.github.io/?repo=wummel/linkchecker

This should give you results similar to those of a run on 2022-04-02.
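The tool takes the repository as owner/name in the repo query parameter, as in the link above. A tiny Python sketch for building that link from a github.com URL (useful_forks_url is a hypothetical helper):

```python
from urllib.parse import urlencode, urlparse


def useful_forks_url(github_repo_url):
    owner, repo = urlparse(github_repo_url).path.strip("/").split("/")[:2]
    # safe="/" keeps the slash in owner/name unescaped, matching the link above
    return "https://useful-forks.github.io/?" + urlencode({"repo": f"{owner}/{repo}"}, safe="/")
```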

Also available as a Chrome extension

If you would rather use it as a Chrome extension, see the GitHub repository: https://github.com/useful-forks/useful-forks.github.io#chrome-extension-wip

Disclaimer

I am the maintainer of this project.