Download specific files in a directory recursively

Question

https://repo1.maven.org/maven2/

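This folder contains many subdirectories and files. I only want to download the maven-metadata.xml files, using Python. I tried an existing answer, but it does not traverse the subdirectories recursively.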

Answer 1

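I would also suggest using Beautiful Soup. You could do something like the following. My test for whether a link is a directory is very, very simple (it only checks whether the link ends with '/'):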

from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup


def isDirectory(href):
    # Directory entries in the listing end with a trailing slash.
    return href.endswith('/')


def findLinks(url):
    page = requests.get(url).content
    bsObj = BeautifulSoup(page, 'html.parser')
    maybe_directories = bsObj.find_all('a', href=True)

    for link in maybe_directories:
        href = link['href']
        newUrl = urljoin(url, href)
        # Skip the parent-directory link ('../') and anything that points
        # outside the current directory; following those would never terminate.
        if newUrl == url or not newUrl.startswith(url):
            continue
        if isDirectory(href):
            findLinks(newUrl)  # recursion happening here
        elif href.endswith('maven-metadata.xml'):
            print("GOTCHA!", newUrl)  # now save and download


startUrl = "https://repo1.maven.org/maven2/"
findLinks(startUrl)
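
The code above only reports a match; the actual save step is left out. A minimal sketch of that step might look like the following. It uses only the standard library, and the helper names `local_path_for` and `download`, the `dest_root` parameter, and the `downloads` folder are my own assumptions, not part of the answer. The idea is to mirror the remote directory layout locally so that the many files named maven-metadata.xml do not overwrite each other.

```python
import shutil
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlopen


def local_path_for(file_url, base_url, dest_root):
    """Map a remote file URL to a local path that mirrors the remote layout.

    Hypothetical helper: everything after base_url's path becomes the
    relative path under dest_root.
    """
    base_path = urlparse(base_url).path
    file_path = urlparse(file_url).path
    if not file_path.startswith(base_path):
        raise ValueError(f"{file_url} is not under {base_url}")
    relative = file_path[len(base_path):].lstrip("/")
    return Path(dest_root) / relative


def download(file_url, base_url, dest_root="downloads"):
    """Stream one file to disk, creating parent directories as needed."""
    target = local_path_for(file_url, base_url, dest_root)
    target.parent.mkdir(parents=True, exist_ok=True)
    # copyfileobj copies in chunks, so the file is never held fully in memory.
    with urlopen(file_url, timeout=30) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target
```

Inside the crawler, the match branch could then call something like `download(newUrl, startUrl)` instead of only printing, assuming the full URL of the matched link is available at that point.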

python

urllib2