Creating URLs in a loop

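I'm trying to create a list of URLs using a for loop. It prints all the correct URLs, but doesn't save them in a list. Ultimately I want to download multiple files using urlretrieve.
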
for i, j in zip(range(0, 17), range(1, 18)):
    if i < 8 or j < 10:
        url = "https://Here is a URL/P200{}".format(i) + "-0{}".format(j) + ".xls"
        print(url)
    if i == 9 and j == 10:
        url = "https://Here is a URL/P200{}".format(i) + "-{}".format(j) + ".xls"
        print(url)
    if i > 9:
        if i > 9 or j < 8:
            url = "https://Here is a URL/P20{}".format(i) + "-{}".format(j) + ".xls"
            print(url)

The output of the above code is:

https://Here is a URL/P2000-01.xls
https://Here is a URL/P2001-02.xls
https://Here is a URL/P2002-03.xls
https://Here is a URL/P2003-04.xls
https://Here is a URL/P2004-05.xls
https://Here is a URL/P2005-06.xls
https://Here is a URL/P2006-07.xls
https://Here is a URL/P2007-08.xls
https://Here is a URL/P2008-09.xls
https://Here is a URL/P2009-10.xls
https://Here is a URL/P2010-11.xls
https://Here is a URL/P2011-12.xls
https://Here is a URL/P2012-13.xls
https://Here is a URL/P2013-14.xls
https://Here is a URL/P2014-15.xls
https://Here is a URL/P2015-16.xls
https://Here is a URL/P2016-17.xls

But this:

url

gives only:

'https://Here is a URL/P2016-17.xls'

How can I get all the URLs, not just the last one?

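Each pass through the loop assigns a new string to url, so every URL overwrites the previous one and only the last survives. You need to maintain a list and append each new value to it: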

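import urllib.parse

urls = []  # collects every URL instead of overwriting a single variable
for i, j in zip(range(0, 17), range(1, 18)):
    if i < 8 or j < 10:
        urls.append("https://Here is a URL/P200{}".format(i) + "-0{}".format(j) + ".xls")
    if i == 9 and j == 10:
        urls.append("https://Here is a URL/P200{}".format(i) + "-{}".format(j) + ".xls")
    if i > 9:
        if i > 9 or j < 8:
            urls.append("https://Here is a URL/P20{}".format(i) + "-{}".format(j) + ".xls")

# quote() percent-encodes the spaces in the placeholder URL. With its default
# safe='/', it also encodes the ':' after 'https', so for a real URL apply it
# to the path portion only.
for url_value in urls:
    print(urllib.parse.quote(url_value))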

A few things can significantly simplify your code. First, this:

"https://Here is a URL/P200{}".format(i) + "-0{}".format(j) + ".xls"

can be simplified to:

"https://Here is a URL/P200{}-0{}.xls".format(i, j)

If you have at least Python 3.6, you can use an f-string instead:

f"https://Here is a URL/P200{i}-0{j}.xls"

Second, Python has several ways to conveniently pad numbers with zeroes. It can even be done as part of the f-string formatting. Additionally, range starts from zero by default.
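
To make those points concrete, here is a small sketch of three common zero-padding spellings. The variable names (num and the padded_* results) are chosen only for this illustration.

```python
num = 7  # example value, chosen only for illustration

# Three equivalent ways to pad an integer to two digits with leading zeroes.
padded_zfill = str(num).zfill(2)       # string method
padded_format = "{:02}".format(num)    # format spec with str.format
padded_fstring = f"{num:02}"           # same format spec inside an f-string

print(padded_zfill, padded_format, padded_fstring)

# A number that already has two digits is left unchanged.
already_wide = f"{12:02}"
print(already_wide)

# range(n) starts at zero, so range(0, 17) and range(17) yield the same values.
same_range = list(range(0, 17)) == list(range(17))
print(same_range)
```

The `02` in the format spec means "at least two characters wide, filled with zeroes", which is why the same expression works for both single-digit and two-digit values.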

So your entire original code is equivalent to:

for num in range(17):
    print(f'https://Here is a URL/P20{num:02}-{num+1:02}.xls')
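
If you want to convince yourself of that equivalence, one option is to build both sequences and compare them. The sketch below does that; the function names original_urls and simplified_urls are made up for this check and are not part of the question.

```python
def original_urls():
    """Rebuild the URLs with the branching logic from the question."""
    urls = []
    for i, j in zip(range(0, 17), range(1, 18)):
        if i < 8 or j < 10:
            urls.append("https://Here is a URL/P200{}".format(i) + "-0{}".format(j) + ".xls")
        if i == 9 and j == 10:
            urls.append("https://Here is a URL/P200{}".format(i) + "-{}".format(j) + ".xls")
        if i > 9:
            if i > 9 or j < 8:
                urls.append("https://Here is a URL/P20{}".format(i) + "-{}".format(j) + ".xls")
    return urls


def simplified_urls():
    """Build the same URLs with zero-padded format specs and a single range."""
    urls = []
    for num in range(17):
        urls.append(f"https://Here is a URL/P20{num:02}-{num+1:02}.xls")
    return urls


# Both versions should produce the same 17 strings in the same order.
assert original_urls() == simplified_urls()
print(len(simplified_urls()))
```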

Now, you want to actually use these URLs, not just print them out. You mentioned building a list, which can be done like this:

urls = []
for num in range(17):
    urls.append(f'https://Here is a URL/P20{num:02}-{num+1:02}.xls')

Or, more compactly, with a list comprehension:

urls = [f'https://Here is a URL/P20{num:02}-{num+1:02}.xls'
        for num in range(17)]
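
As a quick sanity check, the resulting list can be inspected to confirm that it holds every URL rather than only the last one. This is just an illustrative sketch that rebuilds the list with the same comprehension.

```python
urls = [f'https://Here is a URL/P20{num:02}-{num+1:02}.xls'
        for num in range(17)]

print(len(urls))   # how many URLs were collected
print(urls[0])     # the first URL
print(urls[-1])    # the last URL, which is all the original loop kept
```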

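Based on your comments, and on your other question, you seem to be confused about what form you need these URLs to be in. Strings like this are already what you need. urlretrieve accepts the URL as a string, so you don't need to do any further processing. See the example in the documentation: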

local_filename, headers = urllib.request.urlretrieve('http://python.org/')
html = open(local_filename)
html.close()
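
Applied to a list of URL strings, that could look roughly like the sketch below. The helper name download_all and its retrieve parameter are assumptions made for this example: the parameter exists only so the loop can be tried without network access, and it defaults to urllib.request.urlretrieve. Naming each file after the last path segment of its URL is also an assumption.

```python
import urllib.request


def download_all(urls, retrieve=urllib.request.urlretrieve):
    """Download each URL to a local file named after its last path segment.

    `retrieve` is a hypothetical hook for this sketch; by default it is
    urllib.request.urlretrieve, called as urlretrieve(url, filename).
    """
    saved = []
    for url in urls:
        filename = url.rsplit('/', 1)[-1]  # e.g. 'P2000-01.xls'
        retrieve(url, filename)            # each URL is passed as a plain string
        saved.append(filename)
    return saved
```

Calling download_all(urls) with the list built earlier would then fetch each spreadsheet in turn, assuming the URLs point at a real host.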

However, I would recommend against using urlretrieve, for two reasons.

  1. As the documentation notes, urlretrieve is a legacy function that might become deprecated. If you're going to use urllib, use the urlopen function instead.

  2. However, as Paul Becotte mentioned in reply to your other question: if you're looking to fetch URLs, I would recommend installing and using Requests rather than urllib. It's more user-friendly.
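
For the first point, a urlopen-based download might look like the following minimal sketch. The function name fetch_to_file and its opener parameter are hypothetical, introduced only for this example; opener defaults to urllib.request.urlopen and is there so the function can be exercised without a network connection.

```python
import shutil
import urllib.request


def fetch_to_file(url, filename, opener=urllib.request.urlopen):
    """Stream the response body for `url` into the local file `filename`."""
    # The response object is file-like, so it can be copied in chunks
    # straight into a file opened in binary mode.
    with opener(url) as response, open(filename, 'wb') as out_file:
        shutil.copyfileobj(response, out_file)
    return filename
```

Copying the stream this way avoids holding the whole file in memory, which matters little for small spreadsheets but costs nothing.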

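Whichever method you choose, strings are fine. Here's code that uses Requests to download each of the specified spreadsheets to the current directory: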

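import requests

base_url = 'https://Here is a URL/'

for num in range(17):
    filename = f'P20{num:02}-{num+1:02}.xls'
    xls = requests.get(base_url + filename)
    xls.raise_for_status()  # stop on an HTTP error instead of saving an error page
    # .xls files are binary, so write the raw bytes in 'wb' mode
    with open(filename, 'wb') as f:
        f.write(xls.content)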