Python download multiple files within for loop

Question

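I have a list of URLs that point to SEC filings (for example, https://www.sec.gov/Archives/edgar/data/18651/000119312509042636/d10k.htm).

My goal is to write a for loop that opens each URL, requests the document, and saves it to a folder. However, I need to be able to identify the files later. That is why I want to use the filing-specific number from "https://www.sec.gov/Archives/edgar/data/18651/000119312509042636/d10k.htm" as the document name.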

directory = r"\Desktop\10ks"
for url in url_list:
    response = requests.get(url).content
    path = (directory + str(url)[40:-5] +".txt")
    with open(path, "w") as f:
        f.write(response)
    f.close()

But every time I get the following error message: FileNotFoundError: [Errno 2] No such file or directory:

I really hope you can help me!! Thanks

Answer 1

This works:

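for url in url_list:
    # .content is bytes; decode it so it can be written in text mode
    response = requests.get(url).content.decode('utf-8')
    # On Windows, turn the forward slashes from the URL into backslashes
    path = (directory + str(url)[40:-5] + ".txt").replace('/', '\\')
    # The with block closes the file, so no explicit f.close() is needed
    with open(path, "w+") as f:
        f.write(response)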

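The path you built looks like this: \Desktop\10ks18651/000119312509042636/d10.txt. Given those backslashes, I assume you are on Windows. Either way, you just need to replace the forward slashes in the path with backslashes.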
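To make the path problem concrete, here is a small sketch that rebuilds the string without touching the network. The temporary folder and the names `suffix`, `naive_path`, and `nested` are only for illustration and are not part of the original code. It also shows one possible way around the error: creating the missing parent folders with `os.makedirs` before opening the file.

```python
import os
import tempfile

url = "https://www.sec.gov/Archives/edgar/data/18651/000119312509042636/d10k.htm"
directory = r"\Desktop\10ks"

# The slice keeps everything after ".../edgar/data/" and drops the
# last five characters ("k.htm").
suffix = url[40:-5]
print(suffix)  # 18651/000119312509042636/d10

# Plain concatenation adds no separator after "10ks" and mixes both slash styles.
naive_path = directory + suffix + ".txt"
print(naive_path)  # \Desktop\10ks18651/000119312509042636/d10.txt

# Demo in a throwaway folder (an assumption for this sketch): open() does not
# create missing parent folders, so a nested path fails until they exist.
base = tempfile.mkdtemp()
nested = os.path.join(base, "10ks", *suffix.split("/")) + ".txt"

failed_without_folders = False
try:
    open(nested, "w").close()
except FileNotFoundError:
    failed_without_folders = True

# Create the parent folders first, then the same open() call succeeds.
os.makedirs(os.path.dirname(nested), exist_ok=True)
with open(nested, "w") as f:
    f.write("placeholder")
```

Whether nested folders or a flat filename is the better layout depends on how you want to look the files up later.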

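Another thing: because the file is opened in text mode, write expects a string, so you need to decode the response from bytes to a string.

Hope this helps!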
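A minimal sketch of the bytes-versus-string point, using a hard-coded bytes value as a stand-in for `requests.get(url).content` so that it runs offline. The file names and the temporary folder are made up for the demo. It shows the two usual options: decode and write in text mode, or keep the bytes and open the file in binary mode.

```python
import os
import tempfile

# Stand-in for requests.get(url).content, which is a bytes object.
content = "<html>10-K filing</html>".encode("utf-8")

out_dir = tempfile.mkdtemp()  # temporary folder, used only for this demo

# Option 1: decode to str and write in text mode.
text_path = os.path.join(out_dir, "decoded.txt")
with open(text_path, "w", encoding="utf-8") as f:
    f.write(content.decode("utf-8"))

# Option 2: keep the bytes and write in binary mode.
bytes_path = os.path.join(out_dir, "raw.txt")
with open(bytes_path, "wb") as f:
    f.write(content)

# Writing bytes to a text-mode file raises TypeError.
error_message = None
try:
    with open(os.path.join(out_dir, "broken.txt"), "w") as f:
        f.write(content)
except TypeError as exc:
    error_message = str(exc)
```

Binary mode avoids guessing the encoding, which may matter if some filings are not UTF-8.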

Answer 2

import os

import requests

url_list = ["https://www.sec.gov/Archives/edgar/data/18651/000119312509042636/d10k.htm"]
# Build the path Desktop\10ks and make sure the folder exists
directory = os.path.join(os.path.expanduser("~/Desktop"), "10ks")
os.makedirs(directory, exist_ok=True)
for url in url_list:
    # Get the content as a string instead of bytes
    response = requests.get(url).text
    # Replace slashes in the filename with underscores
    filename = str(url)[40:-5].replace("/", "_")
    # Print the filename to check that it is correct
    print(filename)
    path = os.path.join(directory, filename + ".txt")
    # The with block closes the file, so no explicit f.close() is needed
    with open(path, "w", encoding="utf-8") as f:
        f.write(response)

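See the comments in the code. I guess backslashes are not allowed in the filename (on Windows they act as path separators, so the name would point into subfolders that do not exist), because

filename = str(url)[40:-5].replace("/", "\\")

gives me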

FileNotFoundError: [Errno 2] No such file or directory: 'C:\Users\user/Desktop\10ks\18651\000119312509042636\d10.txt'
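As an alternative to the fixed slice `[40:-5]`, the filename could be derived from the URL structure. The following is only a sketch: `filing_filename` and its `marker` parameter are hypothetical names introduced here, and it assumes every URL contains `/edgar/data/`. Unlike the slice, it keeps the full stem (`d10k` rather than `d10`).

```python
import posixpath
from urllib.parse import urlparse


def filing_filename(url, marker="/edgar/data/"):
    """Turn a filing URL into a flat filename (hypothetical helper)."""
    url_path = urlparse(url).path
    # Keep only the part after the marker, e.g. "18651/.../d10k.htm".
    _, found, tail = url_path.partition(marker)
    if not found or not tail:
        raise ValueError(f"URL does not contain {marker!r}: {url}")
    # Drop the original extension, whatever its length.
    stem, _ext = posixpath.splitext(tail)
    # Underscores keep the filing identifiers readable and avoid subfolders.
    return stem.replace("/", "_") + ".txt"


url = "https://www.sec.gov/Archives/edgar/data/18651/000119312509042636/d10k.htm"
print(filing_filename(url))  # 18651_000119312509042636_d10k.txt
```

The result can then be combined with the target folder, for example with `os.path.join(directory, filing_filename(url))`.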

See also:
https://docs.python.org/3/library/os.path.html#os.path.expanduser

https://docs.python.org/3/library/stdtypes.html#str.replace

