Loop through list of URLs, run BeautifulSoup, write to file
I have a list of URLs that I want to run through, clean with BeautifulSoup, and save to .txt files.
Here is my code as it stands now, with just a couple of items in the list; many more will come from a txt file, but for now this keeps it simple.
When the loop runs, it writes the output of both URLs to the same url.txt file. I want each item in the list to be output to its own unique .txt file.
import urllib.request
from bs4 import BeautifulSoup

x = ["https://www.sec.gov/Archives/edgar/data/1000298/0001047469-13-002555.txt",
     "https://www.sec.gov/Archives/edgar/data/1001082/0001104659-13-011967.txt"]

for url in x:
    # I want to open the URL listed in my list
    fp = urllib.request.urlopen(url)
    test = fp.read()
    soup = BeautifulSoup(test, "lxml")
    output = soup.get_text()
    # and then save the get_text() results to a unique file.
    file = open("url.txt", "w", encoding="utf-8")
    file.write(output)
    file.close()
Thanks for looking. Best, George
Create a different file name for each item in the list, like so:
import urllib.request
from bs4 import BeautifulSoup

x = ["https://www.sec.gov/Archives/edgar/data/1000298/0001047469-13-002555.txt",
     "https://www.sec.gov/Archives/edgar/data/1001082/0001104659-13-011967.txt"]

for index, url in enumerate(x):
    # I want to open the URL listed in my list
    fp = urllib.request.urlopen(url)
    test = fp.read()
    soup = BeautifulSoup(test, "lxml")
    output = soup.get_text()
    # and then save the get_text() results to a unique file.
    file = open("url%s.txt" % index, "w", encoding="utf-8")
    file.write(output)
    file.close()
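If you would rather have output files that identify their source document than positional names like url0.txt and url1.txt, one alternative (a sketch, not part of the answer above) is to derive each filename from the URL's last path segment, which for these SEC filings is already a unique .txt name:

```python
import os
from urllib.parse import urlparse

x = ["https://www.sec.gov/Archives/edgar/data/1000298/0001047469-13-002555.txt",
     "https://www.sec.gov/Archives/edgar/data/1001082/0001104659-13-011967.txt"]

def filename_for(url):
    # Take the last path segment of the URL as the local file name,
    # e.g. ".../1000298/0001047469-13-002555.txt" -> "0001047469-13-002555.txt"
    return os.path.basename(urlparse(url).path)

for url in x:
    print(filename_for(url))
```

You would then pass filename_for(url) to open() inside the loop instead of "url%s.txt" % index.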