Loop through list of URLs, run BeautifulSoup, write to file
I have a list of URLs that I want to run through, clean with BeautifulSoup, and save to .txt files.
Here is my code as it stands now, with just a couple of items in the list; many more will come from a txt file, but for now this keeps it simple.
When the loop runs, it writes the output of both URLs to the same url.txt file. I want each item in the list to be output to its own unique .txt file.
import urllib.request
from bs4 import BeautifulSoup

x = ["https://www.sec.gov/Archives/edgar/data/1000298/0001047469-13-002555.txt",
     "https://www.sec.gov/Archives/edgar/data/1001082/0001104659-13-011967.txt"]

for url in x:
    # I want to open the URL listed in my list
    fp = urllib.request.urlopen(url)
    test = fp.read()
    soup = BeautifulSoup(test, "lxml")
    output = soup.get_text()
    # and then save the get_text() results to a unique file.
    file = open("url.txt", "w", encoding="utf-8")
    file.write(output)
    file.close()
Thanks for looking. Best, George
Create a different file name for each item in the list, like so:
import urllib.request
from bs4 import BeautifulSoup

x = ["https://www.sec.gov/Archives/edgar/data/1000298/0001047469-13-002555.txt",
     "https://www.sec.gov/Archives/edgar/data/1001082/0001104659-13-011967.txt"]

for index, url in enumerate(x):
    # I want to open the URL listed in my list
    fp = urllib.request.urlopen(url)
    test = fp.read()
    soup = BeautifulSoup(test, "lxml")
    output = soup.get_text()
    # and then save the get_text() results to a unique file.
    file = open("url%s.txt" % index, "w", encoding="utf-8")
    file.write(output)
    file.close()
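If you would rather have output files that identify their source document than positional names like url0.txt and url1.txt, one alternative (a sketch, not part of the answer above) is to derive each filename from the URL's last path segment, which for these SEC filings is already a unique .txt name:

```python
import os
from urllib.parse import urlparse

x = ["https://www.sec.gov/Archives/edgar/data/1000298/0001047469-13-002555.txt",
     "https://www.sec.gov/Archives/edgar/data/1001082/0001104659-13-011967.txt"]

def filename_for(url):
    # Take the last path segment of the URL as the local file name,
    # e.g. ".../1000298/0001047469-13-002555.txt" -> "0001047469-13-002555.txt"
    return os.path.basename(urlparse(url).path)

for url in x:
    print(filename_for(url))
```

You would then pass filename_for(url) to open() inside the loop instead of "url%s.txt" % index.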