Python question: trouble saving data from a website to a .txt file
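I'm having trouble writing raw data scraped from a website to a .txt file in Python. I've looked at different questions here but still can't get my code to work. Any suggestions? I can get it to print what I want, but I can't for the life of me figure out how to simply write it to a .txt file.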
#PACKAGES WE WILL NEED FOR THIS PROJECT
import csv
import re
import requests
import pprint
from bs4 import BeautifulSoup
from urllib.parse import urljoin, urlparse
#CREATE VARIABLE FOR LINK TO PAGE WE WILL WEBSCRAPE
base_censuspage = "https://www.census.gov/programs-surveys/popest.html"
#EXTRACT DATA FROM WEBPAGE
r = requests.get(base_censuspage)
htmlcontent = r.text
soup = BeautifulSoup(htmlcontent,'html.parser')
links_array = []
#FIND LINKS TO OTHER PAGES AND ADD THEM TO LIST
for link in soup.find_all('a', attrs={'href': re.compile(r'html')}):
    links_array.append(urljoin(base_censuspage, link.get('href')))
#REMOVE DUPLICATES AND PRINT LIST TO VERIFY DUPLICATES WERE REMOVED
unique_links = set(links_array)
pprint.pprint(unique_links)
pprint.pprint(htmlcontent)
#SAVE TO CSV FILE
# newline="" lets the csv module control line endings itself
with open("C996PROJECTASSESSMENTCSVFILE.CSV", "w", newline="") as f:
    wr = csv.writer(f, delimiter="\n")
    wr.writerow(links_array)
#SAVE TO TXT FILE
with open('webscrapeddata.txt', 'w') as f:
    f.write(htmlcontent)
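I ran the program and a UnicodeEncodeError was raised. The cause is that htmlcontent contains characters that the file's encoding cannot represent: when open() is called without an encoding argument, Python uses the platform's default encoding, which may differ from the encoding of the page text.
To fix this, just add an encoding parameter as follows: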
#SAVE TO TXT FILE
with open('webscrapeddata.txt', 'w', encoding='utf-8') as f:  # <-- HERE
    f.write(htmlcontent)
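
To see the failure and the fix without hitting the network, here is a minimal sketch. It assumes the default encoding on the affected machine is something like cp1252 (common on Windows, but not confirmed in the question), and it uses a made-up string, sample_html, as a stand-in for htmlcontent. The helpers save_text and load_text are hypothetical names introduced only for this illustration.

```python
import os
import tempfile


def save_text(path, text, encoding="utf-8"):
    """Write text to path with an explicit encoding (hypothetical helper)."""
    with open(path, "w", encoding=encoding) as f:
        f.write(text)


def load_text(path, encoding="utf-8"):
    """Read text back from path with the same explicit encoding."""
    with open(path, "r", encoding=encoding) as f:
        return f.read()


# Stand-in for htmlcontent: contains a right arrow (U+2192),
# which cp1252 has no byte for.
sample_html = "<p>Population \u2192 estimates</p>"

# Assumption: the platform default is cp1252. Encoding the text that
# way fails in the same manner as f.write() did in the question.
try:
    sample_html.encode("cp1252")
    cp1252_failed = False
except UnicodeEncodeError:
    cp1252_failed = True

# With an explicit UTF-8 encoding the write succeeds and round-trips.
out_path = os.path.join(tempfile.mkdtemp(), "webscrapeddata.txt")
save_text(out_path, sample_html)
round_trip = load_text(out_path)

print("cp1252 failed:", cp1252_failed)
print("utf-8 round trip ok:", round_trip == sample_html)
```

Whoever reads the file later should also pass encoding='utf-8', otherwise the same default-encoding mismatch can show up on the reading side.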