How can I write what I print to a file?
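In Python 3, I wrote a program with bs4 that successfully fetches the subcategories of a Wikipedia category. I can print the results, but I can't write them to a file.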
from bs4 import BeautifulSoup
import requests
import csv
url = 'https://en.wikipedia.org/wiki/Category:proprietary software'
content = requests.get(url).content
soup = BeautifulSoup(content,'lxml')
noOFsubcategories = soup.find('p')
print('------------------------------------------------------------------')
print(noOFsubcategories.text+'------------------------------------------------------------------')
tag = soup.find('div', {'class' : 'mw-category'})
links = tag.findAll('a')
#print(links)
counter = 1
for link in links:
    print ( str(counter) + " " + link.text)
    counter = counter + 1
with open('subcategories.csv', 'a') as f:
    f.write(links)
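The last line is what fails: write() on a text-mode file accepts only a str, while links is a ResultSet (a list subclass) of tags, so Python raises a TypeError. A minimal sketch of the problem and the usual fix, using an in-memory io.StringIO in place of the real file and a plain list of strings (hypothetical sample data) in place of the ResultSet:

```python
import io

# Hypothetical stand-in for the link texts that bs4 would return.
link_texts = ["Freeware", "Shareware", "VMware"]

buf = io.StringIO()  # behaves like a file opened in text mode

# Passing a list (or a bs4 ResultSet, which subclasses list) raises TypeError.
try:
    buf.write(link_texts)
except TypeError as exc:
    print("TypeError:", exc)

# Fix: write one string per item, numbered like the printed output.
for counter, text in enumerate(link_texts, start=1):
    buf.write(f"{counter} {text}\n")

print(buf.getvalue(), end="")
```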
First, build a list of lists holding each index and link text, then write it to a CSV file with csv.writer. Note the use of enumerate():
links = [[index, a.get_text()] for index, a in enumerate(tag.find_all('a'), start=1)]
# newline='' is what the csv module docs recommend for files passed to csv.writer
with open('subcategories.csv', 'a', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(links)
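csv.writer takes care of separators, quoting and line endings, which is its advantage over joining strings by hand. A small sketch that writes to an io.StringIO instead of a file; the first two names are sample values and the third is hypothetical, chosen only because it contains a comma:

```python
import csv
import io

# Sample link texts; the last one is hypothetical and contains a comma.
texts = ["Freeware", "Shareware", "Compilers, proprietary"]

# Same list-of-lists shape as in the answer: [index, text] per row.
rows = [[index, text] for index, text in enumerate(texts, start=1)]

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerows(rows)

# By default each row ends with "\r\n", and a field containing the
# delimiter is wrapped in double quotes.
print(repr(buf.getvalue()))
```

Because of that "\r\n" row terminator, the real file should be opened with newline=''; otherwise Python's newline translation on Windows adds an extra "\r" and the CSV appears to have blank lines between rows.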
You can also improve how you locate the subcategories by using a single CSS selector:
soup.select("div.mw-category a")
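The selector can be tried without a network request. A minimal sketch, assuming beautifulsoup4 is installed, that runs soup.select() on a small hypothetical HTML fragment shaped like the category listing (the built-in "html.parser" is used so lxml is not required):

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down HTML shaped like a Wikipedia category listing.
html = """
<div class="mw-category">
  <ul>
    <li><a href="/wiki/Category:Freeware">Freeware</a></li>
    <li><a href="/wiki/Category:Shareware">Shareware</a></li>
  </ul>
</div>
<div id="footer"><a href="/wiki/Main_Page">Main Page</a></div>
"""

# "html.parser" ships with Python, so no extra parser is needed here.
soup = BeautifulSoup(html, "html.parser")

# One CSS selector: every <a> that is a descendant of div.mw-category.
links = [[index, a.get_text()]
         for index, a in enumerate(soup.select("div.mw-category a"), start=1)]
print(links)
```

Unlike soup.find(), which returns only the first matching div, select() collects links from every div.mw-category. On a category page that also lists member pages this may include more than the subcategories; narrowing the selector to the subcategory section (assuming it carries the id mw-subcategories, as Wikipedia's category pages currently do) would avoid that.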
The complete code I'm running:
import csv
from bs4 import BeautifulSoup
import requests
url = 'https://en.wikipedia.org/wiki/Category:proprietary software'
content = requests.get(url).content
soup = BeautifulSoup(content, 'lxml')
noOFsubcategories = soup.find('p')
tag = soup.find('div', {'class': 'mw-category'})
links = [[index, a.get_text()] for index, a in enumerate(tag.find_all('a'), start=1)]
with open('subcategories.csv', 'a', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(links)
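This script opens the file in 'a' (append) mode, so every run adds another copy of the same rows to the end of subcategories.csv. If the file should hold only the latest results, 'w' is usually the better choice. A sketch comparing the two modes on temporary files; the helpers save_rows and count_rows are hypothetical names used only for illustration:

```python
import csv
import os
import tempfile

def save_rows(path, rows, mode):
    # Hypothetical helper: write rows with csv.writer in the given mode.
    with open(path, mode, newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)

def count_rows(path):
    # Hypothetical helper: count the CSV rows currently in the file.
    with open(path, newline="", encoding="utf-8") as f:
        return sum(1 for _ in csv.reader(f))

rows = [[1, "Freeware"], [2, "Shareware"]]  # sample rows

with tempfile.TemporaryDirectory() as tmp:
    appended = os.path.join(tmp, "appended.csv")
    overwritten = os.path.join(tmp, "overwritten.csv")
    for _ in range(2):  # simulate running the script twice
        save_rows(appended, rows, "a")
        save_rows(overwritten, rows, "w")
    appended_count = count_rows(appended)
    overwritten_count = count_rows(overwritten)

print(appended_count, overwritten_count)  # 4 2
```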
After running this code, subcategories.csv contains:
1,Formerly free software
2,Formerly proprietary software
3,Freeware
4,Oracle software
5,Proprietary cross-platform software
6,Proprietary database management systems
7,Proprietary operating systems
8,Proprietary version control systems
9,Proprietary wiki software
10,Shareware
11,VMware
12,Warez
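To check the result, or to use it later, the file can be read back with csv.reader. A sketch that parses the first three of the lines listed above from a string instead of the real file (for the file itself, open('subcategories.csv', newline='') would take the place of io.StringIO):

```python
import csv
import io

# First three lines of the file contents shown above.
data = "1,Formerly free software\n2,Formerly proprietary software\n3,Freeware\n"

# csv.reader yields every field as a str, so the index is converted to int.
categories = {int(index): name for index, name in csv.reader(io.StringIO(data))}
print(categories[3])  # Freeware
```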
A small change: move the write into the loop, so that each iteration writes one link to the file:
counter = 1
for link in links:
    print ( str(counter) + " " + link.text)
    counter = counter + 1
    with open('subcategories.csv', 'a') as f:
        f.write(link['href']+'\n')
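link['href'] holds a percent-encoded path such as /wiki/Category:Freeware. If only the category name should go into the file, the part after the first colon can be decoded and its underscores turned back into spaces. A sketch with a hypothetical helper, category_name:

```python
from urllib.parse import unquote

def category_name(href):
    """Hypothetical helper: '/wiki/Category:Foo_bar' -> 'Foo bar'."""
    # Split only on the first ':' so names that contain a colon stay intact.
    raw = href.split(":", 1)[1]
    # Wikipedia hrefs are percent-encoded and use '_' in place of spaces.
    return unquote(raw).replace("_", " ")

print(category_name("/wiki/Category:Proprietary_wiki_software"))
# Proprietary wiki software
```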
Output:
/wiki/Category:Formerly_proprietary_software
/wiki/Category:Freeware
/wiki/Category:Oracle_software
/wiki/Category:Proprietary_cross-platform_software
/wiki/Category:Proprietary_database_management_systems
/wiki/Category:Proprietary_operating_systems
/wiki/Category:Proprietary_version_control_systems
/wiki/Category:Proprietary_wiki_software
/wiki/Category:Shareware
/wiki/Category:VMware
/wiki/Category:Warez
Better:
# No need to reopen the file on every iteration; open it once, before the loop
counter = 1
with open('subcategories.csv', 'a') as f:
    for link in links:
        print(str(counter) + " " + link.text)
        counter = counter + 1
        f.write(link['href']+'\n')
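The manual counter can also be replaced by enumerate(), as in the first answer, and print() accepts a file argument, so the same call writes to the screen or to the file. A sketch under those assumptions, using io.StringIO in place of the real file and made-up (text, href) pairs instead of bs4 tags:

```python
import io

# Hypothetical stand-ins for (link.text, link['href']) pairs from bs4.
links = [
    ("Freeware", "/wiki/Category:Freeware"),
    ("Shareware", "/wiki/Category:Shareware"),
]

out = io.StringIO()  # stands in for open('subcategories.csv', 'a')

# enumerate() replaces the manual counter; the "file" is opened only once.
for counter, (text, href) in enumerate(links, start=1):
    print(counter, text)       # console output, as before
    print(href, file=out)      # the same loop writes the href to the file

print(out.getvalue(), end="")
```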