How do I write my printed output to a file?

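I wrote a program in Python 3 with bs4 that successfully fetches the subcategories of a Wikipedia category. I can print the results, but I cannot write them to a file.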

from bs4 import BeautifulSoup
import requests
import csv

url = 'https://en.wikipedia.org/wiki/Category:proprietary software'
content = requests.get(url).content
soup = BeautifulSoup(content,'lxml')
noOFsubcategories = soup.find('p')
print('------------------------------------------------------------------') 
print(noOFsubcategories.text+'------------------------------------------------------------------')
tag = soup.find('div', {'class' : 'mw-category'})
links = tag.findAll('a')
#print(links)

counter = 1
for link in links:
    print ( str(counter) + "  " + link.text)
    counter = counter + 1

with open('subcategories.csv', 'a') as f:
    f.write(links)

First, build a list of lists holding the index and the link text, then use csv.writer to write it to a CSV file. Note the use of enumerate():

links = [[index, a.get_text()] for index, a in enumerate(tag.find_all('a'), start=1)]

with open('subcategories.csv', 'a', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(links)
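
For context on why the original f.write(links) fails: write() on a text file accepts a single str, while links is a list-like collection of tags, so Python raises a TypeError. Here is a minimal sketch of that behavior, using an in-memory io.StringIO buffer and a hypothetical list of names in place of the real file and the scraped tags:

```python
import io

# Hypothetical stand-in for the scraped link texts.
names = ["Freeware", "Shareware", "Warez"]

buf = io.StringIO()  # in-memory text buffer; its write() also accepts only str

try:
    buf.write(names)  # same mistake as f.write(links)
except TypeError as exc:
    error = exc
else:
    error = None

# Joining the items into one string gives write() what it expects.
buf.write("\n".join(names) + "\n")
text = buf.getvalue()
```

csv.writer avoids the manual joining and also handles quoting when a value contains a comma.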

Also, you can improve how you locate the subcategories by using a single CSS selector:

soup.select("div.mw-category a")
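
To see the two lookups side by side without hitting the network, here is a minimal sketch. The HTML snippet is a hypothetical, heavily simplified stand-in for the category page, and it uses the built-in html.parser instead of lxml so that nothing beyond bs4 is required:

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the category page markup.
html = """
<div class="mw-category">
  <a href="/wiki/Category:Freeware">Freeware</a>
  <a href="/wiki/Category:Shareware">Shareware</a>
</div>
<a href="/wiki/Main_Page">Main page</a>
"""

soup = BeautifulSoup(html, "html.parser")

# Two-step lookup, as in the question.
two_step = soup.find("div", {"class": "mw-category"}).find_all("a")

# Single CSS selector: every <a> inside a div with class mw-category.
one_step = soup.select("div.mw-category a")

# Index and link text, ready for csv.writer.writerows().
rows = [[i, a.get_text()] for i, a in enumerate(one_step, start=1)]
```

Note that select() matches links inside every div.mw-category on the page, whereas find() stops at the first such div, so the two results can differ if the page contains more than one.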

The complete code I am running:

import csv

from bs4 import BeautifulSoup
import requests


url = 'https://en.wikipedia.org/wiki/Category:proprietary software'
content = requests.get(url).content
soup = BeautifulSoup(content, 'lxml')
noOFsubcategories = soup.find('p')

tag = soup.find('div', {'class': 'mw-category'})

links = [[index, a.get_text()] for index, a in enumerate(tag.find_all('a'), start=1)]

with open('subcategories.csv', 'a', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(links)

After running this code, the contents of subcategories.csv will be:

1,Formerly free software
2,Formerly proprietary software
3,Freeware
4,Oracle software
5,Proprietary cross-platform software
6,Proprietary database management systems
7,Proprietary operating systems
8,Proprietary version control systems
9,Proprietary wiki software
10,Shareware
11,VMware
12,Warez
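
One caveat, offered as a side note: mode 'a' appends, so running the script a second time adds the same rows again below the first batch. A minimal sketch, assuming you want the file to hold exactly one copy of the rows, opens it with 'w' instead and passes newline='' as the csv module documentation recommends. The rows and the temporary path are hypothetical stand-ins:

```python
import csv
import os
import tempfile

# Hypothetical stand-ins for the scraped rows and the output location.
rows = [[1, "Freeware"], [2, "Shareware"]]
path = os.path.join(tempfile.mkdtemp(), "subcategories.csv")

for _ in range(2):  # simulate running the script twice
    # 'w' truncates the file first, so each run replaces the previous contents.
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)

# Read the file back; csv.reader returns every field as a string.
with open(path, newline="", encoding="utf-8") as f:
    saved = list(csv.reader(f))
```

After both runs the file still holds only two rows; with 'a' it would hold four.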

With a small change, move the write inside the loop so that each iteration writes one link to the file:

counter = 1
for link in links:
    print ( str(counter) + "  " + link.text)
    counter = counter + 1
    with open('subcategories.csv', 'a') as f:
        f.write(link['href'].split(':')[1]+'\n')

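Output (these lines show the full href, which is what link['href'] alone writes; with the split(':')[1] above, only the part after "Category:" would be written):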

/wiki/Category:Formerly_proprietary_software
/wiki/Category:Freeware
/wiki/Category:Oracle_software
/wiki/Category:Proprietary_cross-platform_software
/wiki/Category:Proprietary_database_management_systems
/wiki/Category:Proprietary_operating_systems
/wiki/Category:Proprietary_version_control_systems
/wiki/Category:Proprietary_wiki_software
/wiki/Category:Shareware
/wiki/Category:VMware
/wiki/Category:Warez

Better:

# No need to reopen the file on every iteration; open it once, above the loop.
counter = 1
with open('subcategories.csv', 'a') as f:
    for link in links:
        print ( str(counter) + "  " + link.text)
        counter = counter + 1
        f.write(link['href']+'\n')
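
Since the goal is for the file to match what print shows, another option is print's file argument, which sends the same formatted text to an open file. A minimal sketch, assuming a hypothetical list of names in place of the scraped tags and a temporary output path:

```python
import os
import tempfile

# Hypothetical stand-ins for the link texts and the output location.
names = ["Freeware", "Shareware", "Warez"]
path = os.path.join(tempfile.mkdtemp(), "subcategories.txt")

with open(path, "w", encoding="utf-8") as f:
    # enumerate() replaces the manual counter variable.
    for counter, name in enumerate(names, start=1):
        line = f"{counter}  {name}"
        print(line)          # to the console, as before
        print(line, file=f)  # the same text, newline included, to the file

with open(path, encoding="utf-8") as f:
    written = f.read()
```

With the real data, name would be link.text for each tag in links.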