findAll("a") in BeautifulSoup (Python)

Question

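I'm new to Python. Can anyone explain what findAll("a") means in the code below? Can I use other letters instead, like g, h, or m? Does 'a' mean searching for the letter "a" in the article?

And does href=re.compile("^(/wiki/)((?!:).)*$") mean finding the links that have "wiki" in their name?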

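from urllib.request import urlopen
from bs4 import BeautifulSoup
import re

html = urlopen("http://en.wikipedia.org/wiki/Kevin_Bacon")
# Name the parser explicitly so the result does not depend on which parsers are installed
bsObj = BeautifulSoup(html, "html.parser")
# Inside the div with id="bodyContent", find every <a> tag whose href matches the pattern
for link in bsObj.find("div", {"id": "bodyContent"}).findAll(
        "a", href=re.compile("^(/wiki/)((?!:).)*$")):
    if 'href' in link.attrs:
        print(link.attrs['href'])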

Can anyone recommend some good books for learning web scraping with Python 3.6 that are easy for a beginner to follow?

Answer 1

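findAll("a") searches for all "a" (anchor) tags, that is, the <a> elements that make up the links on the page. It does not search for the letter "a" in the text.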

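Yes, you can use 'h1', 'b', 'strong', or any other valid HTML tag name in place of 'a'. The argument is a tag name, not an arbitrary letter, so a name that no element on the page uses simply returns no results.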
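To make the tag-name behaviour concrete, here is a minimal sketch that runs on a small made-up HTML string rather than the live Wikipedia page. The snippet and its variable names are assumptions for illustration only. It uses find_all, the current bs4 spelling; findAll is the older name kept as an alias.

```python
from bs4 import BeautifulSoup

# Made-up HTML for illustration; not taken from the real page.
html = """
<div id="bodyContent">
  <p>Some <b>bold</b> text and <a href="/wiki/Kevin_Bacon">a link</a>.</p>
  <p><a href="/wiki/Footloose">another link</a> <strong>strong</strong></p>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

anchors = soup.find_all("a")   # every <a> element
bold = soup.find_all("b")      # every <b> element
no_such = soup.find_all("g")   # no <g> elements here, so the result is empty

print([a["href"] for a in anchors])
print(len(anchors), len(bold), len(no_such))
```

Passing "g" or "m" is not an error; it just yields an empty result because the snippet contains no such elements.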

You can learn more in the BeautifulSoup documentation.

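In addition, re.compile("^(/wiki/)((?!:).)*$") matches every link whose href starts with /wiki/ and contains no colon, so it is stricter than just having "wiki" somewhere in the name.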
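The pattern can be tried on its own with the standard re module. The sketch below is only an illustration: the sample href values are made up, chosen to show one case for each part of the pattern.

```python
import re

# ^(/wiki/)     the href must start with /wiki/
# ((?!:).)*$    every remaining character up to the end must not be a colon
pattern = re.compile(r"^(/wiki/)((?!:).)*$")

# Hypothetical href values for illustration.
hrefs = [
    "/wiki/Kevin_Bacon",
    "/wiki/Category:American_actors",     # contains a colon
    "/w/index.php?title=Kevin_Bacon",     # does not start with /wiki/
    "//en.wikipedia.org/wiki/Main_Page",  # has "wiki" in it, but not at the start
    "/wiki/Footloose_(1984_film)",
]

matches = [h for h in hrefs if pattern.search(h)]
print(matches)
```

Only the first and last entries match. The colon check is what filters out namespaced pages such as Category: or File: links, leaving ordinary article links.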

beautifulsoup

web-scraping

python-3.6