Improving and simplifying Python BeautifulSoup code

I have this code, which uses BeautifulSoup to collect some data from a website:

import requests
from bs4 import BeautifulSoup

url = "http://hearthstone.gamepedia.com/Patches"
page = requests.get(url)
soup = BeautifulSoup(page.content,"html.parser")

variable = soup.find('div',{"id":"mw-content-text"})
variable = variable.find_all('ul')[2]
variable = variable.find('li')
variable = variable.find_all('a')[1]

print(variable.text)

The output should be:

Patch 7.0.0.15590

Following this sequence of lookups, I can pinpoint exactly the a tag I want.

How can I simplify this into a single line of code?

Variable = harsoup.find('div',{"id":"mw-content-text"}).find_all('ul')[2].find('li').find_all('a')[1]

I would like to achieve something like this, but it does not seem to work the same way.

import re

soup.find_all(href=re.compile(r'/Patch_'))

Output:

[<a href="/Patch_7.0.0.15590" title="Patch 7.0.0.15590">Patch 7.0.0.15590</a>,
 <a href="/Patch_6.2.0.15300" title="Patch 6.2.0.15300">Patch 6.2.0.15300</a>,
 <a href="/Patch_6.2.0.15181" title="Patch 6.2.0.15181">Patch 6.2.0.15181</a>,
 <a href="/Patch_6.1.3.14830" title="Patch 6.1.3.14830">Patch 6.1.3.14830</a>,
 <a href="/Patch_6.1.1.14406" title="Patch 6.1.1.14406">Patch 6.1.1.14406</a>,
 <a href="/Patch_6.0.0.13921" title="Patch 6.0.0.13921">Patch 6.0.0.13921</a>,
 <a href="/Patch_5.2.2.13807" title="Patch 5.2.2.13807">Patch 5.2.2.13807</a>,
 <a href="/Patch_5.2.0.13740" title="Patch 5.2.0.13740">Patch 5.2.0.13740</a>,
 <a href="/Patch_5.2.0.13714" title="Patch 5.2.0.13714">Patch 5.2.0.13714</a>,
 <a href="/Patch_5.2.0.13619" title="Patch 5.2.0.13619">Patch 5.2.0.13619</a>,
 <a href="/Patch_5.0.0.13030" title="Patch 5.0.0.13030">Patch 5.0.0.13030</a>,
 <a href="/Patch_5.0.0.12574" title="Patch 5.0.0.12574">Patch 5.0.0.12574</a>,
 <a href="/Patch_4.3.0.12266" title="Patch 4.3.0.12266">Patch 4.3.0.12266</a>,
 <a href="/Patch_4.2.0.12051" title="Patch 4.2.0.12051">Patch 4.2.0.12051</a>,
 <a href="/Patch_4.1.0.10956" title="Patch 4.1.0.10956">Patch 4.1.0.10956</a>,
 <a href="/Patch_4.0.0.10833" title="Patch 4.0.0.10833">Patch 4.0.0.10833 - The League of Explorers</a>,
 <a href="/Patch_3.2.0.10604" title="Patch 3.2.0.10604">Patch 3.2.0.10604</a>,
 <a href="/Patch_3.1.0.10357" title="Patch 3.1.0.10357">Patch 3.1.0.10357</a>,
 <a href="/Patch_3.0.0.9786" title="Patch 3.0.0.9786">Patch 3.0.0.9786 - The Grand Tournament Draws Near</a>,
 <a href="/Patch_2.8.0.9554" title="Patch 2.8.0.9554">Patch 2.8.0.9554</a>,
 <a href="/Patch_2.7.0.9166" title="Patch 2.7.0.9166">Patch 2.7.0.9166</a>,
 <a href="/Patch_2.6.0.8834" title="Patch 2.6.0.8834">Patch 2.6.0.8834</a>,

Use re to filter for the tags you want.
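As a rough sketch of how this answers the original question: passing the same kind of pattern to find() instead of find_all() returns only the first match, which is the newest patch link if the page lists patches newest first. The markup below is a made-up stand-in for the real page (so the example runs without network access), and anchoring the pattern with ^ is an assumption about how the links are written.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the real Patches page.
html = """
<div id="mw-content-text">
  <a href="/Hearthstone">Not a patch link</a>
  <ul>
    <li><a href="/Patch_7.0.0.15590" title="Patch 7.0.0.15590">Patch 7.0.0.15590</a></li>
    <li><a href="/Patch_6.2.0.15300" title="Patch 6.2.0.15300">Patch 6.2.0.15300</a></li>
  </ul>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Match only hrefs that start with "/Patch_".
patch_href = re.compile(r"^/Patch_")

# find() returns the first matching tag, or None if nothing matches.
latest = soup.find("a", href=patch_href)
latest_text = latest.text if latest is not None else None

# find_all() returns every matching tag, in document order.
all_patches = [a.text for a in soup.find_all("a", href=patch_href)]

print(latest_text)
print(all_patches)
```

Compared with chaining find_all('ul')[2] and find_all('a')[1], this does not depend on the position of the list within the page, so it should be less likely to break when the layout changes.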

There are five kinds of filters that can be passed to find() and find_all():

  1. A string
  2. A regular expression
  3. A list
  4. True
  5. A function
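The five filter kinds can be illustrated with a minimal sketch. The HTML snippet and the helper name has_href_but_no_class are invented for this example; each filter is applied to the tag name here, although the same kinds of values can also be used for attribute arguments such as href.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup, small enough to check the results by eye.
html = (
    '<div id="content">'
    '<p class="intro">Hi</p>'
    '<a href="/Patch_1.0">Patch 1.0</a>'
    '<b>bold</b>'
    '</div>'
)
soup = BeautifulSoup(html, "html.parser")

# 1. A string: matches tags with exactly this name.
by_string = [t.name for t in soup.find_all("a")]

# 2. A regular expression: matches tag names the pattern finds a match in.
by_regex = [t.name for t in soup.find_all(re.compile(r"^b"))]

# 3. A list: matches tags whose name equals any item in the list.
by_list = [t.name for t in soup.find_all(["a", "b"])]

# 4. True: matches every tag (but not text nodes).
by_true = [t.name for t in soup.find_all(True)]

# 5. A function: called with each tag, keeps those for which it returns True.
def has_href_but_no_class(tag):
    return tag.has_attr("href") and not tag.has_attr("class")

by_function = [t.name for t in soup.find_all(has_href_but_no_class)]

for result in (by_string, by_regex, by_list, by_true, by_function):
    print(result)
```

A function filter is the most flexible of the five, but a string or a regular expression is usually easier to read when it is enough.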