Extract the word that follows a specific word from a web page with Python

Question

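I am writing a simple web-scraping script to extract a single word from a web page. The word I need changes regularly, but it always follows a word that never changes, so I can search for that fixed word.

This is my script so far: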

#!/bin/python

import requests
response = requests.get('http://vpnbook.com/freevpn')
print(response.text)

This obviously prints the entire HTML page. But the part I need is the password:

<li>All bundles include UDP53, UDP 25000, TCP 80, TCP 443 profile</li>
<li>Username: <strong>vpnbook</strong></li>
<li>Password: <strong>binbd5ar</strong></li>
</ul>

How can I print ONLY 'binbd5ar' (or whatever replaces it) to STDOUT?

Answer 1

import re
password = re.search(r'Password: <strong>(.*?)</strong>', response.text).group(1)

Then, to change it:

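# count=1 replaces only the first occurrence; re.escape protects any regex metacharacters in the password
re.sub(re.escape(password), newPassword, response.text, count=1)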
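
A minimal offline sketch of the two steps above, run against a hard-coded copy of the HTML fragment from the question rather than the live page. The names `SAMPLE_HTML`, `extract_password`, and `replace_password` are hypothetical and exist only for illustration.

```python
import re

# Hard-coded copy of the fragment from the question, so no network access is needed.
SAMPLE_HTML = """<li>All bundles include UDP53, UDP 25000, TCP 80, TCP 443 profile</li>
<li>Username: <strong>vpnbook</strong></li>
<li>Password: <strong>binbd5ar</strong></li>
</ul>"""

PASSWORD_RE = re.compile(r'Password: <strong>(.*?)</strong>')


def extract_password(html):
    """Return the text inside <strong> after 'Password: ', or None if absent."""
    match = PASSWORD_RE.search(html)
    return match.group(1) if match else None


def replace_password(html, new_password):
    """Replace only the first occurrence of the current password string."""
    current = extract_password(html)
    if current is None:
        return html
    # re.escape guards against regex metacharacters in the old password;
    # a function replacement keeps backslashes in new_password literal.
    return re.sub(re.escape(current), lambda m: new_password, html, count=1)


if __name__ == '__main__':
    print(extract_password(SAMPLE_HTML))
```

Returning None when the label is missing avoids the AttributeError that calling .group(1) on a failed search would raise.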

Answer 2

from bs4 import BeautifulSoup
import requests

response = requests.get('http://vpnbook.com/freevpn')
soup = BeautifulSoup(response.text, 'html.parser')
pricing = soup.find(id = 'pricing')
first_column = pricing.find('div', {'class': 'one-third'})
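password = None
# find_all('li') yields only the <li> tags, skipping the whitespace text nodes between them
for li in first_column.find('ul', {'class': 'disc'}).find_all('li'):
    if 'password' in li.get_text().lower():
        password = li.find('strong').text
print(password)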
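
The lookup above depends on the live page's layout (the `pricing` id and the `one-third` and `disc` classes), which may change. As a rough alternative, here is a sketch that swaps BeautifulSoup for the standard library's `html.parser` and keys only on the label text. It runs against a hard-coded copy of the fragment from the question; `StrongAfterLabel` and `find_after_label` are hypothetical names.

```python
from html.parser import HTMLParser

# Hard-coded copy of the fragment from the question.
SAMPLE_HTML = """<li>All bundles include UDP53, UDP 25000, TCP 80, TCP 443 profile</li>
<li>Username: <strong>vpnbook</strong></li>
<li>Password: <strong>binbd5ar</strong></li>
</ul>"""


class StrongAfterLabel(HTMLParser):
    """Collect the text of the first <strong> that follows a given label."""

    def __init__(self, label):
        super().__init__()
        self.label = label
        self.label_seen = False  # True once the label text has been seen
        self.in_strong = False   # True while inside the <strong> we want
        self.value = None

    def handle_starttag(self, tag, attrs):
        if tag == 'strong' and self.label_seen and self.value is None:
            self.in_strong = True

    def handle_endtag(self, tag):
        if tag == 'strong':
            self.in_strong = False

    def handle_data(self, data):
        if self.in_strong and self.value is None:
            self.value = data.strip()
        elif self.label in data:
            self.label_seen = True


def find_after_label(html, label):
    """Return the <strong> text following label, or None if not found."""
    parser = StrongAfterLabel(label)
    parser.feed(html)
    parser.close()
    return parser.value


if __name__ == '__main__':
    print(find_after_label(SAMPLE_HTML, 'Password:'))
```

The same helper also works for other labels in the list, for example `find_after_label(SAMPLE_HTML, 'Username:')`.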

Answer 3

import re
print(re.search(r'Password: <strong>(.+?)</strong>', response.text).group(1))
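
Whether the capture group is greedy `(.+)` or non-greedy `(.+?)` matters as soon as two `<strong>` elements share a line. The sketch below illustrates this on a made-up one-line variant of the markup; the `ONE_LINE` string and its "Port" item are assumptions for the demo and do not come from the real page.

```python
import re

# Hypothetical one-line variant of the markup: both items share a single line.
ONE_LINE = ('<li>Password: <strong>binbd5ar</strong></li>'
            '<li>Port: <strong>443</strong></li>')

# Greedy: .+ grabs as much as possible, so it stops at the LAST </strong>.
greedy = re.search(r'Password: <strong>(.+)</strong>', ONE_LINE).group(1)

# Non-greedy: .+? grabs as little as possible, so it stops at the FIRST </strong>.
lazy = re.search(r'Password: <strong>(.+?)</strong>', ONE_LINE).group(1)

print(greedy)
print(lazy)
```

In the fragment from the question each item sits on its own line and `.` does not match a newline, so both forms return the same result there. The non-greedy form is simply the safer default.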

Answer 4

You can use a regular-expression search.

"Python offers two different primitive operations based on regular expressions: re.match() checks for a match only at the beginning of the string, while re.search() checks for a match anywhere in the string" (from the Python re module documentation)

>>> import re
>>> x = re.search(r"Password: <strong>(?P<pass>\w+)</strong>", response.text)
>>> print(x.groupdict())
{'pass': 'binbd5ar'}
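
To make the quoted distinction concrete, here is a small sketch that applies both functions to the same string. `PAGE` is a hard-coded stand-in for `response.text` based on the fragment in the question, and the group name `password` is chosen for illustration.

```python
import re

# Stand-in for response.text; the label is NOT at the start of the string.
PAGE = ('<ul>\n'
        '<li>Username: <strong>vpnbook</strong></li>\n'
        '<li>Password: <strong>binbd5ar</strong></li>\n'
        '</ul>')

PATTERN = re.compile(r'Password: <strong>(?P<password>\w+)</strong>')

at_start = PATTERN.match(PAGE)   # only tries to match at position 0
anywhere = PATTERN.search(PAGE)  # scans forward until the pattern matches

print(at_start)
print(anywhere.group('password'))
```

Because the page begins with other markup, `match` returns None here while `search` finds the password.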


python

scripting

web