Extracting a string with BeautifulSoup
I am using BeautifulSoup in Python 3.4 as follows:
from urllib.request import urlopen
from bs4 import BeautifulSoup

soup = BeautifulSoup(urlopen(URL), 'html.parser')
for fraction in soup.findAll("div", {"class": "eventprice"}):
    print(fraction.get_text())
The data I want to extract looks like this:
<div id="ip_selection983317834" class="eventprice">
1/2
</div>
I have tried several options, such as fraction.get_div and changing the attributes, among others. What is going on here?
Simply switching to requests works for me:
from bs4 import BeautifulSoup
import requests
URL = "http://sports.williamhill.com/bet/en-gb/betting/y/5/tm/0/Football.html"
response = requests.get(URL)
soup = BeautifulSoup(response.content, 'html.parser')
for fraction in soup.findAll("div", {"class": "eventprice"}):
    print(fraction.get_text(strip=True))
Prints:
1/2
16/5
11/2
8/5
...
5/6
21/10
7/2
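As a side note on the strip=True argument used in the loop above, here is a minimal offline sketch that parses the div quoted in the question from a literal string instead of the live page. The variable names (html, raw, clean) are only illustrative. It shows that the text inside the div is surrounded by newlines, which get_text() keeps and get_text(strip=True) removes.

```python
from bs4 import BeautifulSoup

# The div from the question, embedded as a string so no network is needed.
html = """<div id="ip_selection983317834" class="eventprice">
1/2
</div>"""

soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", {"class": "eventprice"})

raw = div.get_text()              # keeps the newlines around the price
clean = div.get_text(strip=True)  # surrounding whitespace removed

print(repr(raw))
print(repr(clean))
```

Comparing the two repr() lines makes the whitespace visible, which is why the loop above prints one compact fraction per line.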
My guess is that the difference comes from the default headers that requests sends. In my case, they are:
{'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'User-Agent': 'python-requests/2.3.0 CPython/2.7.6 Darwin/14.1.0'}
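If the headers really are the cause (that is only a guess), one way to test it without leaving urllib is to send explicit headers through urllib.request.Request. By default urlopen identifies itself with a User-Agent of the form Python-urllib/x.y, which some sites may treat differently. The sketch below is not verified against this site; build_request, fetch_html and ASSUMED_HEADERS are hypothetical names, and the header values simply imitate the ones listed above.

```python
from urllib.request import Request, urlopen

# Assumption: these values imitate the requests defaults quoted above.
# Accept-Encoding is left out on purpose, because urlopen does not
# decompress gzip or deflate response bodies on its own.
ASSUMED_HEADERS = {
    "Accept": "*/*",
    "User-Agent": "python-requests/2.3.0",
}


def build_request(url, headers=None):
    """Return a Request that carries explicit headers (hypothetical helper)."""
    return Request(url, headers=dict(headers or ASSUMED_HEADERS))


def fetch_html(url):
    """Download the raw page bytes using the explicit headers."""
    with urlopen(build_request(url)) as response:
        return response.read()
```

The bytes returned by fetch_html(URL) can be passed to BeautifulSoup in the same way as response.content. If the eventprice divs then show up, that would support the header explanation; if they do not, the cause lies elsewhere.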