BS4: get info from a class with a weird name
I got this weird HTML from the Steam Community market search:
<span class=\"normal_price\">.69 USD<\/span>
How can I extract the data with bs4? This doesn't work:
soup.find("span", attrs={"class": "\"normal_price\""})
You have HTML embedded in a JSON string, and JSON has to escape the quotes. Rather than extract that data by hand, parse the JSON first:
import json
data = json.loads(json_data)
html = data['results_html']
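To see what the JSON step changes, here is a minimal offline sketch. The json_data literal is a hypothetical, cut-down stand-in for the raw response body: it holds only the results_html key and the span from the question, while the real payload carries more fields.

```python
import json

# Hypothetical stand-in for the raw response body (assumption: the real
# payload has more keys and far more HTML than this single span).
json_data = r'{"results_html": "<span class=\"normal_price\">.69 USD<\/span>"}'

data = json.loads(json_data)
html = data['results_html']

# The JSON escapes (\" and \/) are gone; what remains is ordinary HTML.
print(html)  # <span class="normal_price">.69 USD</span>
```

Once decoded, the class attribute is plain normal_price, with no quote characters in the value.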
If you are using the requests library, it can decode the response for you:
response = requests.get('http://steamcommunity.com/market/search/render/?query=appid:730&start=0&count=3&currency=3&l=english&cc=pt')
html = response.json()['results_html']
After that, you can parse it with BeautifulSoup just fine:
>>> import requests
>>> from bs4 import BeautifulSoup
>>> html = requests.get('http://steamcommunity.com/market/search/render/?query=appid:730&start=0&count=3&currency=3&l=english&cc=pt').json()['results_html']
>>> BeautifulSoup(html, 'lxml').find('span', class_='normal_price').span
<span class="normal_price">.69 USD</span>
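The same lookup can be tried without a network call. This is only a sketch: json_data is a hypothetical stand-in containing just the span from the question, and it uses the standard-library 'html.parser' backend instead of 'lxml' so that no extra parser needs to be installed.

```python
import json
from bs4 import BeautifulSoup

# Hypothetical stand-in for the response body (assumption: the real
# results_html is much larger and nests this span inside other markup).
json_data = r'{"results_html": "<span class=\"normal_price\">.69 USD<\/span>"}'

# Decode the JSON first so the HTML is no longer escaped.
html = json.loads(json_data)['results_html']

# 'html.parser' ships with Python; 'lxml' works too if it is installed.
soup = BeautifulSoup(html, 'html.parser')

# Search for the plain class name, without any escaped quotes.
price = soup.find('span', class_='normal_price')
print(price.get_text())  # .69 USD
```

Searching for the literal "\"normal_price\"" finds nothing here, because those backslashes and quotes belong to the JSON encoding and not to the class name.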