BeautifulSoup not returning child elements
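I've tried a million different approaches, but I can't figure out why BeautifulSoup is as unreliable as all my exes.
I'm just trying to copy a table into a pandas DataFrame. The table has about 280 rows.
This is the URL: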
https://www.brilliantearth.com/design-your-own-engagement-ring/?sid=3755106&dc=
Here is the part of my code that isn't working:
import requests
from bs4 import BeautifulSoup

# req_headers is assumed to be defined earlier (not shown in the question)
with requests.Session() as s:
    url = "https://www.brilliantearth.com/design-your-own-engagement-ring/?sid=3755106&dc="
    r = s.get(url, headers=req_headers)
    # add contents of urls to soup variable from each url
    soup = BeautifulSoup(r.content, 'lxml')
    rows = soup.find_all("div", {"id": "diamonds_search_table"})
rows
The table I want is the one shown on that page.
What can I try next?
You can use Selenium to load the page in a real browser and then parse the resulting HTML. You can try:
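from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Firefox()
driver.get('https://www.brilliantearth.com/design-your-own-engagement-ring/?sid=3755106&dc=')
html = driver.page_source  # HTML as it stands after the browser has run the page's JavaScript
driver.quit()  # close the browser once the HTML has been captured
soup = BeautifulSoup(html, 'lxml')  # name the parser explicitly
rows = soup.find_all("div", {"id": "diamonds_search_table"})
print(rows)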
You will get all the rows, like this:
[<div class="search-table" id="diamonds_search_table" style="position: relative; height: 34000px;">
<div class="inner item" data-have="true" data-position="0" style="position: absolute; width: 100%; height: 34px;top:0px;"><a class="td-n2" href="/rings/cyorings/view_diamond/9361809/?sid=3755106&first=diamond&show_diamond_tab=true"></a><table border="0" cellpadding="0" cellspacing="0" class="table-striped table-hover search-result-table" width="100%"><tbody><tr class="search-item"><td data-id="9361809" onclick="dtl.stop_jump();" scope="col" width="7%"><div class="checkbox checkbox-ty4"><label><input class="hidden"/><span class="sr-only">checkbox</span><i class="icons-checkbox"></i></label></div></td><td scope="col" width="9%">Round</td><td scope="col" width="9%">0.30</td><td scope="col" width="8%">H</td><td scope="col" width="8%">SI2</td><td scope="col" width="12%">Very Good</td><td scope="col" width="8%">GIA</td><td scope="col" width="12%">Botswana Sort</td><td class="width_ratio_hide" scope="col" width="8%">1</td><td scope="col" width="10%">0</td><td scope="col" width="7%"><span class="view">View</span></td></tr></tbody></table></div><div class="inner item" data-have="true" data-position="34" style="position: absolute; width: 100%; height: 34px;top:34px;"><a class="td-n2" href="/rings/cyorings/view_diamond/9391074/?sid=3755106&first=diamond&show_diamond_tab=true"></a><table border="0" cellpadding="0" cellspacing="0" class="table-striped table-hover search-result-table" width="100%"><tbody><tr class="search-item"><td data-id="9391074"
and so on...........]
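To get from that printed markup to something pandas can consume, each `tr` with class `search-item` can be reduced to a list of cell texts. The following is a minimal sketch, not the answerer's code: `SAMPLE_HTML` is a shortened, hand-written stand-in shaped like the output above, `extract_rows` is a hypothetical helper name, and dropping the first (checkbox) and last ("View") cells is an assumption about which columns matter.

```python
from bs4 import BeautifulSoup

# Shortened, hand-written sample shaped like the rendered markup printed above.
SAMPLE_HTML = """
<div class="search-table" id="diamonds_search_table">
  <div class="inner item"><table><tbody><tr class="search-item">
    <td data-id="9361809"><span class="sr-only">checkbox</span></td>
    <td>Round</td><td>0.30</td><td>H</td><td>SI2</td><td>Very Good</td>
    <td>GIA</td><td>Botswana Sort</td><td>1</td><td>0</td>
    <td><span class="view">View</span></td>
  </tr></tbody></table></div>
</div>
"""


def extract_rows(html):
    """Return one list of cell texts per result row (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("div", id="diamonds_search_table")
    if container is None:
        return []
    rows = []
    for tr in container.find_all("tr", class_="search-item"):
        cells = tr.find_all("td")
        # Assumption: the first cell is the checkbox and the last is the
        # "View" link, so neither carries table data.
        rows.append([td.get_text(strip=True) for td in cells[1:-1]])
    return rows


rows = extract_rows(SAMPLE_HTML)
print(rows)
```

Passing the resulting list to `pandas.DataFrame(rows)` gives one DataFrame row per diamond. The column names are not present in the markup shown above, so they would have to be supplied separately.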
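The data is loaded dynamically via JavaScript, so it is not in the HTML that requests receives from the page URL. You can use the requests module to simulate the request the page makes.
For example: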
import json
import requests
search_parameters = {
    'shapes': "Round",
    'cuts': "Fair,Good,Very Good,Ideal,Super Ideal",
    'colors': "J,I,H,G,F,E,D",
    'clarities': "SI2,SI1,VS2,VS1,VVS2,VVS1,IF,FL",
    'polishes': "Good,Very Good,Excellent",
    'symmetries': "Good,Very Good,Excellent",
    'fluorescences': "Very Strong,Strong,Medium,Faint,None",
    'min_carat': "0.25",
    'max_carat': "11.58",
    'min_table': "50.00",
    'max_table': "86.00",
    'min_depth': "46.20",
    'max_depth': "629.00",
    'min_price': "420",
    'max_price': "1258930",
    'stock_number': "",
    'row': "0",
    'page': "1",
    'requestedDataSize': "200",
    'order_by': "price",
    'order_method': "asc",
    'currency': "$",
    'has_v360_video': "",
    'dedicated': "",
    'sid': "",
    'min_ratio': "1.00",
    'max_ratio': "2.75",
    'shipping_day': "",
    'MIN_PRICE': "420",
    'MAX_PRICE': "1258930",
    'MIN_CARAT': "0.25",
    'MAX_CARAT': "11.58",
    'MIN_TABLE': "45",
    'MAX_TABLE': "86",
    'MIN_DEPTH': "46.2",
    'MAX_DEPTH': "629"
}
data = requests.get('https://www.brilliantearth.com/loose-diamonds/list/', params=search_parameters).json()
# uncomment this to print all data:
# print(json.dumps(data, indent=4))
for d in data['diamonds']:
    print('{:<30} {:<15} {}'.format(d['title'], d['cut'], d['price']))
Prints:
0.30 Carat Round Diamond Very Good 420
0.30 Carat Round Diamond Very Good 420
0.30 Carat Round Diamond Ideal 430
0.30 Carat Round Diamond Ideal 430
0.30 Carat Round Diamond Good 430
0.30 Carat Round Diamond Ideal 430
0.30 Carat Round Diamond Very Good 430
0.25 Carat Round Diamond Super Ideal 430
0.30 Carat Round Diamond Very Good 430
0.32 Carat Round Diamond Ideal 430
... and so on.
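Since the goal in the question was a pandas DataFrame, the JSON response can be reduced to a list of records and handed to pandas. This is a rough sketch under assumptions: `sample_payload` is a made-up stand-in that mirrors only the `diamonds`, `title`, `cut` and `price` keys used above (the real response may contain more fields, and the value types are a guess), and `diamonds_to_records` and `to_dataframe` are hypothetical helper names.

```python
def diamonds_to_records(payload, fields=("title", "cut", "price")):
    """Return one dict per diamond, keeping only the requested fields.

    Missing keys become None instead of raising KeyError.
    """
    return [{f: d.get(f) for f in fields} for d in payload.get("diamonds", [])]


def to_dataframe(records):
    """Build a DataFrame from the records (requires pandas to be installed)."""
    import pandas as pd  # imported here so the helper above works without pandas
    return pd.DataFrame(records)


# Made-up sample mirroring the keys used in the answer above; in practice
# this would be the `data` dict returned by requests.get(...).json().
sample_payload = {
    "diamonds": [
        {"title": "0.30 Carat Round Diamond", "cut": "Very Good", "price": 420},
        {"title": "0.30 Carat Round Diamond", "cut": "Ideal", "price": 430},
    ]
}

records = diamonds_to_records(sample_payload)
print(records)
```

With the real response, `to_dataframe(diamonds_to_records(data))` would give one row per diamond returned by the request, with `title`, `cut` and `price` as columns.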