Beautifulsoup 不返回子元素

Beautifulsoup not returning child elements

我试过一百万种不同的方法,但无法弄清楚为什么 Beautifulsoup 和我所有的前任一样不可靠table。

我只是想将 table 复制到 pandas 数据框。 table.

中大约有 280 行

这是 url:

https://www.brilliantearth.com/design-your-own-engagement-ring/?sid=3755106&dc=

这是我的部分代码不起作用:

with requests.Session() as s:
    url = "https://www.brilliantearth.com/design-your-own-engagement-ring/?sid=3755106&dc="
    r = s.get(url, headers=req_headers)

#add contents of urls to soup variable from each url
soup = BeautifulSoup(r.content, 'lxml')
rows = soup.find_all("div", {"id": "diamonds_search_table"})
rows

这是 url 中的 table 是:

接下来我可以尝试什么?

你可以使用selenium来解析html。你可以试试:

from bs4 import BeautifulSoup
from selenium import webdriver
driver = webdriver.Firefox()
driver.get('https://www.brilliantearth.com/design-your-own-engagement-ring/?sid=3755106&dc=')

html = driver.page_source
soup = BeautifulSoup(html)


rows = soup.find_all("div", {"id": "diamonds_search_table"})
print(rows)

您将获得如下所有行:

[<div class="search-table" id="diamonds_search_table" style="position: relative; height: 34000px;">
<div class="inner item" data-have="true" data-position="0" style="position: absolute; width: 100%; height: 34px;top:0px;"><a class="td-n2" href="/rings/cyorings/view_diamond/9361809/?sid=3755106&amp;first=diamond&amp;show_diamond_tab=true"></a><table border="0" cellpadding="0" cellspacing="0" class="table-striped table-hover search-result-table" width="100%"><tbody><tr class="search-item"><td data-id="9361809" onclick="dtl.stop_jump();" scope="col" width="7%"><div class="checkbox checkbox-ty4"><label><input class="hidden"/><span class="sr-only">checkbox</span><i class="icons-checkbox"></i></label></div></td><td scope="col" width="9%">Round</td><td scope="col" width="9%">0.30</td><td scope="col" width="8%">H</td><td scope="col" width="8%">SI2</td><td scope="col" width="12%">Very Good</td><td scope="col" width="8%">GIA</td><td scope="col" width="12%">Botswana Sort</td><td class="width_ratio_hide" scope="col" width="8%">1</td><td scope="col" width="10%">0</td><td scope="col" width="7%"><span class="view">View</span></td></tr></tbody></table></div><div class="inner item" data-have="true" data-position="34" style="position: absolute; width: 100%; height: 34px;top:34px;"><a class="td-n2" href="/rings/cyorings/view_diamond/9391074/?sid=3755106&amp;first=diamond&amp;show_diamond_tab=true"></a><table border="0" cellpadding="0" cellspacing="0" class="table-striped table-hover search-result-table" width="100%"><tbody><tr class="search-item"><td data-id="9391074"


and so on...........]

数据通过JavaScript动态加载。你可以使用requests模块来模拟它。

例如:

import json
import requests


search_parameters = {
'shapes':  "Round",
'cuts':    "Fair,Good,Very Good,Ideal,Super Ideal",
'colors':  "J,I,H,G,F,E,D",
'clarities':   "SI2,SI1,VS2,VS1,VVS2,VVS1,IF,FL",
'polishes':    "Good,Very Good,Excellent",
'symmetries':  "Good,Very Good,Excellent",
'fluorescences':   "Very Strong,Strong,Medium,Faint,None",
'min_carat':   "0.25",
'max_carat':  "11.58",
'min_table':   "50.00",
'max_table':   "86.00",
'min_depth':   "46.20",
'max_depth':   "629.00",
'min_price':   "420",
'max_price':   "1258930",
'stock_number':    "",
'row': "0",
'page':    "1",
'requestedDataSize':   "200",
'order_by':    "price",
'order_method':    "asc",
'currency':    "$",
'has_v360_video':  "",
'dedicated':   "",
'sid': "",
'min_ratio':   "1.00",
'max_ratio':   "2.75",
'shipping_day':    "",
'MIN_PRICE':   "420",
'MAX_PRICE':   "1258930",
'MIN_CARAT':   "0.25",
'MAX_CARAT':  "11.58",
'MIN_TABLE':   "45",
'MAX_TABLE':   "86",
'MIN_DEPTH':   "46.2",
'MAX_DEPTH':   "629"
}

data = requests.get('https://www.brilliantearth.com/loose-diamonds/list/', params=search_parameters).json()

# uncomment this to print all data:
# print(json.dumps(data, indent=4))

for d in data['diamonds']:
    print('{:<30} {:<15} {}'.format(d['title'], d['cut'], d['price']))

打印:

0.30 Carat Round Diamond       Very Good       420
0.30 Carat Round Diamond       Very Good       420
0.30 Carat Round Diamond       Ideal           430
0.30 Carat Round Diamond       Ideal           430
0.30 Carat Round Diamond       Good            430
0.30 Carat Round Diamond       Ideal           430
0.30 Carat Round Diamond       Very Good       430
0.25 Carat Round Diamond       Super Ideal     430
0.30 Carat Round Diamond       Very Good       430
0.32 Carat Round Diamond       Ideal           430

... and so on.