Python Scraping empty tag

I'm having trouble scraping some elements from this page: https://tuning-tec.com/mercedes_w164_ml_mklasa_0507_black_led_seq_lpmed0-5789i

Code:

import requests
from bs4 import BeautifulSoup


URL = "https://tuning-tec.com/mercedes_w164_ml_mklasa_0507_black_led_seq_lpmed0-5789i"
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')

title = soup.find(class_="product_cart_title").text
price = soup.find(class_="icon_main_block_price_a")
number = soup.find(class_="product_cart_info").find_all('tr')[1].find_all('td')[1]
description = soup.find(id="tab_a")


print(description)

The problem is that I want to get to the contents of tab_a.

Inside it, this element is the problem:

<div align="left" class="product_cart_info" id="charlong_id">
</div>

It is empty. How can I get its contents? I think it has something to do with JavaScript. Maybe there is some delay when the page loads?
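One way to test the JavaScript theory, sketched here with a hypothetical helper `is_empty_in_static_html`: parse the HTML exactly as `requests` returns it and check whether the container holds any text. If it is empty there but filled in the browser, a script fills it in after the page loads. A delay would not explain it, because `requests` never executes JavaScript, so waiting longer returns the same empty element.

```python
from bs4 import BeautifulSoup


def is_empty_in_static_html(html: str, element_id: str) -> bool:
    """Return True if the element with element_id exists but holds no text.

    requests only downloads the HTML; it never runs the page's JavaScript,
    so anything a script inserts later is missing from this string.
    """
    element = BeautifulSoup(html, "html.parser").find(id=element_id)
    if element is None:
        raise ValueError(f"no element with id={element_id!r}")
    return element.get_text(strip=True) == ""


# Hypothetical input copied from the placeholder quoted above.
snippet = '<div align="left" class="product_cart_info" id="charlong_id">\n</div>'
print(is_empty_in_static_html(snippet, "charlong_id"))  # True
```

With the live page you would pass `requests.get(URL).text` instead of the hard-coded snippet.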

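As mentioned in the comments, the information is loaded via JavaScript, so BeautifulSoup doesn't see it. But if you look at the Network tab in Chrome/Firefox, you can see where the page makes the request: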

import re
import requests
from bs4 import BeautifulSoup

url = 'https://tuning-tec.com/mercedes_w164_ml_mklasa_0507_black_led_seq_lpmed0-5789i'
# endpoint the page calls, as seen in the browser's Network tab
ajax_url = 'https://tuning-tec.com/_template/_show_normal/_show_charlong.php?itemId={}'

soup = BeautifulSoup(requests.get(url).content, 'html.parser')

print(soup.select_one('.product_cart_title').get_text(strip=True))
print(soup.select_one('.icon_main_block_price_a').get_text(strip=True))
# :-soup-contains() replaces the deprecated :contains() selector (soupsieve 2.1+)
print(soup.select_one('td:-soup-contains("Symbol") ~ td').get_text(strip=True))

# the item id appears in a call like ajax_update_stat('...') on the page
item_id = re.findall(r"ajax_update_stat\('(\d+)'\)", soup.text)[0]
soup2 = BeautifulSoup(requests.get(ajax_url.format(item_id)).content, 'html.parser')

print()

# just print some info:
for tr in soup2.select('tr'):
    print(re.sub(r' {2,}', ' ', tr.select_one('td').get_text(strip=True, separator=' ')))

Prints:

MERCEDES W164 ML M-KLASA 05-07 BLACK LED SEQ
1788.62 PLN
LPMED0

PL
Opis
Lampy 
 soczewkowe ze światłem 
 pozycyjnym LED. Z dynamicznym 
 kierunkowskazem. 100% nowe, w komplecie 
 (lewa i prawa). Homologacja: norma E13 - 
 dopuszczone do ruchu.
Szczegóły
Światła pozycyjne: DIODY Kierunkowskaz: DIODY Światła 
 mijania: H9 w 
 zestawie Światła 
 drogowe: H1 w 
 zestawie Regulacja: elektryczna (silniczek znajduje się w 
 komplecie).
LED TUBE LIGHT Dynamic Turn Signal >>
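The two fragile steps in that answer, finding the item id and cleaning up each row's text, can be pulled into small helpers that work on plain strings. This is a sketch, not the answer's code: it assumes the page source contains a call shaped like `ajax_update_stat('<id>')` (the pattern the answer searches for) and that the endpoint keeps the same shape. `re.search` returns `None` instead of raising `IndexError` when no id is found, and `' '.join(text.split())` collapses every whitespace run, newlines included. The answer's `r' {2,}'` only collapses runs of spaces, which appears to be why the output above still contains line breaks.

```python
import re
from typing import Optional

# Endpoint taken from the answer above; the site may change it.
AJAX_URL = "https://tuning-tec.com/_template/_show_normal/_show_charlong.php?itemId={}"

# Same pattern the answer uses: captures the digits in ajax_update_stat('<id>').
ITEM_ID_RE = re.compile(r"ajax_update_stat\('(\d+)'\)")


def build_ajax_url(page_source: str) -> Optional[str]:
    """Return the AJAX URL for the first item id in page_source, or None."""
    match = ITEM_ID_RE.search(page_source)
    if match is None:
        return None
    return AJAX_URL.format(match.group(1))


def normalize_ws(text: str) -> str:
    """Collapse every run of whitespace (spaces, tabs, newlines) to one space."""
    return " ".join(text.split())


# Hypothetical inputs; the real id comes from the product page's HTML.
print(build_ajax_url("<script>ajax_update_stat('12345');</script>"))
print(normalize_ws("Lampy \n soczewkowe ze światłem \n pozycyjnym LED."))
```

Running `build_ajax_url` on the raw HTML (for example `requests.get(url).text`) rather than on `soup.text` also keeps working if the call sits inside a tag attribute. In the answer's loop, `normalize_ws(tr.select_one('td').get_text(separator=' '))` could stand in for the `re.sub` call.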

I changed the description part a bit. I'm not sure it's any good, but could you take a look?

import re
import requests
from bs4 import BeautifulSoup

url = 'https://tuning-tec.com/mercedes_w164_ml_mklasa_0507_black_led_seq_lpmed0-5789i'
ajax_url = 'https://tuning-tec.com/_template/_show_normal/_show_charlong.php?itemId={}'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')


def unwrapElements(soup, elementsToFind):
    # replace every matching descendant tag with its own contents
    elements = soup.find_all(elementsToFind)
    for element in elements:
        element.unwrap()

print(soup.select_one('.product_cart_title').get_text(strip=True))
print(soup.select_one('.icon_main_block_price_a').get_text(strip=True))
print(soup.select_one('td:-soup-contains("Symbol") ~ td').get_text(strip=True))

item_id = re.findall(r"ajax_update_stat\('(\d+)'\)", soup.text)[0]
soup2 = BeautifulSoup(requests.get(ajax_url.format(item_id)).content, 'html.parser')
# take the second cell of rows 2 and 4 and merge them into one tag
rows = soup2.find_all('tr')
description = rows[2].find_all('td')[1]
description.append(rows[4].find_all('td')[1])

unwrapElements(description, "td")
unwrapElements(description, "font")
unwrapElements(description, "span")


print(description)
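A minimal sketch of what the `unwrapElements` helper does, on a made-up fragment (the HTML and the snake_case name `unwrap_elements` are illustrative only). `unwrap()` replaces a tag with its children, and `find_all()` searches only the descendants of the tag it is called on. So, assuming "doesn't remove all" refers to tags still showing in the output, two kinds survive: the outer `<td>` that is passed in as `description` itself, and any tag whose name is not in the list (here `<b>`).

```python
from bs4 import BeautifulSoup


def unwrap_elements(root, tag_name):
    """Unwrap every descendant <tag_name> of root, keeping its children.

    find_all() looks only below root, so root itself is never unwrapped.
    """
    for element in root.find_all(tag_name):
        element.unwrap()


# Hypothetical fragment shaped like one description cell.
html = "<td><font><span>Lamps</span> with <b>LED</b></font></td>"
cell = BeautifulSoup(html, "html.parser").td
for name in ("td", "font", "span"):
    unwrap_elements(cell, name)
print(cell)  # <td>Lamps with <b>LED</b></td>
```

If the surrounding `<td>` is not wanted either, `description.decode_contents()` returns only the markup inside it, and `description.get_text()` returns plain text.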

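I only need these English description elements. Is this OK?

Anyway, thanks for your help!!

Just one thing: I don't know why it doesn't remove all of them :/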