BeautifulSoup returns None but the element definitely exists
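I am trying to scrape Amazon.com with Python's requests and BeautifulSoup libraries, but I ran into a problem. I know I could use Selenium, and I have tried it and it works, but I am still curious why this happens and whether there is a solution.
Here is my code: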
import requests
from bs4 import BeautifulSoup
# Search for "Python" on Amazon
url = "https://www.amazon.com/s?k=Python"
# Make Amazon think the request comes from a browser
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36'
}
page = requests.get(url, headers=headers)
soup = BeautifulSoup(page.content, "html.parser")
# Trying to get the element I need, but this prints "None"
print(soup.find("div", class_="s-main-slot s-result-list s-search-results sg-row"))
Thanks in advance.
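Before switching tools, it can help to check what the server actually returned. The sketch below uses a hypothetical helper, `diagnose` (not part of BeautifulSoup), that reports the page `<title>`, whether the results container is present, and the HTML length. It assumes, without having confirmed it for Amazon, that a blocked or robot-check response would carry a different title and lack the container. It uses `select_one` with a CSS selector because `find(..., class_="s-main-slot s-result-list s-search-results sg-row")` only matches when the `class` attribute is exactly that string, in that order.

```python
from bs4 import BeautifulSoup


def diagnose(html: str, parser: str = "html.parser") -> dict:
    """Hypothetical helper: summarise a fetched page to explain a None result."""
    soup = BeautifulSoup(html, parser)
    # The <title> often reveals an error or robot-check page instead of results
    title = soup.title.get_text(strip=True) if soup.title else None
    # The CSS selector matches a div carrying both classes, in any order
    container = soup.select_one("div.s-main-slot.s-result-list")
    return {"title": title, "found": container is not None, "length": len(html)}
```

With the question's code, `print(diagnose(page.text))` shows whether the container arrived at all, which separates "the selector is wrong" from "the page is different".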
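An alternative approach with Selenium Python also solves the problem
With selenium.webdriver you get a browser of your own. The example below uses the Google Chrome webdriver and then reads the HTML of the results page from driver.page_source.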
from selenium.webdriver import Chrome
from selenium.webdriver import ChromeOptions
from bs4 import BeautifulSoup as Soup
options = ChromeOptions()
options.add_argument("--headless")  # run Chrome without a visible window
driver = Chrome(options=options)
driver.get("https://www.amazon.com/s?k=Python")
html = driver.page_source  # HTML as the browser has loaded it
driver.quit()  # close the browser once the HTML has been captured
soup = Soup(html, "html.parser")  # name the parser explicitly to avoid a warning
print(soup.find("div", class_="s-main-slot s-result-list s-search-results sg-row"))
Output:
<div class="s-main-slot s-result-list s-search-results sg-row">
<div class="sg-col-20-of-24 s-result-item s-asin sg-col-0-of-12 sg-col-28-of-32 sg-col-16-of-20 sg-col sg-col-32-of-36 sg-col-12-of-16 sg-col-24-of-28" data-asin="1593279280" data-component-id="6" data-component-type="s-search-result" data-index="0" data-uuid="c5f5837a-1f2e-4243-a520-a1936aac014e"><div class="sg-col-inner">
... etc.
Selenium Python installation: here
Change the parser to lxml and it should work.
import requests
from bs4 import BeautifulSoup
url = "https://www.amazon.com/s?k=Python"
# Make Amazon think the request comes from a browser
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36'
}
page = requests.get(url, headers=headers)
soup = BeautifulSoup(page.content, "lxml")  # requires the lxml package to be installed
# Same lookup as in the question, now parsed with lxml
print(soup.find("div", class_="s-main-slot s-result-list s-search-results sg-row"))
My console output:
<div class="s-main-slot s-result-list s-search-results sg-row">
<div class="sg-col-20-of-24 s-result-item s-asin sg-col-0-of-12 sg-col-28-of-32 sg-col-16-of-20 sg-col sg-col-32-of-36 sg-col-12-of-16 sg-col-24-of-28" data-asin="1593279280" data-component-type="s-search-result" data-index="0" data-uuid="ae6080d7-b07e-4558-b38f-613931584787"><div class="sg-col-inner">
... etc.
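BeautifulSoup's parsers can build different trees from the same malformed HTML, which is one plausible explanation (not confirmed here) for why `html.parser` misses the element while `lxml` finds it. The hypothetical helper below parses one saved response with each parser and reports whether a selector matches; for a parser that is not installed it records `None` instead of letting `FeatureNotFound` propagate.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def find_with_parsers(html, selector, parsers=("html.parser", "lxml")):
    """Hypothetical helper: report, per parser, whether `selector` matches."""
    results = {}
    for parser in parsers:
        try:
            soup = BeautifulSoup(html, parser)
        except FeatureNotFound:
            # The requested parser (e.g. lxml) is not installed
            results[parser] = None
            continue
        results[parser] = soup.select_one(selector) is not None
    return results
```

Running `find_with_parsers(page.text, "div.s-main-slot.s-result-list")` on a single response separates a parser difference from a response difference, since both parsers see identical input.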
The correct solution using requests and BeautifulSoup is to send the same set of headers a real browser sends:
import requests
from bs4 import BeautifulSoup as bs
# Headers mimicking a Chrome 83 browser (see the user-agent below)
headers = {
# 'authority' mirrors Chrome's HTTP/2 :authority pseudo-header; requests sends it as an ordinary header
'authority': 'www.amazon.com',
'cache-control': 'max-age=0',
'rtt': '300',
'downlink': '1.35',
'ect': '3g',
'sec-ch-ua': '"Google Chrome"; v="83"',
'sec-ch-ua-mobile': '?0',
'upgrade-insecure-requests': '1',
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.106 Safari/537.36',
'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
'sec-fetch-site': 'none',
'sec-fetch-mode': 'navigate',
'sec-fetch-user': '?1',
'sec-fetch-dest': 'document',
'accept-language': 'en-GB,en-US;q=0.9,en;q=0.8',
# This cookie header belongs to one specific browser session; substitute the one from your own browser
'cookie': 'aws-priv=eyJ2IjoxLCJldSI6MCwic3QiOjB9; session-id=139-7350741-1081713; ubid-main=135-9894765-6184621; lc-main=en_US; s_fid=0A4730DDD06B62E4-1DB478AB62143F35; regStatus=pre-register; x-main=hd2N9IEBuVL7il1dbkhEEHTQSf4Q7uviwjc2eikr0hRGGOyI2RYIiRsk3GvDKLSx; at-main=Atza|IwEBIJdoAZ4Y6j2IIGvC29t1ha634aK-p2kAl8rHhQRCSGMSU_nwQvM6fakAbYEjpVLPU4Jj0TwKvX70d6QnlouKPh0QwpHJG8rHUNVb-gmhS9shHM8fCJk45r1XW2FOSpLoM1iAO9kYIpOoW2M5We9xfdqlLuQBB-D5fQeO5Vqew4RnHesPNZuF4DQNlcqL7wrGjDY1JQKzlzARfATAuwaCy4jMD5bNmxpcWtTgNGrTtLpGv1Y-4Mnx2axxQYFgwpRNv_sPNZrMAfHdU7MX67HbyPyV3V21KAl8QNl0xE-lNl3myxnfyWH68Z5D-j501S7HWzkKxopy3SfGuwwZTjSVSVlnH4RmTwvEnW8W3tndcX6X1ETysYYXmO7TudIjtq7aUZqPBJe_MViePcWL3OV4q2b5; sess-at-main="TjcvTeXAA2dP6HOMGcG/n+Cdkr+peDBlNMOvfBz6oE0="; sst-main=Sst1|PQGR5AF9x4yS-iMft3B9aBzJC8v-e4M1kmB_3KS0pxtVTj1cH8hl3fajgigt6xEYhan-kUJuY5KNbteBgbiyDIRCs4ISve5MdRhDdoy7XKrVD1g5McZTyvdwYLfbTJbTUov51hOyPcE8BKpFL1bGpJiiJbZ0TV7Pyc6tkndogjneZATDErc4U08WE4LwPJxCiF-I-7Av4-JEfwH1ZQ81mz6rqy-K1o6bCMRRZ8kWuzrl0wobKsr4Sz0-m1K0waguIewhXNm4V4DLe8mn-_6I8_k9p9v3NiFRpp04v0Ptzw8V1ARo2U18t5f2nx54EXwHzvzOQlpeBVY2U0WpXDcKsU3C8Q; session-id-time=2082787201l; i18n-prefs=USD; x-wl-uid=1MwJyD7dRnGiVdHw1PKiwmoNP9S/0xy+3KAKCJl2fM5VOthLzEW3dzyeW4zdKAepcIxkXpJFkxWcafUXXcS0MeSyLyFoBkl3xnNPLiRK0Rq33AHw0gL3W1FDBUn9OcakOzJGVGKZRc5E=; s_vn=1614974634531%26vn%3D4; s_nr=1590823888871-Repeat; s_vnum=2022823888872%26vn%3D1; s_dslv=1590823888874; sp-cdn="L5Z9:FR"; session-token=3AIPjoIrP8ITt1e/KXLZGSlnOPpirrWotNpCpCEfNRCY9mCfAV169URMcAX8XECtxt/qJujUn66Oyz8KIFDMieNmSdzEKA0K8I4AqbzplslzVGtZ6rNg+XsX/Bdc3hxnB7tUqQhrbrtVUncdzUMN1c95vhL7p+AEog3iiDkhLch0VO+Sl8HkAdZ/63xrp0stAaUsYo1GgsOFGI8+3wJUp4CHrJnoj/0lqjCJCpgXTZfxJcfWy9KarcGAPkno+fuMQqMoShJdi8R+DZ9XmIMib1bsLwXnerZa; csm-hit=tb:GVY0F2K4G05TXW59KB9M+s-GVY0F2K4G05TXW59KB9M|1592424615451&t:1592424615452&adb:adblk_yes',
}
params = (
('k', 'Python'),
('ref', 'nb_sb_noss'),
)
response = requests.get('https://www.amazon.com/s', headers=headers, params=params)
soup = bs(response.text,'lxml')
print(soup.find('div',class_='s-main-slot s-result-list s-search-results sg-row'))
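A `requests.Session` keeps cookies that the server sets on one response and sends them on later requests, which is an alternative to pasting a browser's cookie header by hand. The sketch below is a minimal version under stated assumptions: the two-entry header set is hypothetical (far smaller than the answer's), and `make_session` and `build_search_request` are illustrative names. It only prepares the request, so the final URL and headers can be inspected before anything is sent.

```python
import requests

# Hypothetical minimal header set (assumption); the answer above sends many more
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/83.0.4103.106 Safari/537.36"
    ),
    "Accept-Language": "en-GB,en-US;q=0.9,en;q=0.8",
}


def make_session() -> requests.Session:
    """Hypothetical helper: a session that sends the browser-like headers."""
    session = requests.Session()
    session.headers.update(BROWSER_HEADERS)  # replaces the default User-Agent
    return session


def build_search_request(session: requests.Session, query: str) -> requests.PreparedRequest:
    """Hypothetical helper: prepare (but do not send) a search request."""
    req = requests.Request(
        "GET",
        "https://www.amazon.com/s",
        params={"k": query, "ref": "nb_sb_noss"},
    )
    # prepare_request merges the session's headers and cookies into the request
    return session.prepare_request(req)
```

To actually fetch the page, call `session.send(prepared, timeout=10)` and then `raise_for_status()` on the response. Whether Amazon serves the results page to this header set is not guaranteed.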