How to extract Amazon Reviews?
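I want to extract Amazon reviews along with all of their related data, for example the reviewer's name, the rating, the review text and, if possible, the comments on that review.
I am using Python 3.7.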
You can do this in two ways:
- API (reliable and fast)
  Ask Amazon for API access.
- A tool such as headless Chrome or Selenium (check this post)
  Load the page and find the DOM elements in it, like:
<div id="R1IZDPP09RA69A" data-hook="review" class="a-section review">
<div id="customer_review-R1IZDPP09RA69A" class="a-section celwidget" data-cel-widget="customer_review-R1IZDPP09RA69A">
<div class="a-row a-spacing-mini">
<a href="/gp/profile/amzn1.account.AGFB356ZQJQAHWZZABTTNIFGYMDA/ref=cm_cr_dp_d_gw_tr?ie=UTF8" class="a-profile" data-a-size="small">
<div aria-hidden="true" class="a-profile-avatar-wrapper">
<div class="a-profile-avatar">
<img src="https://images-eu.ssl-images-amazon.com/images/S/amazon-avatars/f0d86e6d-45d4-4cc7-a4b8-a062450c2c75._CR0,0,335,335_SX48_.jpg" class="" data-src="https://images-eu.ssl-images-amazon.com/images/S/amazon-avatars/f0d86e6d-45d4-4cc7-a4b8-a062450c2c75._CR0,0,335,335_SX48_.jpg">
<noscript><img src="https://images-eu.ssl-images-amazon.com/images/S/amazon-avatars/f0d86e6d-45d4-4cc7-a4b8-a062450c2c75._CR0,0,335,335_SX48_.jpg"></noscript>
</div>
</div>
<div class="a-profile-content"><span class="a-profile-name">Sana Khanam</span></div>
</a>
</div>
<div class="a-row"><a class="a-link-normal" title="4.0 out of 5 stars" href="/gp/customer-reviews/R1IZDPP09RA69A/ref=cm_cr_dp_d_rvw_ttl?ie=UTF8&ASIN=B079Q9VNWQ"><i data-hook="review-star-rating" class="a-icon a-icon-star a-star-4 review-rating"><span class="a-icon-alt">4.0 out of 5 stars</span></i></a><span class="a-letter-space"></span><a data-hook="review-title" class="a-size-base a-link-normal review-title a-color-base a-text-bold" href="/gp/customer-reviews/R1IZDPP09RA69A/ref=cm_cr_dp_d_rvw_ttl?ie=UTF8&ASIN=B079Q9VNWQ">Amazing service by VOLTAS !!Appreciated </a></div>
<span data-hook="review-date" class="a-size-base a-color-secondary review-date">17 May 2018</span>
<div class="a-row a-spacing-mini review-data review-format-strip"><span data-hook="avp-badge-linkless" class="a-size-mini a-color-state a-text-bold">Verified Purchase</span></div>
<div class="a-row a-spacing-small review-data">
<span data-hook="review-body" class="a-size-base review-text">
<div aria-live="polite" data-a-expander-name="review_text_read_more" data-a-expander-collapsed-height="300" class="a-expander-collapsed-height a-row a-expander-container a-expander-partial-collapse-container" style="max-height: none; height: 300px;">
<div data-hook="review-collapsed" aria-expanded="false" class="a-expander-content a-expander-partial-collapse-content" style="padding-bottom: 19px;">5 th day after having AC installed.<br>PROS:<br><br>-Cooling nice<br>-Looks good<br>-Good one in this price range<br>-Fast Installation within 24h<br>- Customer support response appriciated<br><br>CONS<br>-Started making weird little noises while decreasing temperature.<br><br>-I don't understand but some unpleasant smell being diffused after starting it at the 5th day of installation.<br><br>-Contacted seller, issue resolved!<br><br>Overall would RECOMMEND!! Go for it.</div>
<div class="a-expander-header a-expander-partial-collapse-header" style="opacity: 1; display: block;">
<div class="a-expander-content-fade"></div>
<a href="javascript:void(0)" data-hook="expand-collapse-read-more-less" data-action="a-expander-toggle" class="a-declarative" data-a-expander-toggle='{"allowLinkDefault":true, "expand_prompt":"Read more", "collapse_prompt":"Read less"}'><i class="a-icon a-icon-extender-expand"></i><span class="a-expander-prompt">Read more</span></a>
</div>
</div>
</span>
</div>
<div data-hook="review-comments" class="a-row review-comments">
<span class="cr-vote" data-hook="review-voting-widget">
<div class="a-row a-spacing-small"><span data-hook="helpful-vote-statement" class="a-size-base a-color-tertiary cr-vote-text">5 people found this helpful</span></div>
<div class="cr-helpful-button aok-float-left">
<span class="a-button a-button-base" id="a-autoid-14">
<span class="a-button-inner">
<a href="https://www.amazon.in/ap/signin?openid.return_to=https%3A%2F%2Fwww.amazon.in%2Fdp%2FB079Q9VNWQ%2Fref%3Dcm_cr_dp_d_vote_lft%3Fie%3DUTF8%26voteInstanceId%3DR1IZDPP09RA69A%26voteValue%3D1%26csrfT%3DgiYnW3e%252Fv7p8y07m5Je2hA3LGXQ2gVKHQWzD%252F40AAAAJAAAAAFuvcotyYXcAAAAAFVfwLBerPie4v1Ep%252F%252F%252F%252F%23R1IZDPP09RA69A&openid.identity=http%3A%2F%2Fspecs.openid.net%2Fauth%2F2.0%2Fidentifier_select&openid.claimed_id=http%3A%2F%2Fspecs.openid.net%2Fauth%2F2.0%2Fidentifier_select&openid.assoc_handle=inflex&openid.mode=checkid_setup&openid.ns=http%3A%2F%2Fspecs.openid.net%2Fauth%2F2.0" data-hook="vote-helpful-button" class="a-button-text" role="button" id="a-autoid-14-announce">
<div class="cr-helpful-text">
Helpful
</div>
</a>
</span>
</span>
</div>
</span>
<i class="a-icon a-icon-text-separator" role="img" aria-label="|"></i><a data-hook="review-comment" class="a-size-base a-link-normal a-color-secondary a-text-normal" href="/gp/customer-reviews/R1IZDPP09RA69A/ref=cm_cr_dp_d_rvw_btm?ie=UTF8&ASIN=B079Q9VNWQ#wasThisHelpful">Comment</a><span class="cr-footer-line-height">
<span><i class="a-icon a-icon-text-separator" role="img" aria-label="|"></i><span class="a-declarative" data-action="cr-popup" data-cr-popup='{"width":"580","title":"ReportAbuse","url":"/hz/reviews-render/report-abuse?ie=UTF8&voteDomain=Reviews&ref=cm_cr_dp_d_rvw_hlp&csrfT=giYnW3e%2Fv7p8y07m5Je2hA3LGXQ2gVKHQWzD%2F40AAAAJAAAAAFuvcotyYXcAAAAAFVfwLBerPie4v1Ep%2F%2F%2F%2F&entityId=R1IZDPP09RA69A&sessionId=257-1905223-9805712","height":"380"}'><a class="a-size-base a-link-normal a-color-secondary report-abuse-link a-text-normal" href="/hz/reviews-render/report-abuse?ie=UTF8&voteDomain=Reviews&ref=cm_cr_dp_d_rvw_hlp&csrfT=giYnW3e%2Fv7p8y07m5Je2hA3LGXQ2gVKHQWzD%2F40AAAAJAAAAAFuvcotyYXcAAAAAFVfwLBerPie4v1Ep%2F%2F%2F%2F&entityId=R1IZDPP09RA69A&sessionId=257-1905223-9805712">Report abuse</a></span></span></span>
</div>
</div>
</div>
Here, inside each div that has the attribute data-hook="review", you can find the reviewer's name, the rating, and the review text:
Name: <span class="a-profile-name">Sana Khanam</span>
Rating: <span class="a-icon-alt">4.0 out of 5 stars</span>
Review: <div data-hook="review-collapsed" aria-expanded="false" class="a-expander-content a-expander-partial-collapse-content" style="padding-bottom: 19px;">5 th day after...</div>
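As a rough illustration of the lookup described above, here is a minimal sketch that pulls those three fields out of an HTML string using only Python's standard-library html.parser. It assumes you already have the page HTML as text (for example from Selenium's page_source) and that the markup looks like the sample above; Amazon's markup may differ or change. The names ReviewParser, extract_reviews and SAMPLE_HTML are made up for this example, and SAMPLE_HTML is a trimmed copy of the sample review.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class ReviewParser(HTMLParser):
    """Collects name, rating and body from every element with data-hook="review"."""

    def __init__(self):
        super().__init__()
        self.reviews = []          # one dict per review found
        self._depth = 0            # current nesting depth of non-void elements
        self._review_depth = None  # depth at which the current review element opened
        self._field = None         # field being captured right now, if any
        self._field_depth = None   # depth at which that field's element opened

    def _start_field(self, name):
        # Only capture the first occurrence of each field within a review.
        if not self.reviews[-1][name]:
            self._field, self._field_depth = name, self._depth

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in VOID_TAGS:
            # Keep the line breaks of the review text.
            if tag == "br" and self._field == "body":
                self.reviews[-1]["body"] += "\n"
            return
        self._depth += 1
        classes = (attrs.get("class") or "").split()
        hook = attrs.get("data-hook")
        if hook == "review":
            self.reviews.append(
                {"id": attrs.get("id"), "name": "", "rating": "", "body": ""}
            )
            self._review_depth = self._depth
        elif self._review_depth is not None and self._field is None:
            if "a-profile-name" in classes:
                self._start_field("name")
            elif "a-icon-alt" in classes:
                self._start_field("rating")
            elif hook == "review-collapsed":
                self._start_field("body")

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self._field is not None and self._depth == self._field_depth:
            self._field = None
        if self._review_depth is not None and self._depth == self._review_depth:
            self._review_depth = None
        self._depth -= 1

    def handle_data(self, data):
        if self._field is not None:
            self.reviews[-1][self._field] += data


def extract_reviews(html):
    """Return a list of dicts with the keys id, name, rating and body."""
    parser = ReviewParser()
    parser.feed(html)
    parser.close()
    for review in parser.reviews:
        for key in ("name", "rating", "body"):
            review[key] = review[key].strip()
    return parser.reviews


# Trimmed copy of the sample review markup, for illustration only.
SAMPLE_HTML = """
<div id="R1IZDPP09RA69A" data-hook="review" class="a-section review">
  <span class="a-profile-name">Sana Khanam</span>
  <i data-hook="review-star-rating" class="a-icon a-icon-star a-star-4 review-rating"><span class="a-icon-alt">4.0 out of 5 stars</span></i>
  <div data-hook="review-collapsed" class="a-expander-content">5 th day after having AC installed.<br>PROS:<br><br>-Cooling nice</div>
</div>
"""

for review in extract_reviews(SAMPLE_HTML):
    print(review["name"], "|", review["rating"])
    print(review["body"])
```

If you are fine with a third-party dependency, a library such as Beautiful Soup would probably make the same lookups shorter; the standard-library parser is used here only to keep the sketch free of extra installs.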