How to create a list of the items under Pick a Category (US) rendered differently through HtmlUnitDriver and HtmlUnit headless browser?
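How do I create a list of the items under "Pick a Category (US)" on amzscout, which is rendered differently through HtmlUnitDriver and the HtmlUnit headless browser?
Using the GeckoDriver / Firefox and ChromeDriver / Chrome combinations, I am able to create the list and print it as follows:
Code trial: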
System.setProperty("webdriver.gecko.driver", "C:/Utility/BrowserDrivers/geckodriver.exe");
WebDriver driver = new FirefoxDriver();
driver.get("https://amzscout.net/sales-estimator");
List<WebElement> elements = new WebDriverWait(driver, 10).until(ExpectedConditions.visibilityOfAllElementsLocatedBy(By.cssSelector("span.cat-pick_name-in")));
for (WebElement ele:elements)
System.out.println(ele.getAttribute("innerHTML"));
driver.quit();
Console output:
Appliances
Arts, Crafts & Sewing
Automotive
.
.
.
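However, with HtmlUnitDriver and the HtmlUnit headless browser the HTML seems to be rendered differently, as follows:
The complete HTML is on pastebin.
The relevant part of the HTML is: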
<script type="application/ld+json">
//<![CDATA[
{
"@context": "http://schema.org/",
"@type": "Product",
"name": "AMZScout Sales Estimator",
"image": "",
"brand": "AMZScout",
"aggregateRating": {
"@type": "AggregateRating",
"ratingValue": "4.7",
"bestRating": "5",
"worstRating": "1",
"ratingCount": "231"
}
}
//]]>
</script>
<script type="text/javascript" src="/js/common.js">
</script>
<script type="text/javascript">
//<![CDATA[
const DATA = {
COM: [
["Appliances", "s-cat-icon-appliances"],
["Arts, Crafts & Sewing", "s-cat-icon-craft"],
["Automotive", "s-cat-icon-automotive"],
["Baby", "s-cat-icon-baby"],
["Beauty & Personal Care", "s-cat-icon-beauty"],
["Books", "s-cat-icon-books"],
["Camera & Photo", "s-cat-icon-camera"],
["Cell Phones & Accessories", "s-cat-icon-phone"],
["Clothing, Shoes & Jewelry", "s-cat-icon-clothing"],
["Computers & Accessories", "s-cat-icon-computers"],
["Electronics", "s-cat-icon-electronics"],
["Grocery & Gourmet Food", "s-cat-icon-food"],
["Health & Household", "s-cat-icon-health"],
["Home and Garden", "s-cat-icon-home"],
["Home & Kitchen", "s-cat-icon-kitchen"],
["Industrial & Scientific", "s-cat-icon-gear"],
["Jewelry", "s-cat-icon-jewelry"],
["Kindle Store", "s-cat-icon-kindle"],
["Kitchen & Dining", "s-cat-icon-dining"],
["Musical Instruments", "s-cat-icon-musical-instruments"],
["Office Products", "s-cat-icon-office"],
["Patio, Lawn & Garden", "s-cat-icon-lawn"],
["Pet Supplies", "s-cat-icon-pet-food"],
["Shoes", "s-cat-icon-shoes"],
["Software", "s-cat-icon-software"],
["Sports & Outdoors", "s-cat-icon-sports"],
["Tools & Home Improvement", "s-cat-icon-repairs"],
["Toys & Games", "s-cat-icon-toys"],
["Watches", "s-cat-icon-watches"],
["Video Games", "s-cat-icon-joystick"]
],
CO_UK: [
["Baby", "s-cat-icon-baby"],
This data is referenced in:
$(function () {
    var rankInput = $('.cat-rank_input');

    function toggleRank(e) {
        var cats = $('.cat-pick');
        var rank = $('.cat-rank');
        var list = rank.find('.cat-pick_list');
        var $el = $(e.currentTarget).clone();
        $el.on('click', toggleRank).css('cursor', 'pointer');
        list.empty();
        list.append($el);
        category = $el.find('.cat-pick_name-in').text();
        rankInput.val('');
        cats.toggle();
        rank.toggle();
        if ($(window).width() >= 768) {
            var catsHeight = cats.height();
            rank.height(catsHeight);
        }
        if (rank.is(':visible')) {
            val.text('?');
            setTimeout(function () {rankInput.focus()}, 0);
        }
    }

    function selectDomain(d) {
        const data = DATA[d];
        const list = $('.cat-pick .cat-pick_list');
        list.empty();
        data.filter(function (d) {return d[1] != ''}).forEach(function (d) {
            var el = $('<div class="cat-pick_i"><span class="cat-pick_link"><span class="cat-pick_ico"><span></span></span><span class="cat-pick_name"><span class="cat-pick_name-in"></span></span></span></div>');
            el.find('.cat-pick_ico span').addClass(d[1]);
            el.find('.cat-pick_name-in').text(d[0]);
            el.on('click', toggleRank);
            list.append(el);
        });
        domain = d;
    }

    rankInput.on('change', function () {rank = rankInput.val()});
    rankInput.on('keyup', function (e) {e.keyCode == 13 && (rank = rankInput.val()) && getEstSales()});
    $('.cat-rank_another-link').on('click', toggleRank);
    $('#domain').on('change', function (e) {selectDomain(e.target.value);});
    selectDomain(domain);
});
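Can anyone help me out?
As you already figured out, the items you are looking for are created by JavaScript. This means you have to enable JavaScript support in HtmlUnit.
The second point is to somehow wait for the JavaScript to finish. You are using 'visibilityOfAllElementsLocatedBy', and the documentation says this about it: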
An expectation for checking that all elements present on the web page that match the locator are visible.
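The condition only looks at the elements that exist at the moment it is evaluated, so it can already be satisfied while the JavaScript is still creating new elements and not all of them match your selector yet. For that reason I changed the wait condition slightly so that it really waits for the elements to be created.
My final source looks like this and produces the list you expect: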
String url = "https://amzscout.net/sales-estimator";
// true enables javascript support
WebDriver driver = new HtmlUnitDriver(true);
try {
driver.get(url);
// wait until the elements are created
List<WebElement> elements =
new WebDriverWait(driver, 10)
.until(ExpectedConditions
.numberOfElementsToBeMoreThan(
By.cssSelector("span.cat-pick_name-in"), 29));
System.out.println();
for (WebElement ele : elements) {
System.out.println(ele.getAttribute("innerHTML"));
}
} finally {
driver.quit();
}
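To make the idea behind the changed wait condition more concrete, here is a minimal sketch in plain Java with no Selenium involved. It only illustrates the technique of polling until more than a given number of items exist; it is not how Selenium implements numberOfElementsToBeMoreThan. The class PollingWaitSketch, the method waitForMoreThan, and the background thread that plays the role of the page script are all hypothetical names made up for this illustration.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Supplier;

public class PollingWaitSketch {

    /**
     * Hypothetical helper: calls the finder repeatedly until it returns
     * more than minCount items, or throws once the timeout has elapsed.
     */
    public static <T> List<T> waitForMoreThan(Supplier<List<T>> finder, int minCount,
                                              Duration timeout, Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            List<T> found = finder.get();
            if (found.size() > minCount) {
                return found;
            }
            if (System.nanoTime() >= deadline) {
                throw new IllegalStateException(
                        "Timed out with only " + found.size() + " items");
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting", e);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumed stand-in for the page: a list that a background thread fills
        // step by step, similar to a script appending one element per category.
        List<String> page = new CopyOnWriteArrayList<>();
        Thread script = new Thread(() -> {
            for (int i = 1; i <= 30; i++) {
                page.add("Category " + i);
                try {
                    Thread.sleep(5);
                } catch (InterruptedException e) {
                    return;
                }
            }
        });
        script.start();

        // Take a snapshot on every poll and return once it holds more than 29 items.
        List<String> items = waitForMoreThan(
                () -> new ArrayList<String>(page), 29,
                Duration.ofSeconds(10), Duration.ofMillis(20));
        System.out.println(items.size()); // prints 30
        script.join();
    }
}
```

A check that only asks whether the items found so far look fine could return after the first few entries. Waiting for a count of more than 29 makes the wait end only once all 30 categories are there, which is the same reasoning as in the Selenium code above.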
Hope that helps.