How to extract data from the HTML of the website through PhantomJS Driver
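I'm trying to parse the web page https://shop.sprouts.com/shop/flyer using .NET, Selenium, and PhantomJS. The data I see in the element text is completely different from what I see on the screen. Is there a better way to parse the page?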
using Microsoft.VisualStudio.TestTools.UnitTesting;
using OpenQA.Selenium;
using OpenQA.Selenium.PhantomJS;

[TestClass]
public class UnitTest1
{
    const string PhantomDirectory = @"..\..\..\packages\PhantomJS.2.1.1\tools\phantomjs";

    [TestMethod]
    public void GetSproutsWeeklyAdDetails()
    {
        using (IWebDriver phantomDriver = new PhantomJSDriver(PhantomDirectory))
        {
            phantomDriver.Navigate().GoToUrl("https://shop.sprouts.com/shop/flyer");
            // Queried immediately after navigation, without waiting for the titles to be rendered
            var elements = phantomDriver.FindElements(By.ClassName("cell-title-text"));
        }
    }
}
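To read the same text you see on screen at https://shop.sprouts.com/shop/flyer, you need to induce a WebDriverWait for the visibility of all the desired elements. The product titles appear to be filled in by the page's client-side script (note the AngularJS ng-bind-html binding in the locator below), so elements read immediately after navigation do not yet hold the displayed text. You can use the following solution: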
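Conceptually, an explicit wait repeatedly evaluates a condition until it returns something truthy or a timeout expires. The sketch below is a simplified, hypothetical model of that polling loop; wait_until and titles_visible are illustrative names, not Selenium API. Selenium's own WebDriverWait.until follows the same idea and, in Python, also ignores NoSuchElementException by default while polling.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_until(condition: Callable[[], Optional[T]], timeout: float = 10.0,
               poll_interval: float = 0.5) -> T:
    """Poll `condition` until it returns a truthy value or `timeout` seconds pass.

    Hypothetical, simplified model of an explicit wait (not Selenium's code).
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result  # condition satisfied: hand the value back to the caller
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll_interval)  # give the page's scripts time to render


# Simulated page: the titles only "render" on the third poll.
polls = {"count": 0}


def titles_visible():
    polls["count"] += 1
    return ["Sweet Corn, 1 EA", "Red Cherries"] if polls["count"] >= 3 else []


print(wait_until(titles_visible, timeout=5, poll_interval=0.01))
```

Reading the elements without such a loop corresponds to calling titles_visible() once and getting the empty, not-yet-rendered result.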
Solution:
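// Requires: using System; using System.Collections.Generic; using OpenQA.Selenium.Support.UI; (Selenium 3.x)
// Wait up to 3 s until every product-title span is visible, then print each title.
IList<IWebElement> elements = new WebDriverWait(phantomDriver, TimeSpan.FromSeconds(3)).Until(
    ExpectedConditions.VisibilityOfAllElementsLocatedBy(
        By.XPath("//span[@class='cell-title-text' and @ng-bind-html='productTitle()']")));
foreach (IWebElement element in elements)
{
    Console.WriteLine(element.GetAttribute("innerHTML"));
}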
Equivalent Python example:
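from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
# driver: an existing WebDriver instance
driver.get('https://shop.sprouts.com/shop/flyer')
myList = WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.XPATH, "//span[@class='cell-title-text' and @ng-bind-html='productTitle()']")))
for item in myList:
    print(item.text)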
Console output:
Sweet Corn, 1 EA
Cantaloupe Melons, 1 LB
Red Cherries
Half Chicken Breast
Roma Tomatoes
100% Grass Fed Ground Beef Value Pack
Colby Jack Rbst Free
Walnut Halves & Pieces
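If you would rather parse the rendered HTML yourself (for example, to avoid one WebDriver round trip per element when reading item.text), one option is to pass driver.page_source to a parser once the wait has succeeded. Below is a minimal sketch using only Python's standard html.parser module. TitleExtractor and extract_titles are hypothetical names, the matching rule (any span whose class list contains cell-title-text) is looser than the exact XPath above, and the sample markup is made up for illustration.

```python
from html.parser import HTMLParser
from typing import List


class TitleExtractor(HTMLParser):
    """Collect the text of <span> elements whose class list contains 'cell-title-text'."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True, so '&amp;' arrives as '&'
        self.titles: List[str] = []
        self._depth = 0  # span nesting depth while inside a title span, else 0
        self._buf: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        if self._depth:
            self._depth += 1  # nested span inside a title span
        elif "cell-title-text" in (dict(attrs).get("class") or "").split():
            self._depth = 1
            self._buf = []

    def handle_endtag(self, tag):
        if tag != "span" or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:  # closed the outer title span
            self.titles.append("".join(self._buf).strip())

    def handle_data(self, data):
        if self._depth:
            self._buf.append(data)


def extract_titles(html: str) -> List[str]:
    parser = TitleExtractor()
    parser.feed(html)
    parser.close()
    return parser.titles


sample = ('<div><span class="cell-title-text" ng-bind-html="productTitle()">'
          'Walnut Halves &amp; Pieces</span><span class="other">ignored</span></div>')
print(extract_titles(sample))
```

With Selenium, you would call extract_titles(driver.page_source) after the WebDriverWait above returns, so that the page source already contains the rendered titles.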