Selenium C# driver.PageSource - '...' is too long, or a component of the specified path is too long.
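I am trying to pass driver.PageSource from Selenium C# to HTML Agility Pack, but the line htmlDoc.Load(driver.PageSource); throws the error: '...' is too long, or a component of the specified path is too long.
P.S. When I do the same thing in Python instead of C#, Selenium Python and Beautiful Soup do not produce this error.
How can I fix this?
Full code: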
using System;
using System.Threading;
using HtmlAgilityPack;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;
using OpenQA.Selenium.Support.UI;

namespace SeleniumSharp
{
    public static class WebScraping
    {
        public static void GetPageData()
        {
            // initial setup
            IWebDriver driver = new ChromeDriver();
            driver.Navigate().GoToUrl("<url>");

            // dropdown
            var dropdown1 = driver.FindElement(By.Id("cpMain_ucc1_ctl00_liResidentialFront"));
            dropdown1.Click();

            // enter search query
            var search = driver.FindElement(By.Id("cpMain_ucc1_ctl00_txtResidentialSearchBox"));
            search.Click();
            search.SendKeys("london");
            Thread.Sleep(3000);

            // submit search
            var submit = driver.FindElement(By.XPath("//div[@id='cpMain_ucc1_ctl00_pnlContentResidential']//a[@class='search-button']"));
            submit.Click();

            // Html Agility Pack
            HtmlDocument htmlDoc = new HtmlDocument();
            htmlDoc.Load(driver.PageSource);
            var address = htmlDoc.DocumentNode
                .SelectNodes("//div[@class='grid-address']")
                .ToList();
            foreach (var item in address)
            {
                Console.WriteLine(item.InnerText);
            }
        }
    }
}
This line throws the error:
htmlDoc.Load(driver.PageSource);
Error:
'<html source>' is too long, or a component of the specified path is too long.
at System.IO.PathHelper.GetFullPathName(ReadOnlySpan`1 path, ValueStringBuilder& builder)
at System.IO.PathHelper.Normalize(String path)
at System.IO.Path.GetFullPath(String path)
at System.IO.FileStream..ctor(String path, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize, FileOptions options)
at System.IO.StreamReader.ValidateArgsAndOpenPath(String path, Encoding encoding, Int32 bufferSize)
at System.IO.StreamReader..ctor(String path, Encoding encoding)
at HtmlAgilityPack.HtmlDocument.Load(String path)
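This happens because you are calling Load instead of LoadHtml. Load expects the path to a file that contains HTML, not the HTML source itself (driver.PageSource). The whole page source is therefore treated as a file path, which is why the stack trace ends in Path.GetFullPath complaining that the path is too long.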
// From File
var doc = new HtmlDocument();
doc.Load(filePath);
// From String
var doc = new HtmlDocument();
doc.LoadHtml(html);
So try using:
htmlDoc.LoadHtml(driver.PageSource);
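To make the difference concrete, here is a minimal sketch that parses an in-memory HTML string with LoadHtml and no browser. The class name AddressParser, the helper ExtractAddresses, and the sample markup are made up for illustration and are not part of the original question; only the grid-address XPath is taken from it. The sketch also guards against SelectNodes returning null, which Html Agility Pack does when nothing matches the XPath.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class AddressParser
{
    // Hypothetical helper: parses HTML held in memory (not a file path)
    // and returns the text of every <div class="grid-address"> element.
    public static List<string> ExtractAddresses(string html)
    {
        var htmlDoc = new HtmlDocument();
        htmlDoc.LoadHtml(html); // LoadHtml takes the markup itself

        var results = new List<string>();
        var nodes = htmlDoc.DocumentNode.SelectNodes("//div[@class='grid-address']");

        // SelectNodes returns null rather than an empty collection when nothing matches.
        if (nodes == null)
        {
            return results;
        }

        foreach (var node in nodes)
        {
            results.Add(node.InnerText.Trim());
        }
        return results;
    }

    public static void Main()
    {
        // Sample markup standing in for driver.PageSource (an assumption for this sketch).
        const string sampleHtml =
            "<html><body><div class='grid-address'>1 Example Street, London</div></body></html>";

        foreach (var address in ExtractAddresses(sampleHtml))
        {
            Console.WriteLine(address);
        }
    }
}
```

In the code from the question, the same helper could be called as ExtractAddresses(driver.PageSource) once the search results have loaded.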