Trying to scrape all hrefs from source code. I don't understand what I am doing wrong
I am trying to scrape every href from the source code where the <a> tag has class = "linked formlink". I don't understand what I am doing wrong. I am getting null in "links".
StreamReader sr = new StreamReader(webBrowser1.DocumentStream);
string sourceCode = sr.ReadToEnd();
sr.Close();
//removing illegal path
string regexSearch = new string(Path.GetInvalidFileNameChars()) + new string(Path.GetInvalidPathChars());
Regex r = new Regex(string.Format("[{0}]", Regex.Escape(regexSearch)));
sourceCode = r.Replace(sourceCode, "");
HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
htmlDoc.LoadHtml(sourceCode);
var links = htmlDoc.DocumentNode
    .Descendants("a")
    .Where(x => x.Attributes["class"] != null
        && x.Attributes["class"].Value == "linked formlink")
    .Select(x => x.Attributes["href"].Value.ToString());
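The regex is stripping the angle brackets and other characters that Html Agility Pack needs in order to recognize tags and classes. Characters such as <, > and " are invalid in Windows file names, so they are removed from the markup before it is parsed, and no <a> elements are left to match.
Just remove the regex.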
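As a rough sketch of what the code might look like once the regex step is gone, the snippet below parses the markup unchanged and collects the matching hrefs. The `LinkScraper` class, the `GetFormLinks` method, and the sample HTML string are hypothetical names made up for illustration; it assumes the HtmlAgilityPack NuGet package is referenced, and it reads from a string rather than from `webBrowser1.DocumentStream` so that it can run on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

public static class LinkScraper
{
    // Hypothetical helper: returns the href of every <a> element whose
    // class attribute is exactly "linked formlink".
    public static List<string> GetFormLinks(string sourceCode)
    {
        var htmlDoc = new HtmlDocument();

        // Load the markup as-is. No characters are stripped beforehand,
        // so the parser still sees <, >, quotes and slashes.
        htmlDoc.LoadHtml(sourceCode);

        return htmlDoc.DocumentNode
            .Descendants("a")
            // GetAttributeValue returns the default ("") when the attribute is missing,
            // which avoids a null check on Attributes["class"].
            .Where(a => a.GetAttributeValue("class", "") == "linked formlink")
            .Select(a => a.GetAttributeValue("href", ""))
            // Skip anchors that have the class but no href.
            .Where(href => href.Length > 0)
            .ToList();
    }

    public static void Main()
    {
        // Hypothetical sample markup, not taken from the original page.
        const string html =
            "<a class=\"linked formlink\" href=\"/form/1\">One</a>" +
            "<a class=\"other\" href=\"/x\">X</a>";

        foreach (var href in GetFormLinks(html))
            Console.WriteLine(href);
    }
}
```

In the original WinForms code the string passed in would be the `sourceCode` read from the `StreamReader`. Inside a WinForms project it is probably safer to keep the fully qualified `HtmlAgilityPack.HtmlDocument`, as the question does, because `System.Windows.Forms` defines its own `HtmlDocument`. Calling `ToList()` also forces the query to run immediately, which makes it easier to inspect the result in the debugger.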