Need an HtmlAgilityPack example
I am trying web scraping again, as an example.
This is the code I currently have:
Imports System
Imports System.Xml
Imports HtmlAgilityPack
Imports System.Net
Imports System.IO
Imports System.Collections.Generic
Public Class Program
    Public Shared Sub Main()
        'Enable SSL support
        ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12
        'Web page to scrape
        Dim link As String = "https://www.nextinpact.com"
        'Download the page from the link into an HtmlDocument
        Dim doc As HtmlDocument = New HtmlWeb().Load(link)
        'Select the titles
        Dim div As HtmlNode = doc.DocumentNode.SelectSingleNode("//section[@class='small_article_section']")
        If Not div Is Nothing Then
            For Each node As HtmlNode In doc.DocumentNode.SelectNodes("//h2[@class='color_title']//a[@class='ui-link'][contains(text())]")
                Console.Write(div.InnerText.Trim())
            Next
        End If
    End Sub
End Class
I am trying to get all the titles from
"//section[@class='small_article_section']"
but what do I have to do to get all of them?
For the first title, the XPath is
"//h2[@class='color_title']//a[@class='ui-link'][contains(text(),'Les obligations de Netflix passeront d')]"
Thanks.
Edit:
I tried another example, with
Dim doc As HtmlDocument = New HtmlWeb().Load("https://www.sideshow.com/collectibles?manufacturer=sideshow+collectibles&type=premium+format%28tm%29+figure&brand=aspen")
Dim div As HtmlNode = doc.DocumentNode.SelectSingleNode("//div[@class='c-ProductList row']")
Now I am trying to get the title of each product with:
For Each node As HtmlNode In div.SelectNodes("//h2[contains(text(),'Grace')]") 'That is for only Grace
    Console.Write(node.InnerText.Trim())
Next
But with
//h2[contains(text(),'Grace')]
I get nothing. I want both Grace and Aspen, so I tried
.//h2[contains(text()]
which returns nothing either.
This is how you do it.
Dim doc As HtmlDocument = New HtmlWeb().Load("https://www.nextinpact.com/")
Dim div As HtmlNode = doc.DocumentNode.SelectSingleNode("//section[@class='small_article_section']")
'If div IsNot Nothing Then 'I think this part is pointless as it will always exist
For Each node As HtmlNode In div.SelectNodes(".//h2[@class='color_title']/a") 'a class='ui-link' doesn't exist so do h2/a
    Console.Write(node.InnerText.Trim())
Next
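The same idea should carry over to the second example in the question. Here is a minimal sketch that runs without a network connection. The markup in SampleHtml is made up for illustration and is not the real sideshow.com page, and the names TitleSketch, SampleHtml and container are mine. It illustrates three points: contains() needs two arguments, so "contains(text())" is not valid XPath and the predicate can simply be dropped to get every h2; a path starting with "//" searches the whole document even when called on div, while ".//" stays inside it; and contains(text(), 'Grace') only looks at the first text node of the h2, so contains(., 'Grace') is usually the safer test when the heading contains nested tags.

```vbnet
Imports System
Imports HtmlAgilityPack

Public Module TitleSketch
    ' Hypothetical markup standing in for a downloaded page (an assumption, not the real site).
    Private Const SampleHtml As String =
        "<div class='c-ProductList row'>" &
        "<h2><span>Grace</span> Premium Format Figure</h2>" &
        "<h2>Aspen Premium Format Figure</h2>" &
        "</div>" &
        "<h2>Heading outside the product list</h2>"

    Public Sub Main()
        ' Parse the inline HTML instead of calling New HtmlWeb().Load(url).
        Dim doc As New HtmlDocument()
        doc.LoadHtml(SampleHtml)

        Dim container As HtmlNode = doc.DocumentNode.SelectSingleNode("//div[@class='c-ProductList row']")
        If container Is Nothing Then
            Console.WriteLine("Container not found.")
            Return
        End If

        ' ".//h2" is relative to container, so the heading outside the div is skipped.
        ' No contains() predicate is needed to select every title.
        Dim titles As HtmlNodeCollection = container.SelectNodes(".//h2")
        If titles IsNot Nothing Then ' SelectNodes returns Nothing when there is no match
            For Each node As HtmlNode In titles
                Console.WriteLine(HtmlEntity.DeEntitize(node.InnerText).Trim())
            Next
        End If

        ' contains(., 'Grace') tests the full text of the h2, including nested tags.
        Dim graceOnly As HtmlNodeCollection = container.SelectNodes(".//h2[contains(., 'Grace')]")
        If graceOnly IsNot Nothing Then
            For Each node As HtmlNode In graceOnly
                Console.WriteLine("Match: " & node.InnerText.Trim())
            Next
        End If
    End Sub
End Module
```

Checking for Nothing before the For Each matters because SelectNodes returns Nothing rather than an empty collection when the XPath matches no nodes, and looping over Nothing throws a NullReferenceException.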