Get all links containing "/product/"
I want to get all links that contain /product/. The page has 17 links containing /product/. How can I do that?
I think this line is the problem:
Dim srcs = From iframeNode In htmlDoc.DocumentNode.SelectNodes("//a[@href]")
Select iframeNode.Attributes("href").Value
How do I add a filter for /product/?
This is what I have so far:
Imports HtmlAgilityPack
Module Module1
Sub Main()
Dim mainUrl As String = "https://www.nordicwater.com/products/waste-water/"
Dim htmlDoc As New HtmlAgilityPack.HtmlDocument
htmlDoc.LoadHtml(mainUrl)
Dim srcs = From iframeNode In htmlDoc.DocumentNode.SelectNodes("//a[@href]")
Select iframeNode.Attributes("href").Value
'print all the src you got
For Each src In srcs
Console.WriteLine(src)
Next
End Sub
End Module
Edit:
Working solution:
Imports HtmlAgilityPack
Module Module1
Sub Main()
Dim mainUrl As String = "https://www.nordicwater.com/products/waste-water/"
Dim htmlDoc As HtmlDocument = New HtmlWeb().Load(mainUrl) '< - - - Load the web page into an HtmlDocument
Dim srcs As HtmlNodeCollection = htmlDoc.DocumentNode.SelectNodes("//ul[@class='products-list-page']//a") '< - - - select nodes with links
For Each src As HtmlNode In srcs
Console.WriteLine(src.Attributes("href").Value) '< - - - Print urls
Next
Console.Read()
End Sub
End Module
You have to load the web page first, then select the nodes and attributes you want to print.
Here is one way to do it:
Dim mainUrl As String = "https://www.nordicwater.com/products/waste-water/"
Dim htmlDoc As HtmlDocument = New HtmlWeb().Load(mainUrl) '< - - - Load the web page into an HtmlDocument
Dim srcs As HtmlNodeCollection = htmlDoc.DocumentNode.SelectNodes("//ul[@class='products-list-page']//a") '< - - - select nodes with links
For Each src As HtmlNode In srcs
Console.WriteLine(src.Attributes("href").Value) '< - - - Print urls
Next
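You need to learn how to debug. If you had stepped through your code, you would have seen that you were setting the HTML of htmlDoc to the URL string instead of loading the actual HTML of the web page.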
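The selector above relies on the list's class name rather than on /product/ itself. If the goal is to filter on the href text, here is a minimal sketch of two ways that should work. It assumes the HtmlAgilityPack NuGet package is referenced, and it parses a small inline HTML string so it runs without network access. The markup and the product names in it are made up for illustration and are not taken from the real page.

```vb
Imports System
Imports System.Linq
Imports HtmlAgilityPack

Module ProductLinkFilter
    Sub Main()
        ' Hypothetical markup standing in for the downloaded page (not the real site's HTML).
        Dim html As String =
            "<ul>" &
            "<li><a href='/product/dynasand/'>DynaSand</a></li>" &
            "<li><a href='/products/waste-water/'>Waste water</a></li>" &
            "<li><a href='/product/lamella/'>Lamella</a></li>" &
            "<li><a>No href</a></li>" &
            "</ul>"

        Dim htmlDoc As New HtmlDocument()
        htmlDoc.LoadHtml(html) ' LoadHtml expects markup, not a URL

        ' Option 1: let XPath do the filtering with contains().
        Dim nodes As HtmlNodeCollection =
            htmlDoc.DocumentNode.SelectNodes("//a[contains(@href, '/product/')]")

        ' SelectNodes returns Nothing when nothing matches, so guard before looping.
        If nodes IsNot Nothing Then
            For Each node As HtmlNode In nodes
                Console.WriteLine(node.GetAttributeValue("href", String.Empty))
            Next
        End If

        ' Option 2: keep the LINQ query from the question and add a Where clause.
        Dim anchors As HtmlNodeCollection = htmlDoc.DocumentNode.SelectNodes("//a[@href]")
        If anchors IsNot Nothing Then
            Dim srcs = From a In anchors
                       Let href = a.GetAttributeValue("href", String.Empty)
                       Where href.Contains("/product/")
                       Select href

            For Each src In srcs
                Console.WriteLine(src)
            Next
        End If
    End Sub
End Module
```

Both loops should skip the /products/waste-water/ link, because "/products/" does not contain the substring "/product/". For the real page, replace LoadHtml(html) with New HtmlWeb().Load(mainUrl) as in the answer above.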