Using HtmlAgilityPack to get data
<ol class="list-data-b">
<li class="in-ttl-b">(a) kanji; a Chinese character [ideograph]
<ul class="list-data-b-in"><li class="text-jejp text-c"><span class="ex">漢字で書く</span></li><li class="text-jeen text-c">write in <i>kanji</i> [<i>Chinese characters</i>]</li></ul>
<ul class="list-data-b-in"><li class="text-jejp text-c"><span class="ex">常用漢字</span></li><li class="text-jeen text-c"><i>Chinese characters</i> for everyday use (in Japan)</li></ul>
</li>
</ol>
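I have HTML like this. How can I extract the following pieces of data?
- (a) kanji; a Chinese character [ideograph]
- 漢字で書く
- write in kanji [Chinese characters]
- 常用漢字
- Chinese characters for everyday use (in Japan)
Here is my code: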
Dim node2 = HTMLDoc.DocumentNode.SelectNodes("//ul[@class='list-data-b-in']")
If node2 IsNot Nothing Then
    For Each node In node2
        Dim Japnodes As HtmlAgilityPack.HtmlNode = node.SelectSingleNode("//li[@class='text-jejp text-c']")
        txtMean.AppendText(Japnodes.InnerText)
        txtMean.AppendText(vbNewLine)
        Dim Engnodes As HtmlAgilityPack.HtmlNode = node.SelectSingleNode("//li[@class='text-jeen text-c']")
        txtMean.AppendText(Engnodes.InnerText)
        txtMean.AppendText(vbNewLine)
    Next
End If
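The first piece of text can be selected with the XPath text()[1], evaluated relative to the outer li. To get each Japanese-English text pair, loop through the ul elements and, from each ul, select the two elements that contain the target text.
Here is a console application demo: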
Dim lis = HTMLDoc.DocumentNode.SelectNodes("//li[@class='in-ttl-b']")
For Each li As HtmlNode In lis
    Dim txt = li.SelectSingleNode("text()[1]")
    Console.WriteLine(txt.InnerText)
    For Each ul As HtmlNode In li.SelectNodes("ul")
        Dim japNode = ul.SelectSingleNode("li/span")
        Dim engNode = ul.SelectSingleNode("li[@class='text-jeen text-c']")
        Console.WriteLine(japNode.InnerText)
        Console.WriteLine(engNode.InnerText)
    Next
Next
Output:
(a) kanji; a Chinese character [ideograph]
漢字で書く
write in kanji [Chinese characters]
常用漢字
Chinese characters for everyday use (in Japan)
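For comparison, the loop from the question can be kept almost unchanged. Its likely problem is that an XPath starting with // is evaluated from the document root even when it is called on a child node, so every iteration returns the first matching li in the whole document. The sketch below uses a relative path (./li) instead. It assumes the HtmlAgilityPack NuGet package is referenced; the Demo module and the inline html string are illustrative only and are not part of the original code.

```vbnet
Imports System
Imports HtmlAgilityPack

Module Demo
    Sub Main()
        ' Sample markup copied from the question (inlined here for illustration).
        Dim html As String =
            "<ol class=""list-data-b"">" &
            "<li class=""in-ttl-b"">(a) kanji; a Chinese character [ideograph]" &
            "<ul class=""list-data-b-in""><li class=""text-jejp text-c""><span class=""ex"">漢字で書く</span></li>" &
            "<li class=""text-jeen text-c"">write in <i>kanji</i> [<i>Chinese characters</i>]</li></ul>" &
            "<ul class=""list-data-b-in""><li class=""text-jejp text-c""><span class=""ex"">常用漢字</span></li>" &
            "<li class=""text-jeen text-c""><i>Chinese characters</i> for everyday use (in Japan)</li></ul>" &
            "</li></ol>"

        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)

        Dim uls = doc.DocumentNode.SelectNodes("//ul[@class='list-data-b-in']")
        ' SelectNodes returns Nothing (not an empty collection) when nothing matches.
        If uls Is Nothing Then Return

        For Each ul As HtmlNode In uls
            ' "./li[...]" is resolved relative to the current ul.
            ' "//li[...]" would restart the search at the document root.
            Dim japNode = ul.SelectSingleNode("./li[@class='text-jejp text-c']")
            Dim engNode = ul.SelectSingleNode("./li[@class='text-jeen text-c']")
            If japNode IsNot Nothing Then Console.WriteLine(japNode.InnerText)
            If engNode IsNot Nothing Then Console.WriteLine(engNode.InnerText)
        Next
    End Sub
End Module
```

In the WinForms code from the question, Console.WriteLine would be replaced by txtMean.AppendText followed by vbNewLine. The Nothing checks guard against ul elements that lack one of the two li children.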