Problem with subnode's text when scraping data from a website with HtmlAgilityPack

Question

Hoping someone can help a newbie.

I have tried many paths for this subnode, but I can't figure it out.

HTML part:

 <div class="center-block">
    <div class="match-time" id="dvStatusText">MS</div>
    <div class="match-score" id="dvScoreText">4 - 0</div>
    <div class="hf-match-score" id="dvHTScoreText">İY : 3- 0</div>
 </div>

My code:

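// Requires: using System; using System.Net; using System.Text;
Uri url = new Uri("http://arsiv.mackolik.com/Mac/3213138/");
WebClient client = new WebClient();
client.Encoding = Encoding.UTF8;
try
{
    string html = client.DownloadString(url);
    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(html);
    HtmlNodeCollection results = doc.DocumentNode.SelectNodes("//*[@class='center-block']");
    if (results != null)
    {
        for (int i = 0; i < results.Count; i++)
        {
            // ".//" keeps the search inside results[i]; a leading "//" searches the whole document.
            var t1 = results[i].SelectSingleNode(".//*[@class='match-score']").InnerText; // full-time score (FT)
            var t2 = results[i].SelectSingleNode(".//*[@id='dvHTScoreText']").InnerText; // half-time score (HT)
            listBox1.Items.Add(t2);
        }
    }
}
catch (WebException ex)
{
    // Report download failures instead of leaving the try block without a handler.
    listBox1.Items.Add("Download failed: " + ex.Message);
}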

My problem shows up in the InnerHtml result:

 <div class="match-time" id="dvStatusText">MS</div>
 <div class="match-score" id="dvScoreText">4 - 0</div>
 <div class="hf-match-score" id="dvHTScoreText"></div> // this element always contains text on the page.

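I have tried different approaches to solve this, but got nowhere. I can scrape class="match-time" or class="match-score", but not class="hf-match-score". I tried selecting by class and by id; either way the result is the same. Please point me to a way to do this. Thank you very much.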

Answer 1

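The half-time score is rendered by JavaScript, so it is not present in the HTML that the server returns. You will need Selenium or a similar tool to access this element.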
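To illustrate why no selector helps here, the following minimal sketch parses the static markup from the question's InnerHtml dump using only Python's standard library. The `DivTextCollector` class and the `STATIC_HTML` constant are hypothetical names introduced for this illustration; the markup itself is copied from the question.

```python
from html.parser import HTMLParser

# Static markup as the server returns it (copied from the question's InnerHtml dump).
STATIC_HTML = """
<div class="center-block">
  <div class="match-time" id="dvStatusText">MS</div>
  <div class="match-score" id="dvScoreText">4 - 0</div>
  <div class="hf-match-score" id="dvHTScoreText"></div>
</div>
"""


class DivTextCollector(HTMLParser):
    """Collects the text found directly inside each <div> that has an id."""

    def __init__(self):
        super().__init__()
        self.texts = {}    # id -> collected text
        self._stack = []   # ids of the currently open divs (None if a div has no id)

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            div_id = dict(attrs).get("id")
            self._stack.append(div_id)
            if div_id is not None:
                self.texts[div_id] = ""

    def handle_endtag(self, tag):
        if tag == "div" and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Attribute text only to the innermost open div, and only if it has an id.
        if self._stack and self._stack[-1] is not None:
            self.texts[self._stack[-1]] += data.strip()


collector = DivTextCollector()
collector.feed(STATIC_HTML)
for div_id, text in collector.texts.items():
    print(f"{div_id}: {text!r}")
```

The entry for `dvHTScoreText` comes out as an empty string: the element exists in the downloaded HTML, but its text is filled in later by a script, which a plain HTML parser never runs.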

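As an alternative, you can fetch the data directly from the JSON that the page loads in the background. Here is a piece of code in Python (I guess you can do the same thing in C#):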

import requests

# Set up the download URL (found in the Network tab of the browser's developer tools)
# and the mandatory Referer header
url = 'http://arsiv.mackolik.com/Match/MatchData.aspx?t=dtl&id=3213138&s=0'
hd = {'Referer': 'http://arsiv.mackolik.com/Mac/3213138/'}

# Download and parse the JSON
data = requests.get(url, headers=hd)
val = data.json()

# Extract the values of interest: full-time and half-time scores
print(val["d"]["s"], val["d"]["ht"], sep="\n")

Output:

4 - 0
3 - 0
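If you would rather avoid third-party packages, the same idea can be sketched with the standard library only. This is a rough sketch under several assumptions: `fetch_match_data` and `extract_scores` are hypothetical helper names, the URL pattern is assumed to generalize to other match ids, the response is assumed to be UTF-8 encoded, and only the keys `d`, `s` and `ht` are known from the snippet above. The sample dictionary is an offline stand-in, not the real payload.

```python
import json
import urllib.request

# URL pattern and Referer header follow the snippet above; substituting other
# match ids into them is an assumption.
DATA_URL = "http://arsiv.mackolik.com/Match/MatchData.aspx?t=dtl&id={match_id}&s=0"
PAGE_URL = "http://arsiv.mackolik.com/Mac/{match_id}/"


def fetch_match_data(match_id):
    """Download the match JSON (hypothetical helper; performs a network call)."""
    request = urllib.request.Request(
        DATA_URL.format(match_id=match_id),
        headers={"Referer": PAGE_URL.format(match_id=match_id)},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        # UTF-8 is assumed here; adjust if the server reports another charset.
        return json.loads(response.read().decode("utf-8"))


def extract_scores(payload):
    """Return (full_time, half_time), using None for any missing field."""
    details = payload.get("d") or {}
    return details.get("s"), details.get("ht")


# Offline sample containing only the two keys used above.
sample = {"d": {"s": "4 - 0", "ht": "3 - 0"}}
full_time, half_time = extract_scores(sample)
print(full_time, half_time, sep="\n")
```

For live data you would call `extract_scores(fetch_match_data(3213138))`. Keeping the extraction separate from the download makes it easy to handle a missing half-time field without an exception.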


html

c#

xpath

html-agility-pack