HTML Agility Pack can't get text content from div
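I'm new to C# and wanted to try building a small scraper with it to experiment. I watched a YT video about it. I'm trying to scrape bet365.dk (more specifically, this link: https://www.bet365.dk/#/AC/B1/C1/D451/F2/Q1/F^12/).
This is my code: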
using System;
using System.Net.Http;
using HtmlAgilityPack;

namespace Bet365Scraper
{
    class Program
    {
        static void Main(string[] args)
        {
            GetHtmlAsync();
            Console.ReadLine();
        }

        private static async void GetHtmlAsync()
        {
            var url = "https://www.bet365.dk/#/AC/B1/C1/D451/F2/Q1/F^12/";
            var httpClient = new HttpClient();
            httpClient.DefaultRequestHeaders.Add("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.75 Safari/537.36");
            var html = await httpClient.GetStringAsync(url);
            var htmlDocument = new HtmlDocument();
            htmlDocument.LoadHtml(html);
            var htmlBody = htmlDocument.DocumentNode.SelectSingleNode("//body");
            var node = htmlBody.Element("//div[@class='src-ParticipantFixtureDetailsHigher_TeamNames ']");
            Console.WriteLine(node.InnerHtml);
        }
    }
}
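I don't know how to proceed. I find the documentation on the HTML Agility Pack website a bit confusing, and I can't seem to find what I'm looking for. This is what I want to do. Here is a small piece of the HTML from the bet365 site: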
<div class="src-ParticipantFixtureDetailsHigher_TeamNames">
    <div class="src-ParticipantFixtureDetailsHigher_TeamWrapper ">
        <div class="src-ParticipantFixtureDetailsHigher_Team " style="">Færøerne</div>
    </div>
    <div class="src-ParticipantFixtureDetailsHigher_TeamWrapper ">
        <div class="src-ParticipantFixtureDetailsHigher_Team ">Andorra</div>
    </div>
</div>
How can I print 'Færøerne' and 'Andorra' from the divs in one go? I know I probably need a foreach, but as mentioned, I'm not sure how to handle the selectors and so on.
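I'm not very familiar with XPath, but I do know the JS query (CSS selector) syntax, so I suggest additionally installing the Fizzler.Systems.HtmlAgilityPack NuGet package. The HtmlNode.QuerySelector() method then becomes available; it accepts the same selector syntax as JavaScript's querySelector.
I also fixed the HttpClient usage (one shared instance, and async Task instead of async void).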
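using System;
using System.Net.Http;
using System.Threading.Tasks;
using HtmlAgilityPack;
using Fizzler.Systems.HtmlAgilityPack; // adds QuerySelector/QuerySelectorAll to HtmlNode

namespace Bet365Scraper
{
    class Program
    {
        // One shared HttpClient for the lifetime of the application.
        private static readonly HttpClient httpClient = new HttpClient();

        static async Task Main(string[] args)
        {
            httpClient.DefaultRequestHeaders.UserAgent.ParseAdd("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.75 Safari/537.36");
            await GetHtmlAsync("https://www.bet365.dk/#/AC/B1/C1/D451/F2/Q1/F^12/");
            Console.ReadLine();
        }

        // Returns Task (not void) so the caller can await it and see exceptions.
        private static async Task GetHtmlAsync(string url)
        {
            var html = await httpClient.GetStringAsync(url);
            var htmlDocument = new HtmlDocument();
            htmlDocument.LoadHtml(html);

            // CSS class selector: matches every element carrying this class.
            var nodes = htmlDocument.DocumentNode.QuerySelectorAll(".src-ParticipantFixtureDetailsHigher_Team");
            foreach (HtmlNode node in nodes)
            {
                Console.WriteLine(node.InnerText);
            }
        }
    }
}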
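If you would rather stay with plain HTML Agility Pack and XPath, the same selection can be sketched without Fizzler. This is only a minimal illustration, not a tested scraper: it parses the HTML fragment from the question out of a string instead of downloading it, and the class name SelectorDemo is made up for the example. The XPath treats the class attribute as a space-separated list, so the trailing space in "src-ParticipantFixtureDetailsHigher_Team " does not matter and the TeamNames/TeamWrapper divs are not matched.

```csharp
using System;
using HtmlAgilityPack;

class SelectorDemo // hypothetical name, for illustration only
{
    static void Main()
    {
        // Markup copied from the question; in a real run this would be the downloaded page.
        const string html = @"
<div class=""src-ParticipantFixtureDetailsHigher_TeamNames"">
  <div class=""src-ParticipantFixtureDetailsHigher_TeamWrapper "">
    <div class=""src-ParticipantFixtureDetailsHigher_Team "" style="""">Færøerne</div>
  </div>
  <div class=""src-ParticipantFixtureDetailsHigher_TeamWrapper "">
    <div class=""src-ParticipantFixtureDetailsHigher_Team "">Andorra</div>
  </div>
</div>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Match the class as a whole token: pad the normalized class list with
        // spaces and look for the name surrounded by spaces.
        var nodes = doc.DocumentNode.SelectNodes(
            "//div[contains(concat(' ', normalize-space(@class), ' '), " +
            "' src-ParticipantFixtureDetailsHigher_Team ')]");

        // SelectNodes returns null (not an empty collection) when nothing matches.
        if (nodes == null)
        {
            Console.WriteLine("No matching nodes found.");
            return;
        }

        foreach (HtmlNode node in nodes)
        {
            Console.WriteLine(node.InnerText.Trim());
        }
    }
}
```

Against that fragment the loop visits only the two inner team divs. One caveat for the live site: the part of the URL after # is never sent to the server, so if the page builds this markup with JavaScript, the HTML returned by HttpClient will not contain these divs and either selector will find nothing.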