Can't select element with multiple attributes using XPath

Question

I'm trying to parse news.google

<a target="_blank"class="article usg-AFQjCNFr5aujpYnTzdHNYfHZw_gNN6iq-w sig2-1esugE2Sy8Bhe2CzulGmsA did--5114870031117960448 esc-thumbnail-link" href="http://www.theglobeandmail.com/news/world/trump-blasts-media-in-rally-celebrating-100-days-as-president/article34858356/" url="http://www.theglobeandmail.com/news/world/trump-blasts-media-in-rally-celebrating-100-days-as-president/article34858356/" id="MAA4AEgAUABgAWoCY2E"  ssid="h" >

I want the url attribute, but I can't get it. All I get is a null reference.

The XPath I use to find this multi-attribute element:

HtmlNode aNodes = doc.DocumentNode.SelectSingleNode("//a[@target='_blank' and @class='article usg-AFQjCNFr5aujpYnTzdHNYfHZw_gNN6iq-w sig2-1esugE2Sy8Bhe2CzulGmsA did--5114870031117960448 esc-thumbnail-link' and @href='http://www.theglobeandmail.com/news/world/trump-blasts-media-in-rally-celebrating-100-days-as-president/article34858356/' and @url='http://www.theglobeandmail.com/news/world/trump-blasts-media-in-rally-celebrating-100-days-as-president/article34858356/' and @id='MAA4AEgAUABgAWoCY2E' and @ssid='h']");

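I get a null reference when I try to find this element. Attribute values such as url and href change all the time. Is there a way to get the url based on which attributes the element has, rather than on the attribute values? Something like: if an element has these six attributes, select the value of url? Thank you very much.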

Answer 1

Yes, you can select elements by the presence of attributes rather than by specific attribute values:

Test HTML:

var html = @"
<!-- match -->
<a target='_blank'class='article usg-AFQjCNFr5aujpYnTzdHNYfHZw_gNN6iq-w sig2-1esugE2Sy8Bhe2CzulGmsA did--5114870031117960448 esc-thumbnail-link' href='http://www.theglobeandmail.com/news/world/trump-blasts-media-in-rally-celebrating-100-days-as-president/article34858356/' url='http://www.theglobeandmail.com/news/world/trump-blasts-media-in-rally-celebrating-100-days-as-president/article34858356/' id='MAA4AEgAUABgAWoCY2E'  ssid='h' ></a>
<!-- NO match, missing url -->
<a target='_blank' href='NO MATCH' ssid='' id='' class=''></a>
<!-- match -->
<a target='_blank' href='#' ssid='' id='' class='' url='MATCH'></a>
<!-- NO match, missing multiple wanted attributes -->
<a target='_blank' href='#' url='NO MATCH'></a>
";

And a bit of LINQ:

// Requires: using System.Linq; using HtmlAgilityPack;
HtmlDocument document = new HtmlDocument();
document.LoadHtml(html);
// Keep only the <a> elements that carry all six attributes, whatever their values.
var wantedLinks = from a in document.DocumentNode.SelectNodes("//a")
    where a.Attributes["url"] != null
    && a.Attributes["ssid"] != null
    && a.Attributes["href"] != null
    && a.Attributes["id"] != null
    && a.Attributes["class"] != null
    && a.Attributes["target"] != null
    select a;

foreach (var a in wantedLinks)
{
    Console.WriteLine(a.Attributes["url"].Value);
}

Output (note that links that do not have all six attributes are skipped):

http://www.theglobeandmail.com/news/world/trump-blasts-media-in-rally-celebrating-100-days-as-president/article34858356/
MATCH
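
The same presence check can also be written in XPath itself: a bare `@name` inside a predicate is true whenever the attribute exists, whatever its value. Below is a minimal sketch of that idea. To keep it self-contained it uses `System.Xml` on a small well-formed snippet instead of Html Agility Pack; the class name `AttributePresenceDemo`, the method `ExtractUrls`, and the example.com URLs are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class AttributePresenceDemo
{
    // "@name" with no comparison only tests that the attribute exists,
    // so empty values such as ssid='' still count as present.
    public const string Query =
        "//a[@target and @class and @href and @url and @id and @ssid]";

    // Returns the url attribute of every <a> that has all six attributes.
    // Assumption: the input is well-formed XML (real-world HTML often is not).
    public static List<string> ExtractUrls(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        var urls = new List<string>();
        foreach (XmlNode node in doc.SelectNodes(Query))
        {
            urls.Add(node.Attributes["url"].Value);
        }
        return urls;
    }

    public static void Main()
    {
        // Hypothetical sample markup mirroring the four test cases above.
        string xml = @"<root>
  <a target='_blank' class='article' href='http://example.com/1' url='http://example.com/1' id='x' ssid='h'></a>
  <a target='_blank' href='NO MATCH' ssid='' id='' class=''></a>
  <a target='_blank' href='#' ssid='' id='' class='' url='MATCH'></a>
  <a target='_blank' href='#' url='NO MATCH'></a>
</root>";

        foreach (string url in ExtractUrls(xml))
        {
            Console.WriteLine(url);
        }
    }
}
```

With Html Agility Pack, the same expression should work when passed to `document.DocumentNode.SelectNodes(...)`. Keep in mind that `SelectNodes` there returns null rather than an empty collection when nothing matches, so check for null before iterating.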

Tags: html, c#, xpath, html-agility-pack