Can't select element with multiple attributes using XPath
I'm trying to parse news.google:
<a target="_blank"class="article usg-AFQjCNFr5aujpYnTzdHNYfHZw_gNN6iq-w sig2-1esugE2Sy8Bhe2CzulGmsA did--5114870031117960448 esc-thumbnail-link" href="http://www.theglobeandmail.com/news/world/trump-blasts-media-in-rally-celebrating-100-days-as-president/article34858356/" url="http://www.theglobeandmail.com/news/world/trump-blasts-media-in-rally-celebrating-100-days-as-president/article34858356/" id="MAA4AEgAUABgAWoCY2E" ssid="h" >
I want the url attribute, but I can't get it. All I get is a null reference.
The XPath I use to find this multi-attribute element:
HtmlNode aNodes = doc.DocumentNode.SelectSingleNode("//a[@target='_blank' and @class='article usg-AFQjCNFr5aujpYnTzdHNYfHZw_gNN6iq-w sig2-1esugE2Sy8Bhe2CzulGmsA did--5114870031117960448 esc-thumbnail-link' and @href='http://www.theglobeandmail.com/news/world/trump-blasts-media-in-rally-celebrating-100-days-as-president/article34858356/' and @url='http://www.theglobeandmail.com/news/world/trump-blasts-media-in-rally-celebrating-100-days-as-president/article34858356/' and @id='MAA4AEgAUABgAWoCY2E' and @ssid='h']");
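I get a null reference when I try to find this element.
Attribute values such as url and href change all the time. Is there a way to get the url based on which attributes the element has, rather than on their values? For example, if an element has these five attributes, select the value of url? Thanks a lot.
Yes, you can select elements by the presence of attributes rather than by specific attribute values:
Test HTML: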
var html = @"
<!-- match -->
<a target='_blank'class='article usg-AFQjCNFr5aujpYnTzdHNYfHZw_gNN6iq-w sig2-1esugE2Sy8Bhe2CzulGmsA did--5114870031117960448 esc-thumbnail-link' href='http://www.theglobeandmail.com/news/world/trump-blasts-media-in-rally-celebrating-100-days-as-president/article34858356/' url='http://www.theglobeandmail.com/news/world/trump-blasts-media-in-rally-celebrating-100-days-as-president/article34858356/' id='MAA4AEgAUABgAWoCY2E' ssid='h' ></a>
<!-- NO match, missing url -->
<a target='_blank' href='NO MATCH' ssid='' id='' class=''></a>
<!-- match -->
<a target='_blank' href='#' ssid='' id='' class='' url='MATCH'></a>
<!-- NO match, missing multiple wanted attributes -->
<a target='_blank' href='#' url='NO MATCH'></a>
";
And a bit of LINQ:
// Requires: using System; using System.Linq; using HtmlAgilityPack;
HtmlDocument document = new HtmlDocument();
document.LoadHtml(html);
// Keep only the <a> elements that carry all six attributes, whatever their values.
var wantedLinks = from a in document.DocumentNode.SelectNodes("//a")
                  where a.Attributes["url"] != null
                     && a.Attributes["ssid"] != null
                     && a.Attributes["href"] != null
                     && a.Attributes["id"] != null
                     && a.Attributes["class"] != null
                     && a.Attributes["target"] != null
                  select a;
foreach (var a in wantedLinks)
{
    Console.WriteLine(a.Attributes["url"].Value);
}
Output (note that links that do not have all six attributes are skipped):
http://www.theglobeandmail.com/news/world/trump-blasts-media-in-rally-celebrating-100-days-as-president/article34858356/
MATCH
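The same presence test can probably also be expressed in the XPath itself, since a bare @name inside an XPath predicate is true whenever that attribute exists, regardless of its value. The following is only a sketch under that assumption: the class name AttributePresenceDemo and the method SelectUrls are made up for illustration and are not part of the original answer or of HtmlAgilityPack.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

public static class AttributePresenceDemo
{
    // Hypothetical helper: returns the url value of every <a> element that
    // carries all six attributes, whatever their values are.
    public static List<string> SelectUrls(string html)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html);

        // Each bare @name tests only that the attribute is present,
        // so empty values such as ssid='' still count.
        var nodes = document.DocumentNode.SelectNodes(
            "//a[@target and @class and @href and @url and @id and @ssid]");

        var urls = new List<string>();
        if (nodes == null)
        {
            // SelectNodes returns null, not an empty collection, when nothing matches.
            return urls;
        }

        foreach (var a in nodes)
        {
            urls.Add(a.GetAttributeValue("url", string.Empty));
        }
        return urls;
    }
}
```

Calling AttributePresenceDemo.SelectUrls(html) and writing each entry to the console would replace the LINQ query and the loop. The explicit null check matters because SelectNodes returns null when no element matches, which may also be why a very strict XPath ends in a null reference.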