Parsing untagged text in HTML with "HTML Agility Pack" in C#
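Using HTML Agility Pack, I want to parse text in an HTML document that is not wrapped in any tag of its own.
The HTML below is an example of the structure I will be processing; the text after the last div is an example of the text I want to extract.
(The part that starts with "I am selling..." and ends with "...services or offers".)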
<div class="mapbox">
<div id="map" class="viewposting" data-latitude="32.965732" data-longitude="-96.882528" data-accuracy="22"></div>
<p class="mapaddress">
<small>
(<a target="_blank" href="https://maps.google.com/maps/preview/@32.965732,-96.882528,16z">google map</a>)
</small>
</p>
</div>
<p class="attrgroup">
<span><b>2012 jeep grand cherokee laredo</b></span>
<br>
</p>
<p class="attrgroup">
<span>VIN: <b>ask me</b></span>
<br>
<span>condition: <b>excellent</b></span>
<br>
<span>cylinders: <b>6 cylinders</b></span>
<br>
<span>drive: <b>rwd</b></span>
<br>
<span>fuel: <b>gas</b></span>
<br>
<span>odometer: <b>98000</b></span>
<br>
<span>title status: <b>clean</b></span>
<br>
<span>transmission: <b>automatic</b></span>
<br>
</p>
<div class="print-information print-qrcode-container">
<p class="print-qrcode-label">QR Code Link to This Post</p>
<div class="print-qrcode" data-location="east"></div>
</div>
I am selling my 2012 Jeep Grand Cherokee. The Jeep runs and drives great. Zero issues. Always been well maintained and serviced on time. Very dependable car has never left me stranded. Very healthy. Everything works like it should. This Grand Cherokee would make a great family car or First car.<br>
<br>
*3.6 V6 <br>
*Automatic Transmission <br>
*98,000 Original Miles<br>
*Leather and Heated Seats<br>
*Navigation<br>
*Back Up Camera <br>
*Good Tires<br>
*Cold A/C Hot Heater <br>
*Clean Texas Title<br>
*Clean Carfax<br>
Much More!!<br>
<br>
Call or Text me for anymore information. <br>
<a href="/fb/dal/cto/6620220745" class="showcontact" title="click to show contact info" rel="nofollow">show contact info</a>
<li>do NOT contact me with unsolicited services or offers</li>
Can anyone tell me how to do this? How can I extract that text with HTML Agility Pack in .NET?
Thanks in advance.
After loading the document, use XPath to select the text node that follows a specific node.
const string xpath = "//div[@class='print-information print-qrcode-container']/following-sibling::text()[1]";
string text = doc.DocumentNode.SelectSingleNode(xpath).InnerText;
This returns:
I am selling my 2012 Jeep Grand Cherokee. The Jeep runs and drives
great. Zero issues. Always been well maintained and serviced on time.
Very dependable car has never left me stranded. Very healthy.
Everything works like it should. This Grand Cherokee would make a
great family car or First car.
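Note that the XPath above selects only the first text node after the div, so the bulleted lines and the closing `<li>` are not included. If the whole block is needed, one option is to walk the siblings that follow the div and collect their text. The following is a minimal sketch, not a definitive implementation: `PostingText` and `Extract` are hypothetical names introduced here, and it assumes the description runs from the QR-code div to the end of its parent, as in the sample HTML.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class PostingText
{
    // Hypothetical helper (not part of HTML Agility Pack).
    // Assumption: everything after the QR-code div, up to the end of
    // its parent, belongs to the posting description.
    public static string Extract(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNode anchor = doc.DocumentNode.SelectSingleNode(
            "//div[@class='print-information print-qrcode-container']");
        if (anchor == null)
        {
            // SelectSingleNode returns null when nothing matches.
            return string.Empty;
        }

        var lines = new List<string>();
        for (HtmlNode node = anchor.NextSibling; node != null; node = node.NextSibling)
        {
            // Skip HTML comments; <br> elements have empty InnerText
            // and are dropped by the length check below.
            if (node.NodeType == HtmlNodeType.Comment)
            {
                continue;
            }

            // Decode entities such as &amp; and trim surrounding whitespace.
            string piece = HtmlEntity.DeEntitize(node.InnerText).Trim();
            if (piece.Length > 0)
            {
                lines.Add(piece);
            }
        }

        return string.Join(Environment.NewLine, lines);
    }
}
```

Calling `PostingText.Extract(html)` on the sample should give one line per text fragment, including the link text "show contact info" and the text of the final `<li>`. To leave out the link, skip nodes whose `Name` is `a` inside the loop.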
And visca Catalunya!