How to get the body content of a web page returned from an API in ASP.NET Core
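The response from the API is a web page with full HTML and CSS content. I only want the content inside the body.
How can I extract the body content from the web page?
Below is a shortened version of the page. The full page is very long, so I can't post all of it here.
The body content I want to extract is "Hi John, Doe wishes you a happy anniversary and wants all of us at FCMB to wish you same, Congratulations on your anniversary Doe"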
<!DOCTYPE html>
<html>
<head>
<style>
body {padding: 0; margin: 0; font-family: sans-serif;}
.general-container {min-height: 100vh; border-radius: 6px; }
</style>
</head>
<body>
<div class="modal fade" id="CustomerPreviewMsg" tabindex="-1" role="dialog" aria-labelledby="exampleModalCenterTitle" aria-hidden="true">
<div class="modal-dialog modal-dialog-centered" role="document">
<div class="modal-content">
<div class="modal-header">
<button type="button" class="close" data-dismiss="modal" aria-label="Close">
<span aria-hidden="true">×</span>
</button>
</div>
<div class="modal-content">
<div class="modal-body mb-0 p-0">
<div class="row mx-0 col-12 profile-pic-container">
<p class="pt-3">
Hi John, Doe wishes you a happy anniversary and wants all of us at FCMB to wish you same, Congratulations on your anniversary Doe
</p>
</div>
</div>
</div>
</div>
</div>
</div>
</body>
</html>
<script src="/Scripts/jquery-3.3.1.js"></script>
<script src="/JsFile/MainJs.js"></script>
<script src="https://unpkg.com/wavesurfer.js"></script>
Here is the code that consumes the endpoint:
// Build the client for the preview endpoint and attach the bearer token.
var client = new RestClient(appSettings.ShoutOutPreviewUrl + previewMessage.MessageHistoryId);
client.AddDefaultHeader("Authorization", string.Format("Bearer {0}", appSettings.ShoutOutToken));
client.Timeout = -1; // no timeout
var request = new RestRequest(Method.GET);
request.AddHeader("Content-Type", "text/plain");
// Send the request once, asynchronously, and read the body as a string.
IRestResponse response = await client.ExecuteAsync(request);
return response.Content; // the full HTML page
After some digging, I used HtmlAgilityPack (https://html-agility-pack.net/) to select the node.
I installed it through NuGet.
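// Requires: using HtmlAgilityPack;
internal string ParseHtml(string html)
{
    var doc = new HtmlDocument();
    doc.LoadHtml(html);
    // Select the paragraph that holds the message text.
    var messageNode = doc.DocumentNode.SelectSingleNode("//p[@class='pt-3']");
    // SelectSingleNode returns null when nothing matches, so guard the access.
    return messageNode?.InnerText.Trim();
}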
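For anyone adapting this, here is a slightly more defensive variant of the same HtmlAgilityPack approach, offered as a sketch rather than a definitive implementation. It assumes the message always sits in a `<p>` carrying the `pt-3` class somewhere inside the `CustomerPreviewMsg` modal, as in the sample page above. The class name `PreviewMessageParser` and method name `ExtractMessage` are hypothetical, chosen only for illustration.

```csharp
using System;
using HtmlAgilityPack;

// Hypothetical helper; the names are illustrative and not part of any library.
public static class PreviewMessageParser
{
    // Returns the trimmed message text, or null when the page has no matching paragraph.
    public static string ExtractMessage(string html)
    {
        if (string.IsNullOrWhiteSpace(html))
        {
            return null;
        }

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Assumption: the message lives inside the modal with id "CustomerPreviewMsg".
        // The contains(concat(...)) test matches "pt-3" as a whole class token,
        // so the paragraph is still found if more classes are added to it later.
        var node = doc.DocumentNode.SelectSingleNode(
            "//div[@id='CustomerPreviewMsg']" +
            "//p[contains(concat(' ', normalize-space(@class), ' '), ' pt-3 ')]");

        if (node == null)
        {
            return null;
        }

        // InnerText keeps HTML entities such as &amp; as written, so decode them first.
        return HtmlEntity.DeEntitize(node.InnerText).Trim();
    }
}
```

Calling `PreviewMessageParser.ExtractMessage(response.Content)` on the page returned by the endpoint should give back only the greeting text. A null result means the page layout did not match the assumptions above.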