How to get the body content of a web page returned by an API in ASP.NET Core

The response from the API is a web page containing full HTML and CSS content. I only want the content inside the body.

How can I extract the body content from the web page?

Below is a shortened version of the web page. The page is very long, so I can't post all of its content here.

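The body content I want to extract is "Hi John, Doe wishes you a happy anniversary and wants all of us at FCMB to wish you same, Congratulations on your anniversary Doe".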

<!DOCTYPE html>
<html>
<head>
    <style>
        body {padding: 0; margin: 0; font-family: sans-serif;}
        .general-container {min-height: 100vh; border-radius: 6px; }
    </style>
</head>
<body>
    <div class="modal fade" id="CustomerPreviewMsg" tabindex="-1" role="dialog" aria-labelledby="exampleModalCenterTitle" aria-hidden="true">
        <div class="modal-dialog modal-dialog-centered" role="document">
            <div class="modal-content">
                <div class="modal-header">
                    <button type="button" class="close" data-dismiss="modal" aria-label="Close">
                        <span aria-hidden="true">&times;</span>
                    </button>
                </div>
                <div class="modal-content">
                    <div class="modal-body mb-0 p-0">
                        <div class="row mx-0 col-12 profile-pic-container">
                            <p class="pt-3">
                                Hi John, Doe wishes you a happy anniversary and wants all of us at FCMB to wish you same, Congratulations on your anniversary Doe
                            </p>
                        </div>
                    </div>
                </div>
            </div>
        </div>
    </div>
</body>
</html>
<script src="/Scripts/jquery-3.3.1.js"></script>
<script src="/JsFile/MainJs.js"></script>
<script src="https://unpkg.com/wavesurfer.js"></script>

This is the code that consumes the endpoint:

var client = new RestClient(appSettings.ShoutOutPreviewUrl + previewMessage.MessageHistoryId);
client.AddDefaultHeader("Authorization", string.Format("Bearer {0}", appSettings.ShoutOutToken));
client.Timeout = -1;
var request = new RestRequest(Method.GET);
request.AddHeader("Content-Type", "text/plain");

// Send the request once; the original executed it twice (async, then again synchronously).
IRestResponse response = await client.ExecuteAsync(request);

// Content holds the raw HTML of the page as a string.
return response.Content;

After some digging, I used HtmlAgilityPack (https://html-agility-pack.net/), which I installed via NuGet, to get the node:

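using HtmlAgilityPack;

internal string ParseHtml(string html)
{
    var doc = new HtmlDocument();
    doc.LoadHtml(html);

    // SelectSingleNode returns null when no element matches the XPath.
    var node = doc.DocumentNode.SelectSingleNode("//p[@class='pt-3']");
    if (node == null)
    {
        return string.Empty;
    }

    // InnerText drops the markup; DeEntitize decodes entities such as &amp;.
    return HtmlEntity.DeEntitize(node.InnerText).Trim();
}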
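
The ParseHtml method above matches only when the paragraph's class attribute is exactly `pt-3`. If that attribute might change, a variation like the following could be somewhat more tolerant. It is only a sketch under assumptions: `BodyTextExtractor` and `ExtractModalText` are hypothetical names that are not part of the original code, and the XPath assumes the message always sits in a `<p>` inside the `modal-body` container shown in the sample page.

```csharp
using System.Linq;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

// Hypothetical helper for illustration; not part of the original code.
public static class BodyTextExtractor
{
    // Returns the text of every <p> inside the modal body, joined by spaces.
    // Returns an empty string when nothing matches.
    public static string ExtractModalText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // contains(@class, ...) still matches when the element carries extra classes.
        var paragraphs = doc.DocumentNode.SelectNodes(
            "//div[contains(@class,'modal-body')]//p");

        // SelectNodes returns null (not an empty collection) when there is no match.
        if (paragraphs == null)
        {
            return string.Empty;
        }

        var texts = paragraphs
            .Select(p => HtmlEntity.DeEntitize(p.InnerText).Trim())
            .Where(t => t.Length > 0);

        return string.Join(" ", texts);
    }
}
```

It would be called with the HTML string returned by the endpoint, for example `BodyTextExtractor.ExtractModalText(response.Content)`. Note that `contains(@class,'modal-body')` is a substring match, so it would also match a class such as `modal-body-wide`; whether that matters depends on the real page.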