Retrieve the URL from an HTML img tag

Background

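I am currently developing a C# Web API that returns a selected image URL as Base64. I already have a function that performs the Base64 conversion. However, I receive a large block of text that also contains the img URLs, and I need to extract those URLs from the string and pass them to my function so it can convert each image to Base64. I read about a library (HtmlAgilityPack) that should make this task easy, but when I use it I get a message saying it cannot find "HtmlDocument.cs". I am not submitting a file, though; I am sending a string of HTML. According to the documentation it should also work with a string, but it does not work for me. This is the code that uses HtmlAgilityPack: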

Non-working code

foreach (var item in returnList)
{
    if (item.Content.Contains("~~/picture~~"))
    {
        HtmlDocument doc = new HtmlDocument();
        doc.Load(item.Content); // Load(string) treats its argument as a file path, not as HTML
    }
}

Error message from HtmlAgilityPack

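Problem

I receive a string of HTML from SharePoint. This HTML string can be marked up with heading tags and/or picture tags. I am trying to extract the URL from the src attribute of the img tag. I know regular expressions may not be practical, but I would consider a regex if it can reliably retrieve the URL from img src.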

Sample string

Bullet~~Increased Cash Flow</li><li>~~/Document Text Bullet~~Tax Efficient Organizational Structures</li><li>~~/Document Text Bullet~~Tax Strategies that Closely Align with Business Strategies</li><li>~~/Document Text Bullet~~Complete Knowledge of State and Local Tax Obligations</li></ul><p>~~/Document Heading 2~~is the firm of choice</p><p>~~/Document Text~~When it comes to accounting and advisory services is the unique firm of choice. As a trusted advisor to our clients, we bring an integrated client service approach with dedicated industry experience. Dixon Hughes Goodman respects the value of every client relationship and provides clients throughout the U.S. with an unwavering commitment to hands-on, personal attention from our partners and senior-level professionals.</p><p>~~/Document Text~~of choice for clients in search of a trusted advisor to deal with their state and local tax needs. Through our leading best practices and experience, our SALT professionals offer quality and ease to the client engagement. We are proud to provide highly comprehensive services.</p>

    <p>~~/picture~~<br></p><p> 
          <img src="/sites/ContentCenter/Graphics/map-al.jpg" alt="map al" style="width&#58;611px;height&#58;262px;" />&#160;
    <br></p><p><br></p><p>
    ~~/picture~~<br></p><p>
          <img src="/sites/ContentCenter/Graphics/Firm_Telescope_Illustration.jpg" alt="Firm_Telescope_Illustration.jpg" style="margin&#58;5px;width&#58;155px;height&#58;155px;" />    </p><p></div><div class="ExternalClassAF0833CB235F437993D7BEE362A1A88A"><br></div><div class="ExternalClassAF0833CB235F437993D7BEE362A1A88A"><br></div><div class="ExternalClassAF0833CB235F437993D7BEE362A1A88A"><br></div>

Important

I am working with a string of HTML, not a file.

string matchString = Regex.Match(original_text, "<img.+?src=[\"'](.+?)[\"'].+?>", RegexOptions.IgnoreCase).Groups[1].Value;

This has been asked many times before, for example here and here.
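The Regex.Match call above returns only the first match, while the sample string contains two img tags. The sketch below is a minimal, non-authoritative variant under that assumption: it uses Regex.Matches to collect every src value, and it tightens the pattern slightly ([^>] keeps each match inside one tag and also lets a tag span line breaks). Because the regex returns raw attribute text, it decodes entities such as &amp; with WebUtility.HtmlDecode. The names ImgSrcExtractor and ExtractImgSources are hypothetical. Like any regex over HTML it remains fragile: it skips unquoted src values and would also match an img tag inside a comment or script.

```csharp
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper: collects the src value of every <img> tag in an HTML string.
public static class ImgSrcExtractor
{
    // Tightened variant of the pattern above: [^>] keeps each match inside a single tag
    // (and also matches across line breaks); group 1 captures the quoted src value.
    private static readonly Regex ImgSrc = new Regex(
        @"<img\b[^>]*?\bsrc\s*=\s*[""']([^""']+)[""'][^>]*>",
        RegexOptions.IgnoreCase);

    public static List<string> ExtractImgSources(string html)
    {
        var sources = new List<string>();
        // Regex.Match stops at the first hit; Matches enumerates all of them.
        foreach (Match match in ImgSrc.Matches(html))
        {
            // The captured text is raw markup, so decode entities such as &amp;.
            sources.Add(WebUtility.HtmlDecode(match.Groups[1].Value));
        }
        return sources;
    }
}
```

Run over the sample string from the question, this should return the two site-relative paths /sites/ContentCenter/Graphics/map-al.jpg and /sites/ContentCenter/Graphics/Firm_Telescope_Illustration.jpg.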

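The problem you are running into is that C# is looking for a file and, because it does not find one, it tells you so. This is not an error that stops your application; it only reports that the file cannot be found, and the library then reads the string you gave it. This is covered in the documentation here: https://htmlagilitypack.codeplex.com/SourceControl/latest#Trunk/HtmlAgilityPackDocumentation.shfbproj. The code below is a boilerplate template that anyone can use. Note that it passes the string to LoadHtml, which parses HTML text, whereas Load(string) interprets its argument as a file path.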

Important

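C# is looking for a file that it cannot display, because what you supplied is a string. That is the message you are seeing, but according to the documentation you can continue working normally; it does not affect your code.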

Sample code

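HtmlAgilityPack.HtmlDocument htmlDocument = new HtmlAgilityPack.HtmlDocument();
htmlDocument.LoadHtml("YourContent"); // LoadHtml parses an HTML string; Load(path) reads a file.
// SelectNodes returns null when there is no <img> element with a src attribute.
var imgNodes = htmlDocument.DocumentNode.SelectNodes("//img[@src]");
if (imgNodes != null)
    foreach (HtmlAgilityPack.HtmlNode url in imgNodes)
    {
        HtmlAgilityPack.HtmlAttribute att = url.Attributes["src"];
        System.Uri imgUrl = new System.Uri("Url" + att.Value); // build your url; replace "Url" with your site's base address
    }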
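The "Url" + att.Value line glues strings together, which only works if the prefix has exactly the right slashes. Because the src values in the sample string are site-relative paths (for example /sites/ContentCenter/Graphics/map-al.jpg), one alternative is to let System.Uri resolve them against the page or site address. This is a minimal sketch under that assumption; the helper ResolveImageUrl and the address https://contoso.sharepoint.com/... used in the example are hypothetical placeholders, not values from the question.

```csharp
using System;

// Hypothetical helper: turns an <img> src value into an absolute URL.
public static class ImageUrlResolver
{
    public static Uri ResolveImageUrl(Uri siteBase, string src)
    {
        if (siteBase == null || !siteBase.IsAbsoluteUri)
            throw new ArgumentException("siteBase must be an absolute URI.", nameof(siteBase));

        // src is already an absolute http/https URL: use it as-is.
        if (Uri.TryCreate(src, UriKind.Absolute, out var absolute) &&
            (absolute.Scheme == Uri.UriSchemeHttp || absolute.Scheme == Uri.UriSchemeHttps))
            return absolute;

        // Otherwise treat src as relative and resolve it against the site address;
        // a leading "/" resolves from the host root, not from the base path.
        return new Uri(siteBase, new Uri(src, UriKind.Relative));
    }
}
```

For example, resolving /sites/ContentCenter/Graphics/map-al.jpg against https://contoso.sharepoint.com/sites/ContentCenter/Pages/home.aspx gives https://contoso.sharepoint.com/sites/ContentCenter/Graphics/map-al.jpg, which is the URL you would then download and pass to the Base64 conversion.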