Getting complete input tag string in webbrowser

Question

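I have to use the WebBrowser control in a small application, not to fill in field values but to extract them. What I want is to get the complete input string, for example: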

<input type="text" name="username" class="form-control" size="40" required="required"/>

I know that by using:

        foreach (HtmlElement element in webBrowser.Document.GetElementsByTagName("input"))
        {
            Helpers.ReturnMessage(element.GetAttribute("name"));
        }

we can get the value of the name="username" part with the code above, but is there a way to get the whole string, which in this case is:

<input type="text" name="username" class="form-control" size="40" required="required"/>

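Ideally, what I want is to get this part from each input -> name="username". In some cases it might be id="value" instead, so I can't hard-code it. Or do I need to use some kind of regex? Thanks for any help.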

Answer 1

It seems HtmlElement doesn't provide any way to enumerate attributes (at least not in a generic enough way), so the simplest solution is to take its OuterHtml property and parse it with HtmlAgilityPack (https://html-agility-pack.net/):

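// Needs: using System.Linq; and the HtmlAgilityPack NuGet package.
// Single() assumes the page has exactly one <input>; it throws otherwise.
var inputHtml = _webBrowser
    .Document
    .GetElementsByTagName("input")
    .Cast<HtmlElement>()
    .Single()
    .OuterHtml;
// Fully qualified to avoid a clash with System.Windows.Forms.HtmlDocument.
var elementHtmlDoc = new HtmlAgilityPack.HtmlDocument();
elementHtmlDoc.LoadHtml(inputHtml);
// The fragment contains a single node (the input), so read its attributes.
var attributesDictionary = elementHtmlDoc
    .DocumentNode
    .ChildNodes
    .Single()
    .Attributes
    .ToDictionary(
        attr => attr.Name,
        attr => attr.Value);
// Each entry is printed as [name, value], one per line.
MessageBox.Show(
    String.Join(Environment.NewLine, attributesDictionary),
    "Attributes");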

If you really need the element's attributes as an HTML string, you can get it (not ideal, but still mostly reliable in this case) by running a regular expression over the element's OuterHtml:

var attributesString = Regex
    .Match(inputHtml, @"^<\s*\S+\s+(?<attributes>[^\>]*)>") // WebBrowser removes closing slash, so we do not need to handle it.
    .Groups["attributes"]
    .ToString();

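Note that this won't be the actual HTML that was used, because WebBrowser seems to reorder the original attributes and returns slightly modified HTML. So if you want the actual HTML, you will have to fetch the original .html file separately (which obviously won't work for SPAs and Ajax-heavy sites) and parse it with HtmlAgilityPack.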
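To get each name="value" part without hard-coding attribute names, the regex approach can be extended to split the captured attribute string into pairs. The following is only a minimal sketch that works on a plain string, so it can be tried without a WebBrowser instance. InputTagParser, GetAttributesString and GetAttributes are hypothetical names made up for this example (they are not part of WinForms or HtmlAgilityPack), and the second pattern is my own assumption about what the attribute text looks like: quoted, single-quoted, unquoted, or bare attributes, with no '>' inside a value.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of WinForms or HtmlAgilityPack.
public static class InputTagParser
{
    // Same idea as the pattern above: capture everything between
    // the tag name and the first '>'.
    private static readonly Regex AttributesPattern =
        new Regex(@"^<\s*\S+\s+(?<attributes>[^>]*)>");

    // Assumed attribute shapes: name="v", name='v', name=v, or a bare name.
    private static readonly Regex PairPattern = new Regex(
        @"(?<name>[^\s=""'/>]+)(?:\s*=\s*(?:""(?<value>[^""]*)""|'(?<value>[^']*)'|(?<value>[^\s""'>]+)))?");

    // Returns the raw attribute text of a single tag, or "" if there is none.
    public static string GetAttributesString(string outerHtml)
    {
        var html = (outerHtml ?? string.Empty).Trim();
        // Drop trailing spaces and a self-closing slash if one is present.
        return AttributesPattern.Match(html).Groups["attributes"].Value.TrimEnd(' ', '/');
    }

    // Splits the attribute text into (name, value) pairs in document order.
    // Bare attributes such as "required" get an empty value.
    public static List<KeyValuePair<string, string>> GetAttributes(string outerHtml)
    {
        var result = new List<KeyValuePair<string, string>>();
        foreach (Match m in PairPattern.Matches(GetAttributesString(outerHtml)))
        {
            result.Add(new KeyValuePair<string, string>(
                m.Groups["name"].Value,
                m.Groups["value"].Value));
        }
        return result;
    }
}
```

In the WebBrowser case you would pass element.OuterHtml for each element returned by GetElementsByTagName("input") and rebuild the parts you need from the pairs, for example with string.Format("{0}=\"{1}\"", pair.Key, pair.Value). The same caveat applies: the pairs reflect the HTML as WebBrowser reports it, not necessarily the original source, and for anything beyond simple tags HtmlAgilityPack is the safer choice.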

c#

webbrowser-control

winforms