Extracting the "for" attribute from a label element in an HTML webpage

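I have some code that analyzes various attributes of a web page when parts of it are clicked. One of the items captured is the ID of the clicked element.

Sometimes, however, there is no ID; instead, the clicked element is a label that references an ID through its "for" attribute. In those cases, I want to get the value of the "for" attribute.

I have tried to do this as follows: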

txtID.Text = TryCast(myHTMLDocument, HtmlDocument).GetElementFromPoint(lastMousePos).GetAttribute("id")
If txtID.Text = "" Then
  txtID.Text = TryCast(myHTMLDocument, HtmlDocument).GetElementFromPoint(lastMousePos).GetAttribute("for")
End If

For some reason, .GetAttribute("for") always returns blank. Am I referencing the attribute incorrectly, or is something else going on?

A sample of the HTML is shown below:

<div class="question legal-owner active">

<a class="help-trigger help-trigger-layout">
    <span class="help-text-icon"></span>
</a>


<div class="quote-help quote-help-layout">
    <a class="quote-help-close-container">
        <div class="quote-help-close"></div>
    </a>
    <h3>Car ownership</h3>

    <p>
        We need to know whether the car belongs to you. If you don’t own the car but you’re the registered keeper, you should answer ‘No’ 
        (the owner of the car and the registered keeper can be different people).
    </p>

</div>

<span class="editor-label question-layout">
    <label for="OwningAndUsingCarPanel_LegalOwner">Are you (or will you be) the legal owner of this car?</label>
</span> 
    <ul class="question-layout yesno-radio-list">
        <li>
            <input name="OwningAndUsingCarPanel.LegalOwner" id="OwningAndUsingCarPanel_LegalOwner_true" type="radio" value="True">
            <label for="OwningAndUsingCarPanel_LegalOwner_true">
                <span>Yes</span>
            </label>
        </li>
        <li>
            <input name="OwningAndUsingCarPanel.LegalOwner" id="OwningAndUsingCarPanel_LegalOwner_false" type="radio" value="False">
            <label for="OwningAndUsingCarPanel_LegalOwner_false">
                <span>No</span>
            </label>
        </li>
    </ul>
<span class="editor-validation">
    <span class="field-validation-valid" id="OwningAndUsingCarPanel_LegalOwner_validationMessage"></span>
</span>
</div>

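I have solved this by creating my own function called getUnknown that searches for an attribute within a tag. It should work for any attribute whose value is enclosed in double quotes. The function takes two parameters: the first is a string that should contain the element's tag with its attributes and values, and the second is the name of the attribute whose value you want to extract. If the attribute cannot be found, the function returns "Nothing Found".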

' Requires: Imports System.Text.RegularExpressions
Private Function getUnknown(myText As String, myAttr As String) As String
    Dim myResult As String = ""
    Dim myStart As Integer = 0
    Dim myLen As Integer = 0
    'remove any spaces around the "=" sign
    Dim myCleanText As String = Regex.Replace(myText, "\s*=\s*", "=")
    'add =" to the attribute to avoid finding non-attributes when using IndexOf function
    Dim myFullAttr As String = myAttr.Trim().ToLower() + "="""

    Try
        'case-insensitive search for attr=" in the cleaned tag text
        myStart = myCleanText.ToLower().IndexOf(myFullAttr)
        If myStart = -1 Then
            myResult = "Nothing Found"
        Else
            'the value runs from just after the opening quote to the next quote
            myStart = myStart + myFullAttr.Length
            myLen = myCleanText.IndexOf("""", myStart) - myStart
            myResult = myCleanText.Substring(myStart, myLen)
        End If
    Catch ex As Exception
        'e.g. no closing quote, so the computed length is negative
        myResult = "Nothing Found"
    End Try

    Return myResult

End Function
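
One limitation of the IndexOf search is that looking for for=" would also match inside a longer attribute name such as data-for. As a rough sketch only (not the function above), the same lookup can be written as a single regular expression with a lookbehind. The names AttributeDemo and GetQuotedAttribute are made up for this illustration, and it still assumes double-quoted values:

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module AttributeDemo
    ' Hypothetical helper: returns the double-quoted value of attrName,
    ' or an empty string if the attribute is not present.
    Function GetQuotedAttribute(tagText As String, attrName As String) As String
        ' (?<![\w-]) stops "for" from matching inside names such as "data-for".
        Dim pattern As String = "(?<![\w-])" & Regex.Escape(attrName.Trim()) & "\s*=\s*""([^""]*)"""
        Dim m As Match = Regex.Match(tagText, pattern, RegexOptions.IgnoreCase)
        Return If(m.Success, m.Groups(1).Value, "")
    End Function

    Sub Main()
        Dim tag As String = "<label data-for=""x"" for = ""OwningAndUsingCarPanel_LegalOwner_true"">"
        ' Prints the value of the real "for" attribute, skipping data-for.
        Console.WriteLine(GetQuotedAttribute(tag, "for"))
        ' A missing attribute yields an empty string.
        Console.WriteLine(GetQuotedAttribute(tag, "id") = "")
    End Sub
End Module
```

Like getUnknown, this is plain text matching rather than real HTML parsing, so it can still be fooled by attribute-like text inside another attribute's value.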

In the context of my original question, I used it as follows:

Dim myElement As String = _
 TryCast(myHTMLDocument, HtmlDocument).GetElementFromPoint(lastMousePos).OuterHtml.ToString

txtID.Text = getUnknown(myElement, "for")
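
A possible alternative that avoids parsing OuterHtml: the WebBrowser control's GetAttribute goes through the Internet Explorer DOM, where the "for" attribute is exposed under the property name htmlFor (in the same way that "class" is exposed as className). I have not verified this against every document mode, so treat the following as a sketch. It is a fragment that reuses myHTMLDocument, lastMousePos and txtID from the question:

```vbnet
' Fragment: assumes myHTMLDocument, lastMousePos and txtID exist as in the question.
Dim clicked As HtmlElement = _
    TryCast(myHTMLDocument, HtmlDocument).GetElementFromPoint(lastMousePos)

If clicked IsNot Nothing Then
    txtID.Text = clicked.GetAttribute("id")
    If txtID.Text = "" Then
        ' "htmlFor" is the DOM property name behind the HTML "for" attribute.
        txtID.Text = clicked.GetAttribute("htmlFor")
    End If
End If
```

Note that GetElementFromPoint returns the innermost element under the mouse, so if the click lands on a span inside the label (as with the Yes/No labels in the sample HTML), you may need to check clicked.Parent as well.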