Extracting the "for" attribute from a label element in an HTML webpage
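I have some code that analyses various attributes of a web page when certain parts of it are clicked. One of the values it captures is the ID of the clicked element.
However, sometimes there is no ID, and the clicked element is instead a label that refers to an ID through its "for" attribute. In those cases I want to get the value of the "for" attribute.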
I have tried to do this as follows:
txtID.Text = TryCast(myHTMLDocument, HtmlDocument).GetElementFromPoint(lastMousePos).GetAttribute("id")
If txtID.Text = "" Then
    txtID.Text = TryCast(myHTMLDocument, HtmlDocument).GetElementFromPoint(lastMousePos).GetAttribute("for")
End If
For some reason, .GetAttribute("for") always returns a blank string. Am I referencing the attribute incorrectly, or is something else going on?
Here is a sample of the HTML:
<div class="question legal-owner active">
    <a class="help-trigger help-trigger-layout">
        <span class="help-text-icon"></span>
    </a>
    <div class="quote-help quote-help-layout">
        <a class="quote-help-close-container">
            <div class="quote-help-close"></div>
        </a>
        <h3>Car ownership</h3>
        <p>
            We need to know whether the car belongs to you. If you don’t own the car but you’re the registered keeper, you should answer ‘No’
            (the owner of the car and the registered keeper can be different people).
        </p>
    </div>
    <span class="editor-label question-layout">
        <label for="OwningAndUsingCarPanel_LegalOwner">Are you (or will you be) the legal owner of this car?</label>
    </span>
    <ul class="question-layout yesno-radio-list">
        <li>
            <input name="OwningAndUsingCarPanel.LegalOwner" id="OwningAndUsingCarPanel_LegalOwner_true" type="radio" value="True">
            <label for="OwningAndUsingCarPanel_LegalOwner_true">
                <span>Yes</span>
            </label>
        </li>
        <li>
            <input name="OwningAndUsingCarPanel.LegalOwner" id="OwningAndUsingCarPanel_LegalOwner_false" type="radio" value="False">
            <label for="OwningAndUsingCarPanel_LegalOwner_false">
                <span>No</span>
            </label>
        </li>
    </ul>
    <span class="editor-validation">
        <span class="field-validation-valid" id="OwningAndUsingCarPanel_LegalOwner_validationMessage"></span>
    </span>
</div>
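Two things may explain the blank result, offered here as likely causes rather than confirmed ones. First, the WebBrowser control is built on Internet Explorer, whose older document modes look attributes up by their DOM property names, so "class" has to be read as "className" and "for" as "htmlFor". Second, in the sample above, clicking the text "Yes" or "No" hits the inner <span>, not the <label>, so the element returned by GetElementFromPoint has no "for" attribute at all. The sketch below assumes both points: it walks up the parent chain to the nearest label and tries "htmlFor" before "for". GetLabelTarget and LabelTargetHelper are hypothetical names; TagName, Parent and GetAttribute are standard System.Windows.Forms.HtmlElement members.

```vbnet
Imports System.Windows.Forms

Module LabelTargetHelper
    ' Hypothetical helper: returns the id that a clicked <label> points at,
    ' or "" if neither the element nor any of its ancestors is a <label>.
    Function GetLabelTarget(element As HtmlElement) As String
        Dim current As HtmlElement = element
        ' Clicking the text "Yes" returns the inner <span>, not the <label>,
        ' so walk up the parent chain until a LABEL element is found.
        While current IsNot Nothing
            If String.Equals(current.TagName, "LABEL", StringComparison.OrdinalIgnoreCase) Then
                ' Assumption: the IE-based control exposes "for" under its DOM
                ' property name "htmlFor" (as "class" is exposed as "className").
                Dim target As String = current.GetAttribute("htmlFor")
                If String.IsNullOrEmpty(target) Then
                    ' Fall back to the plain attribute name for newer document modes.
                    target = current.GetAttribute("for")
                End If
                Return If(target, "")
            End If
            current = current.Parent
        End While
        Return ""
    End Function

    ' Example use in the click handler (variable names taken from the question):
    '   Dim clicked As HtmlElement = TryCast(myHTMLDocument, HtmlDocument).GetElementFromPoint(lastMousePos)
    '   txtID.Text = clicked.GetAttribute("id")
    '   If txtID.Text = "" Then txtID.Text = GetLabelTarget(clicked)
End Module
```

Calling GetElementFromPoint once and reusing the element also avoids the repeated TryCast in the original attempt.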
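I have solved this by writing my own function, getUnknown, which searches for an attribute inside a tag. It should work for any attribute whose value is enclosed in double quotes. The function takes two parameters: the first is a string containing the element's markup, including its attributes and their values; the second is the name of the attribute whose value you want to extract.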
' Requires "Imports System.Text.RegularExpressions" at the top of the file.
Private Function getUnknown(myText As String, myAttr As String) As String
    Dim myResult As String = ""
    Dim myStart As Integer = 0
    Dim myLen As Integer = 0
    ' Remove any whitespace around the "=" sign
    Dim myCleanText As String = Regex.Replace(myText, "\s*=\s*", "=")
    ' Append =" to the attribute name so IndexOf does not match the name where it is not an attribute
    Dim myFullAttr As String = myAttr.Trim().ToLower() & "="""
    Try
        myStart = myCleanText.ToLower().IndexOf(myFullAttr)
        If myStart = -1 Then
            myResult = "Nothing Found"
        Else
            myStart = myStart + myFullAttr.Length
            ' The value runs up to the next double quote; if there is none, Substring throws and Catch handles it
            myLen = myCleanText.IndexOf("""", myStart) - myStart
            myResult = myCleanText.Substring(myStart, myLen)
        End If
    Catch ex As Exception
        myResult = "Nothing Found"
    End Try
    Return myResult
End Function
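If you prefer to keep parsing OuterHtml, a regular expression can make the match stricter. This is a swapped-in regex approach, not the function above: the hypothetical GetAttributeFromTag only inspects the element's opening tag (so attributes of child elements are ignored), accepts single or double quotes, and will not match "for" inside a longer name such as "data-for". Unlike getUnknown, it returns an empty string rather than "Nothing Found" when the attribute is absent. It is still a sketch, not an HTML parser, and it assumes no ">" appears inside an attribute value.

```vbnet
Imports System.Text.RegularExpressions

Module AttributeParsing
    ' Hypothetical regex-based alternative to getUnknown.
    ' Returns the value of attrName from the element's opening tag, or "" if absent.
    Function GetAttributeFromTag(outerHtml As String, attrName As String) As String
        If String.IsNullOrEmpty(outerHtml) Then Return ""
        ' Keep only the opening tag, e.g. <label for="x"> out of <label for="x"><span>Yes</span></label>.
        ' Assumes no ">" appears inside an attribute value.
        Dim tagEnd As Integer = outerHtml.IndexOf(">"c)
        Dim openingTag As String = If(tagEnd >= 0, outerHtml.Substring(0, tagEnd + 1), outerHtml)
        ' (?<![\w-]) stops "for" from matching inside names like "data-for" or "htmlfor";
        ' the value may be in double or single quotes, with optional spaces around "=".
        Dim pattern As String = "(?<![\w-])" & Regex.Escape(attrName) &
            "\s*=\s*(?:""(?<v>[^""]*)""|'(?<v>[^']*)')"
        Dim m As Match = Regex.Match(openingTag, pattern, RegexOptions.IgnoreCase)
        Return If(m.Success, m.Groups("v").Value, "")
    End Function
End Module
```

Note that, like getUnknown, this only helps when the OuterHtml passed in belongs to the <label> itself; if the click lands on the inner <span>, its markup contains no "for" attribute.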
In the context of my original question, I used it as follows:
Dim myElement As String = _
    TryCast(myHTMLDocument, HtmlDocument).GetElementFromPoint(lastMousePos).OuterHtml
txtID.Text = getUnknown(myElement, "for")