How to extract multiple words from a string in VB.NET
<tr class="sh"onclick="ii.ShowShareHolder('7358,IRO1GNBO0008')">
<td>ghanisha sherkat-</td>
<td><div class='ltr' title="141,933,691">142 M</div></td>
<td>52.560</td>
<td>0</td>
<td><div class=""/></td>
</tr>
From the text above, we want to output the following items:
ghanisha sherkat
141,933,691
52.560
0
My attempt:
Dim input As String = "above text"
Dim c2 As String() = input.Split(New String() {"</td>"}, StringSplitOptions.None)
Dim r As Integer
For r = 0 To c2.Length - 2
    MessageBox.Show(c2(r))
Next
My other attempt:
Dim sDelimStart As String = "<td>"
Dim sDelimEnd As String = "</td>"
Dim nIndexStart As Integer = input.IndexOf(sDelimStart)
Dim nIndexEnd As Integer = input.IndexOf(sDelimEnd)
Dim res As String = Strings.Mid(input, nIndexStart + sDelimStart.Length + 1, nIndexEnd - nIndexStart - sDelimStart.Length)
MessageBox.Show(res)
This extracts "ghanisha sherkat".
How can I extract the other items?
How do I continue from here? Thanks.
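The IndexOf attempt only ever finds the first cell, because both searches start at the beginning of the string. A minimal sketch that keeps the question's approach is to pass a start index to IndexOf so each search resumes just past the previous </td>. `GetCells` and the module name are hypothetical, and the sketch returns the raw text of every cell, so the trailing dash and the <div> markup in two of the cells still need separate handling.

```vbnet
Imports System
Imports System.Collections.Generic
Imports Microsoft.VisualBasic

Module IndexOfLoop
    ' The question's markup, kept exactly as posted.
    Public ReadOnly Sample As String = "<tr class=""sh""onclick=""ii.ShowShareHolder('7358,IRO1GNBO0008')"">" & vbLf &
        "<td>ghanisha sherkat-</td>" & vbLf &
        "<td><div class='ltr' title=""141,933,691"">142 M</div></td>" & vbLf &
        "<td>52.560</td>" & vbLf &
        "<td>0</td>" & vbLf &
        "<td><div class=""""/></td>" & vbLf &
        "</tr>"

    ' Hypothetical helper: returns the raw text between each "<td>" and the following "</td>".
    Function GetCells(input As String) As List(Of String)
        Const StartTag As String = "<td>"
        Const EndTag As String = "</td>"
        Dim cells As New List(Of String)
        Dim position As Integer = 0
        Do
            Dim cellStart As Integer = input.IndexOf(StartTag, position, StringComparison.Ordinal)
            If cellStart < 0 Then Exit Do
            cellStart += StartTag.Length
            Dim cellEnd As Integer = input.IndexOf(EndTag, cellStart, StringComparison.Ordinal)
            If cellEnd < 0 Then Exit Do
            cells.Add(input.Substring(cellStart, cellEnd - cellStart))
            ' Resume the next search just past this closing tag.
            position = cellEnd + EndTag.Length
        Loop
        Return cells
    End Function

    Sub Main()
        For Each cell As String In GetCells(Sample)
            Console.WriteLine(cell)
        Next
    End Sub
End Module
```

On the question's markup this yields five cells: the first is "ghanisha sherkat-" with its dash, and the second and fifth are still <div> markup.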
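Searching the string directly or using regular expressions is not a good way to parse markup such as HTML. Instead, you can use XmlDocument and parse the HTML as XML, marking it with an <?xml...?> declaration. This only works if the markup is well-formed XML.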
1. Using XmlDocument
Dim input As String = <?xml version="1.0" encoding="utf-8"?>
<tr class="sh" onclick="ii.ShowShareHolder('7358,IRO1GNBO0008')">
<td>ghanisha sherkat</td>
<td><div class="ltr" title="141,933,691">142 M</div></td>
<td>52.560</td>
<td>0</td>
<td><div class=""/></td>
</tr>.ToString()
' Requires Imports System.Xml at the top of the file
Dim doc As New XmlDocument
doc.LoadXml(input)
' Fetch all TD elements under the root TR using XPath
Dim tds = doc.SelectNodes("/tr/td")
For Each item As XmlNode In tds
    ' If the cell's first child is a DIV
    If item.FirstChild IsNot Nothing AndAlso item.FirstChild.Name = "div" Then
        ' Get the title attribute, if there is one
        Dim titleAttr = item.FirstChild.Attributes("title")
        If titleAttr IsNot Nothing Then
            Console.WriteLine(titleAttr.Value)
        End If
    End If
    ' Print the cell's text (for the DIV cell this is "142 M")
    Console.WriteLine(item.InnerText)
Next
You can see a working example using XmlDocument here on .NET Fiddle.
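The XML route depends on the input being well-formed. The question's text writes class="sh"onclick=... with no space between the two attributes, which XML does not allow; the input in this answer adds that space. A small sketch to check this, where `IsWellFormed` is a hypothetical helper that relies on LoadXml throwing XmlException for malformed markup:

```vbnet
Imports System
Imports System.Xml

Module WellFormedCheck
    ' Hypothetical helper: True if XmlDocument.LoadXml accepts the markup.
    Function IsWellFormed(markup As String) As Boolean
        Try
            Dim doc As New XmlDocument()
            doc.LoadXml(markup)
            Return True
        Catch ex As XmlException
            ' LoadXml throws XmlException for markup that is not well-formed XML.
            Return False
        End Try
    End Function

    Sub Main()
        ' Opening tag as written in the question: no space before onclick.
        Dim asPosted As String = "<tr class=""sh""onclick=""ii.ShowShareHolder('7358,IRO1GNBO0008')""><td>0</td></tr>"
        ' Same markup with the space added.
        Dim withSpace As String = "<tr class=""sh"" onclick=""ii.ShowShareHolder('7358,IRO1GNBO0008')""><td>0</td></tr>"
        Console.WriteLine(IsWellFormed(asPosted))
        Console.WriteLine(IsWellFormed(withSpace))
    End Sub
End Module
```

LoadXml rejects the first string and accepts the second, so real-world HTML usually needs cleaning (or an HTML-specific parser) before this approach applies.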
2. Using regular expressions (not recommended)
Dim input As String =
<tr class="sh" onclick="ii.ShowShareHolder('7358,IRO1GNBO0008')">
<td>ghanisha sherkat</td>
<td><div class="ltr" title="141,933,691">142 M</div></td>
<td>52.560</td>
<td>0</td>
<td><div class=""/></td>
</tr>.ToString()
' Requires Imports System.Text.RegularExpressions at the top of the file
' Extract the content of every TD
Dim tds = Regex.Matches(input, "<td[^>]*>\s*(.*?)\s*<\/td>")
For Each td As Match In tds
    Dim content = td.Groups(1).Value
    ' Check if the cell contains a DIV
    If Regex.IsMatch(content, "^<div") Then
        ' Check if the DIV has a title attribute
        Dim title = Regex.Match(content, "<div.*title=""(.*?)"">(.*)<\/div>")
        If title.Success Then
            ' The first group is the title: 141,933,691
            Console.WriteLine(title.Groups(1).Value)
            ' The second group is the element content: 142 M
            Console.WriteLine(title.Groups(2).Value)
        End If
    Else
        Console.WriteLine(td.Groups(1).Value)
    End If
Next
You can see a working example using regular expressions here on .NET Fiddle.
Result
Both examples produce the same output:
ghanisha sherkat
141,933,691
142 M
52.560
0
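A regular expression can make this job easier.
I assume the values you want to capture are either directly inside a td tag or in a title attribute of a tag inside the td.
So you can use the following regular expression. It matches an opening td tag, optionally followed by an inner tag up to its title=" attribute, captures the value, allows an optional closing quote after it, and ends at the closing td tag.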
<td>(?:<.*title=")?([a-zA-Z0-9 \.,]*)"?.*\-?</td>
Here is the VB.NET code:
Imports System.Text.RegularExpressions

Public Class Form1
    Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
        ' Replace with the HTML from the question
        Dim input As String = "above text"
        Dim matches As MatchCollection = Regex.Matches(input, "<td>(?:<.*title="")?([a-zA-Z0-9 \.,]*)""?.*\-?</td>")
        ' Loop over the matches; group 1 holds the value
        For Each m As Match In matches
            ' The last cell (<div class=""/>) matches with an empty value, so skip it
            If m.Groups(1).Value <> "" Then
                MessageBox.Show(m.Groups(1).Value)
            End If
        Next
    End Sub
End Class
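The Form1 code needs a Windows Forms project and a pasted-in input. The console sketch below (module and function names are illustrative) runs the same pattern on the question's markup exactly as posted, including the trailing dash and the single-quoted class='ltr'. The last cell, <div class=""/>, also matches but with an empty group 1, so the sketch skips empty values; what remains is the four items the question asks for.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module TdRegexConsole
    ' The question's markup, kept exactly as posted.
    Public ReadOnly Sample As String = "<tr class=""sh""onclick=""ii.ShowShareHolder('7358,IRO1GNBO0008')"">" & vbLf &
        "<td>ghanisha sherkat-</td>" & vbLf &
        "<td><div class='ltr' title=""141,933,691"">142 M</div></td>" & vbLf &
        "<td>52.560</td>" & vbLf &
        "<td>0</td>" & vbLf &
        "<td><div class=""""/></td>" & vbLf &
        "</tr>"

    ' Same pattern as the Form1 code (quotes doubled for VB string syntax).
    Const Pattern As String = "<td>(?:<.*title="")?([a-zA-Z0-9 \.,]*)""?.*\-?</td>"

    Function ExtractValues(input As String) As List(Of String)
        Dim values As New List(Of String)
        For Each m As Match In Regex.Matches(input, Pattern)
            Dim value As String = m.Groups(1).Value
            ' The empty <div class=""/> cell matches with an empty group 1; skip it.
            If value.Length > 0 Then values.Add(value)
        Next
        Return values
    End Function

    Sub Main()
        For Each value As String In ExtractValues(Sample)
            Console.WriteLine(value)
        Next
    End Sub
End Module
```

Because `.` does not match a newline by default, each match stays on one line of the input, which is why a match cannot run from one td into the next here.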