How to extract multiple values from a string in VB.NET

<tr class="sh"onclick="ii.ShowShareHolder('7358,IRO1GNBO0008')">
    <td>ghanisha sherkat-</td>
    <td><div class='ltr' title="141,933,691">142 M</div></td>
    <td>52.560</td>
    <td>0</td>
    <td><div class=""/></td>
</tr>

I want to extract the following values from the text above:

ghanisha sherkat
141,933,691
52.560
0

My attempt:

Dim input As String = "above text"

Dim c2 As String() = input.Split(New String() {"</td>"},StringSplitOptions.None)

Dim r As Integer

For r = 0 To c2.Length - 2
    MessageBox.Show(c2(r))
Next

My other attempt:

Dim sDelimStart As String = "<td>"
Dim sDelimEnd As String = "</td>"
Dim nIndexStart As Integer = input.IndexOf(sDelimStart)
Dim nIndexEnd As Integer = input.IndexOf(sDelimEnd)

' Take the text between the first <td> and the first </td>
Dim res As String = Strings.Mid(input, nIndexStart + sDelimStart.Length + 1, nIndexEnd - nIndexStart - sDelimStart.Length)

MessageBox.Show(res)
                    

This way I can extract "ghanisha sherkat".

How do I extract the other items?

How should I continue from here? Thanks.

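Searching the string directly or using regular expressions is a poor fit for parsing markup such as HTML. Instead, you can use XmlDocument and parse the HTML as XML by adding the <?xml ...?> declaration. This only works when the markup is well-formed XML.

1. Using XmlDocument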

Dim input As String = <?xml version="1.0" encoding="utf-8"?>
<tr class="sh" onclick="ii.ShowShareHolder('7358,IRO1GNBO0008')">
    <td>ghanisha sherkat</td>
    <td><div class="ltr" title="141,933,691">142 M</div></td>
    <td>52.560</td>
    <td>0</td>
    <td><div class=""/></td>
</tr>.ToString()

Dim doc As New XmlDocument
doc.LoadXml(input)

' Fetch all TDs inside TRs using XPath
Dim tds = doc.SelectNodes("/tr/td")

For Each item As XmlNode In tds
    ' If the first child of the cell is a DIV
    If item.FirstChild IsNot Nothing AndAlso item.FirstChild.Name = "div" Then
        ' Get the title attribute
        Dim titleAttr = item.FirstChild.Attributes("title")
        If titleAttr IsNot Nothing Then
            Console.WriteLine(titleAttr.Value)
        End If
    End If
    ' Print the text content of the cell
    Console.WriteLine(item.InnerText)
Next

You can see a working example using XmlDocument on .NET Fiddle.
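One caveat with the XML route: XmlDocument.LoadXml only accepts well-formed XML, and the row as posted in the question has no space between class="sh" and onclick, which a strict XML parser should reject. The sketch below shows one way to guard against that and to return only the four values the question asks for. TryExtractCells is a hypothetical helper name, and trimming the trailing hyphen and skipping empty cells are assumptions based on the desired output, not part of the code above.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Xml

Module XmlCellDemo
    ' Hypothetical helper: returns the cell values,
    ' or Nothing if the markup is not well-formed XML.
    Function TryExtractCells(markup As String) As List(Of String)
        Dim doc As New XmlDocument()
        Try
            doc.LoadXml(markup)
        Catch ex As XmlException
            Return Nothing
        End Try

        Dim values As New List(Of String)
        For Each td As XmlNode In doc.SelectNodes("/tr/td")
            ' FirstChild is a text node for plain cells, so TryCast yields Nothing there
            Dim div As XmlElement = TryCast(td.FirstChild, XmlElement)
            If div IsNot Nothing AndAlso div.Name = "div" AndAlso div.HasAttribute("title") Then
                ' Prefer the full number stored in the title attribute
                values.Add(div.GetAttribute("title"))
            ElseIf td.InnerText.Trim().Length > 0 Then
                ' Assumption: a trailing hyphen is noise, as in the desired output
                values.Add(td.InnerText.Trim().TrimEnd("-"c))
            End If
        Next
        Return values
    End Function

    Sub Main()
        ' Same row as in the question, with the missing space before onclick added
        Dim row As String =
            "<tr class=""sh"" onclick=""ii.ShowShareHolder('7358,IRO1GNBO0008')"">" &
            "<td>ghanisha sherkat-</td>" &
            "<td><div class='ltr' title=""141,933,691"">142 M</div></td>" &
            "<td>52.560</td><td>0</td><td><div class=""""/></td></tr>"

        Dim cells = TryExtractCells(row)
        If cells Is Nothing Then
            Console.WriteLine("Markup is not well-formed XML")
        Else
            For Each value In cells
                Console.WriteLine(value)
            Next
        End If
    End Sub
End Module
```

For real-world HTML that is not well-formed, a dedicated HTML parser is usually a safer choice than either XmlDocument or regular expressions.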

2. Using regular expressions (not recommended)

Dim input As String =
<tr class="sh" onclick="ii.ShowShareHolder('7358,IRO1GNBO0008')">
    <td>ghanisha sherkat</td>
    <td><div class="ltr" title="141,933,691">142 M</div></td>
    <td>52.560</td>
    <td>0</td>
    <td><div class=""/></td>
</tr>.ToString()

' Extract all TDs
Dim tds = Regex.Matches(input, "<td[^>]*>\s*(.*?)\s*<\/td>")

For Each td As Match In tds
    Dim content = td.Groups(1).ToString()
    ' Check if the element is a DIV
    If Regex.IsMatch(content, "^<div") Then
        ' Check if the DIV has a title attribute
        Dim title = Regex.Match(content, "<div.*title=""(.*?)"">(.*)<\/div>")
        If title.Success Then
            ' Print the first group for title 141,933,691
            Console.WriteLine(title.Groups(1))
            ' Print the second group for element content 142 M
            Console.WriteLine(title.Groups(2))
        End If
    Else
        Console.WriteLine(td.Groups(1))
    End If
Next

You can see a working example using regular expressions on .NET Fiddle.

Result

Both examples produce the same output:

ghanisha sherkat
141,933,691
142 M
52.560
0

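Regular expressions can make this job easier.

I assume the values you want to capture are either directly inside a td tag or in the title attribute of a tag inside a td.

So you can use the following regular expression. It matches an opening td tag, optionally followed by a tag with a title attribute, then captures the value, optionally followed by a closing quote, and ends at the closing td tag. The doubled quotes are VB.NET string escaping for a single quote character.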

    <td>(?:<.*title="")?([a-zA-Z0-9 \.,]*)""?.*\-?</td>

Here is the VB.NET code:

    Imports System.Text.RegularExpressions

    Public Class Form1

        Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
            Dim input As String = "above text"

            Dim matches As MatchCollection = Regex.Matches(input, "<td>(?:<.*title="")?([a-zA-Z0-9 \.,]*)""?.*\-?</td>")

            ' Loop over matches.
            For Each m As Match In matches
                MessageBox.Show(m.Groups(1).Value)
            Next
        End Sub
    End Class
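To try the pattern without a form, here is a minimal console sketch that applies the same regular expression to the row from the question. The module name and the check that skips empty captures are my additions: the last cell contains only an empty div, so the capture group matches an empty string there.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module RegexCellDemo
    Sub Main()
        ' The row exactly as posted in the question, one element per line
        Dim input As String = String.Join(Environment.NewLine, {
            "<tr class=""sh""onclick=""ii.ShowShareHolder('7358,IRO1GNBO0008')"">",
            "    <td>ghanisha sherkat-</td>",
            "    <td><div class='ltr' title=""141,933,691"">142 M</div></td>",
            "    <td>52.560</td>",
            "    <td>0</td>",
            "    <td><div class=""""/></td>",
            "</tr>"})

        ' Same pattern as above; "." does not match a line feed,
        ' so each match stays within a single line
        Dim pattern As String = "<td>(?:<.*title="")?([a-zA-Z0-9 \.,]*)""?.*\-?</td>"

        For Each m As Match In Regex.Matches(input, pattern)
            Dim value As String = m.Groups(1).Value
            ' Skip cells where nothing was captured (the empty div)
            If value.Length > 0 Then
                Console.WriteLine(value)
            End If
        Next
    End Sub
End Module
```

With the sample row this should print the four values requested in the question. Note that the pattern relies on each td sitting on its own line; if several cells share a line, the greedy ".*" parts can swallow more than one cell.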