Extracting a substring when the string to find has varying whitespace

Question

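I have a string like the following.

Fax: 666-111-2222 Phone#: 200100200

I want to find the phone number. The problem is that the number of spaces after Phone and after # can vary from string to string. Also, writing a complex function is not advisable, because I have a large dataset to extract from.

I tried the code below, which gives me the correct starting index regardless of the number of spaces. But from that I cannot find the position after the colon (:).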

System.Globalization.CultureInfo.InvariantCulture.CompareInfo.IndexOf(FullString,"Phone#:",System.Globalization.CompareOptions.IgnoreSymbols)

Answer 1

I assume you want a C# answer.

I would use a regular expression, but if you insist on IndexOf you can do it like this:

string fullString = "Fax : 666-111-2222 Phone # : 200100200";
int phonePos = fullString.IndexOf("Phone");
int hashPos = fullString.IndexOf("#", phonePos+"Phone".Length);
int colonPos = fullString.IndexOf(":", hashPos+1);

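This way you have the absolute position of the colon, no matter how many spaces there are. Note that I use String.IndexOf; there is no reason to dig it out of CompareInfo as you did. Also note that the overload I use takes an extra parameter, the start index.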
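To get from the colon position to the number itself, one option is Substring followed by Trim. The following is a minimal sketch rather than part of the original answer: the names `PhoneIndexOfDemo` and `ExtractPhone` are hypothetical, and it assumes the phone number is the last item in the string.

```csharp
using System;

public static class PhoneIndexOfDemo
{
    // Hypothetical helper: returns the text after "Phone", "#" and ":",
    // or null if any of the three markers is missing.
    // Assumption: the phone number runs to the end of the string.
    public static string ExtractPhone(string fullString)
    {
        int phonePos = fullString.IndexOf("Phone", StringComparison.Ordinal);
        if (phonePos < 0) return null;

        int hashPos = fullString.IndexOf('#', phonePos + "Phone".Length);
        if (hashPos < 0) return null;

        int colonPos = fullString.IndexOf(':', hashPos + 1);
        if (colonPos < 0) return null;

        // Everything after the colon, with surrounding whitespace removed.
        return fullString.Substring(colonPos + 1).Trim();
    }

    public static void Main()
    {
        Console.WriteLine(ExtractPhone("Fax : 666-111-2222 Phone # : 200100200"));
        Console.WriteLine(ExtractPhone("Fax: 666-111-2222 Phone#:200100200"));
    }
}
```

Each IndexOf call starts searching where the previous marker ended, so the colon after "Fax" is never mistaken for the one after "Phone #".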

Answer 2

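You have a space between Phone and #, and also between # and :. Substring with a single argument returns the string from that index to the end of the input string. Trim removes any whitespace on either side.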

Private Function GetPhone(input As String) As String
    Dim i = input.IndexOf("Phone")
    Dim s = input.Substring(i)
    Dim splits = s.Split(":"c)
    Return splits(1).Trim
End Function

I ran the function 10,000 times, and it took 5 milliseconds.

Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
    Dim s = "Fax: 666-111-2222 Phone # : 200100200"
    Dim Phone As String = ""
    Dim sw As New Stopwatch
    sw.Start()
    For i = 1 To 10_000
        Phone = GetPhone(s)
    Next
    sw.Stop()
    Debug.Print(sw.ElapsedMilliseconds.ToString)
    MessageBox.Show(Phone)
End Sub

Answer 3

This is clearly a job for regular expressions.

String toMatch = "Fax : 666-111-2222 Phone # : 200100200";
Regex matchPhone = new Regex("\\bPhone\\s*#\\s*:\\s*");
MatchCollection matches = matchPhone.Matches(toMatch);
foreach (Match match in matches)
{
    Int32 position = match.Index + match.Length;
    // do whatever you want with the result here
}

In the code the backslashes are doubled, but the actual regular expression is:

\bPhone\s*#\s*:\s*

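\b denotes a word boundary, meaning the start or end of a word. This prevents things like "MegaPhone" from matching.
\s denotes any kind of whitespace. It matches spaces, tabs, and line breaks.
* means zero or more repetitions, so the pattern still matches whether there is no whitespace at all or a hundred spaces.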
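The behaviour described above can be checked against a few inputs. This is a small sketch, not part of the original answer: `PhonePatternDemo` and `IndexAfterLabel` are hypothetical names, and the pattern is written as a verbatim string so the backslashes do not need doubling.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PhonePatternDemo
{
    // Same regular expression as above, as a verbatim (@"...") string.
    private static readonly Regex MatchPhone = new Regex(@"\bPhone\s*#\s*:\s*");

    // Hypothetical helper: index just past the "Phone # :" label, or -1 if absent.
    public static int IndexAfterLabel(string input)
    {
        Match match = MatchPhone.Match(input);
        return match.Success ? match.Index + match.Length : -1;
    }

    public static void Main()
    {
        string[] samples =
        {
            "Phone#:200100200",          // no whitespace at all
            "Phone   #\t:   200100200",  // mixed spaces and a tab
            "MegaPhone#:200100200",      // rejected by \b
        };
        foreach (string sample in samples)
        {
            Console.WriteLine(IndexAfterLabel(sample));
        }
    }
}
```

The first two samples yield the index where the digits begin, while the third yields -1 because there is no word boundary before "Phone" inside "MegaPhone".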

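Note that this only gives you the index of the start of every phone number found in the given string. You did not specify whether there is any particular way to detect the end of the phone number, or whether the numbers have any specific expected format, so that is not included. If you want that, and you don't know exactly what might follow the phone number, look into regex character classes to match the specific digit content, and use a capture group to extract it from the match.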

If there is only one match in the whole string, it can be done with:

String toMatch = "Fax : 666-111-2222 Phone # : 200100200";
Regex matchPhone = new Regex("\\bPhone\\s*#\\s*:\\s*");
Match match = matchPhone.Match(toMatch);
Int32 position = match.Index + match.Length;

Answer 4

If you can rely on the format, this is simple. Just strip all spaces from the string (.Replace(" ", string.Empty)), then split on the characters that precede the phone number, e.g. "#:":

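var phoneFull = @"Fax : 666-111-2222 Phone # : 200100200";
var phone = phoneFull
    .Replace(" ", string.Empty)
    // The string[] overload of Split is available on every .NET version.
    .Split(new[] { "#:" }, StringSplitOptions.None)
    .Last(); // Last() requires "using System.Linq;"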

Answer 5

I think you should use a regular expression:

Regex rxPhone = new Regex(@"Phone\s*#\s*:\s*(\d+)");
Match match = rxPhone.Match(stringToMatch);
if (match.Success) // in case the phone number does not always exist
{
    string strPhoneNumber = match.Groups[1].Value;
    int intPhoneNumber = int.Parse(match.Groups[1].Value);
    int position = match.Groups[1].Index;
    // just pick the one you need
}
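Since the question mentions a large dataset, one way to apply this pattern is to build the Regex once and reuse it for every row. The following is a minimal sketch under that assumption, with rows held in an in-memory collection; `PhoneBatchDemo` and `ExtractPhones` are hypothetical names. It keeps the number as a string, because longer phone numbers can exceed the range of int and leading zeros would be lost on parsing.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PhoneBatchDemo
{
    // Built once and reused for every row; RegexOptions.Compiled trades
    // a higher construction cost for faster repeated matching.
    private static readonly Regex RxPhone =
        new Regex(@"Phone\s*#\s*:\s*(\d+)", RegexOptions.Compiled);

    // Hypothetical helper: yields the digits after "Phone # :" for each row
    // that contains them, skipping rows without a match.
    public static IEnumerable<string> ExtractPhones(IEnumerable<string> rows)
    {
        foreach (string row in rows)
        {
            Match match = RxPhone.Match(row);
            if (match.Success)
            {
                yield return match.Groups[1].Value;
            }
        }
    }

    public static void Main()
    {
        var rows = new List<string>
        {
            "Fax : 666-111-2222 Phone # : 200100200",
            "Fax: 666-111-2222 Phone#:6661112222",
            "Fax: 666-111-2222", // no phone entry, skipped
        };
        foreach (string phone in ExtractPhones(rows))
        {
            Console.WriteLine(phone);
        }
    }
}
```

Whether RegexOptions.Compiled actually helps depends on how many rows are processed, so it is worth measuring on the real data.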

c#

vb.net

string

substring

indexof