How to remove extra spaces from a specific string?

I have a string that looks like this:

Ireland, UK, United States of America,     Belgium, Germany   , Some     Country, ...

I need help with the Regex or String.Replace functions to remove the extra spaces, so that the result looks like this:

Ireland,UK,United States of America,Belgium,Germany,Some Country,

Thanks.

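You can do this by splitting the input on commas, trimming each piece and collapsing runs of multiple spaces into one, and then joining the pieces back together with String.Join.

Just to show how it can be done with LINQ: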

Console.Write(String.Join(",", _
    "Ireland, UK, United States of America,     Belgium, Germany   , Some     Country," _
     .Split(","c) _
     .Select(Function(m) Regex.Replace(m.Trim(), "\p{Zs}{2,}", " ")) _
     .ToArray()))

The key part is Regex.Replace(m.Trim(), "\p{Zs}{2,}", " "), which reduces every run of two or more space characters to a single space.

Result: Ireland,UK,United States of America,Belgium,Germany,Some Country,
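Since the question asks specifically about Regex.Replace, the same result can probably also be reached without Split and Join, using two Regex.Replace calls. This is only a sketch under the assumption that whitespace around a comma is never significant; the module name Demo is made up for illustration.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module Demo
    Sub Main()
        Dim input As String = "Ireland, UK, United States of America,     Belgium, Germany   , Some     Country,"

        ' Step 1: drop any whitespace directly before or after each comma.
        Dim result As String = Regex.Replace(input.Trim(), "\s*,\s*", ",")

        ' Step 2: collapse the remaining runs of spaces inside a name to one space.
        result = Regex.Replace(result, "\p{Zs}{2,}", " ")

        ' Prints: Ireland,UK,United States of America,Belgium,Germany,Some Country,
        Console.WriteLine(result)
    End Sub
End Module
```

Whether this is faster or slower than the Split/Select/Join version was not measured here.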

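Although stribizhev's answer works fine for this case, I would like to take this opportunity to highlight the (negative) performance impact of using regex for simple tasks.

The alternative below is noticeably faster than the regex version (roughly 2x); in my experience regex tends to be slow in situations like this one.

My approach is based on removing spaces recursively. I created two versions: the first uses a regular loop (withoutRegex); the second uses LINQ (withoutRegex2; it is actually identical to stribizhev's answer, except for the Regex part).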

Private Function withoutRegex(input As String) As String

    Dim output As String = ""

    Dim temp() = input.Split(","c)
    For i As Integer = 0 To temp.Length - 1
        output = output & recursiveSpaceRemoval(temp(i).Trim()) & If(i < temp.Length - 1, ",", "")
    Next

    Return output

End Function

Private Function withoutRegex2(input As String) As String

    Return String.Join(",", _
    input _
    .Split(","c) _
    .Select(Function(x) recursiveSpaceRemoval(x.Trim())) _
    .ToArray())

End Function

Private Function recursiveSpaceRemoval(input As String) As String

    Dim output As String = input.Replace("  ", " ")

    If output = input Then Return output
    Return recursiveSpaceRemoval(output)

End Function
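For reference, the same space-collapsing step can also be written with a plain loop instead of recursion. This is a minimal sketch, not part of the measured code; the names LoopSketch and loopSpaceRemoval are hypothetical, and its timing was not compared against recursiveSpaceRemoval.

```vbnet
Imports System

Module LoopSketch
    ' Keeps replacing double spaces with single ones until none are left.
    Function loopSpaceRemoval(input As String) As String
        Dim output As String = input
        While output.Contains("  ")
            output = output.Replace("  ", " ")
        End While
        Return output
    End Function

    Sub Main()
        ' Prints: Some Country
        Console.WriteLine(loopSpaceRemoval("Some     Country"))
    End Sub
End Module
```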

To prove my point, I created the following test framework:

Dim input As String = "Ireland, UK, United States of America,     Belgium, Germany   , Some     Country"
Dim output As String = ""

Dim count As Integer = 0
Dim countMax As Integer = 20
Dim with0 As Long = 0
Dim without As Long = 0
Dim without2 As Long = 0

While count < countMax

    count = count + 1
    Dim sw As Stopwatch = New Stopwatch
    sw.Start()
    output = withRegex(input)
    sw.Stop()
    with0 = with0 + sw.ElapsedTicks

    sw = New Stopwatch
    sw.Start()
    output = withoutRegex(input)
    sw.Stop()
    without = without + sw.ElapsedTicks

    sw = New Stopwatch
    sw.Start()
    output = withoutRegex2(input)
    sw.Stop()
    without2 = without2 + sw.ElapsedTicks

End While

MessageBox.Show("With: " & with0.ToString)
MessageBox.Show("Without: " & without.ToString)
MessageBox.Show("Without 2: " & without2.ToString)

Here, withRegex refers to stribizhev's answer, that is:

Private Function withRegex(input As String) As String

    Return String.Join(",", _
    input _
    .Split(","c) _
    .Select(Function(m) Regex.Replace(m.Trim(), "\p{Zs}{2,}", " ")) _
    .ToArray())

End Function

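This is a simple test framework that analyses very fast actions, where every little bit matters (the whole point of the 20 loop iterations is to make the measurements more reliable). That is, even changing the order in which the methods are called can affect the results.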
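One possible way to reduce this kind of noise, sketched under assumptions: make one untimed warm-up call so that JIT compilation is not counted, then time many calls with a single Stopwatch. The helper MeasureTicks and the stand-in lambda are hypothetical and were not used to produce the numbers reported in this answer.

```vbnet
Imports System
Imports System.Diagnostics

Module TimingSketch
    ' Hypothetical helper: one untimed warm-up call, then `iterations` timed calls.
    Function MeasureTicks(method As Func(Of String, String), input As String, iterations As Integer) As Long
        method.Invoke(input) ' warm-up, not measured

        Dim sw As Stopwatch = Stopwatch.StartNew()
        For i As Integer = 1 To iterations
            method.Invoke(input)
        Next
        sw.Stop()

        Return sw.ElapsedTicks
    End Function

    Sub Main()
        Dim input As String = "Ireland, UK,     Belgium"

        ' Stand-in for withRegex / withoutRegex / withoutRegex2.
        Dim candidate As Func(Of String, String) = Function(s) s.Replace("  ", " ")

        Console.WriteLine("Ticks: " & MeasureTicks(candidate, input, 1000).ToString())
    End Sub
End Module
```

Inside the class that holds the three functions, a call would look like MeasureTicks(AddressOf withRegex, input, 20).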

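In any case, the difference between the methods stayed more or less consistent across all my tests. The averages (in Stopwatch ticks, summed over the 20 iterations) I got after a few runs are: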

With: 2500-2700
Without: 1100-1300
Without2: 900-1200

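NOTE: since this is a general criticism of regex performance (at least in cases simple enough to be easily replaced with alternatives like the one shown here), any suggestion on how to improve regex performance in .NET is more than welcome. But please avoid vague, general statements and be as specific as possible (for example, by suggesting changes to the proposed test framework).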