Efficiently split a string in format "{ {}, {}, ...}"

Question

I have a string in the following format.

string instance = "{112,This is the first day 23/12/2009},{132,This is the second day 24/12/2009}";

private void parsestring(string input)
{
    string[] tokens = input.Split(','); // I thought this would split on the , separating the {}
    foreach (string item in tokens)     // but that doesn't seem to be what it is doing
    {
       Console.WriteLine(item); 
    }
}

The output I want should look like this:

112,This is the first day 23/12/2009
132,This is the second day 24/12/2009

But at the moment I get the following:

{112
This is the first day 23/12/2009
{132
This is the second day 24/12/2009

I am new to C#, and any help would be appreciated.

Answer 1

Use Regex for this:

string[] tokens = Regex.Split(input, @"}\s*,\s*{")
  .Select(i => i.Replace("{", "").Replace("}", ""))
  .ToArray();

Pattern explanation:

\s* - matches zero or more whitespace characters
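To make the role of \s* concrete, here is a minimal sketch that wraps the same split in a method and feeds it input with irregular spacing around the separating commas. The class name, the method name SplitGroups and the sample input are assumptions made for illustration; they are not part of the answer above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class WhitespaceTolerantSplit
{
    // Hypothetical helper: splits on "},{" while tolerating whitespace around the comma.
    public static string[] SplitGroups(string input)
    {
        return Regex.Split(input, @"\}\s*,\s*\{")
            // The first and last pieces still carry the outer braces, so strip them.
            .Select(token => token.Replace("{", "").Replace("}", ""))
            .ToArray();
    }

    public static void Main()
    {
        // Made-up input: one separator has spaces around the comma, the other has none.
        var spaced = "{1,a} , {2,b},{3,c}";
        foreach (var item in SplitGroups(spaced))
        {
            Console.WriteLine(item);
        }
    }
}
```

Both separators are matched by the same pattern, so the three groups come out without braces or stray spaces between them.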

Answer 2

Add using System.Text.RegularExpressions; to the top of your class file and use the regex Split method:

string[] tokens = Regex.Split(input, "(?<=}),");

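Here we use a positive lookbehind to split on each , that immediately follows a }.

(Note: (?<=your string) matches only at a position that is immediately preceded by your string. You can read more about it here.)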
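One thing worth noting is that a lookbehind consumes no characters, so each token produced by this split still carries its surrounding braces. The sketch below adds a trim step to get the output asked for in the question. The class and method names are hypothetical, and the Trim call is an addition rather than part of the answer.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class LookbehindSplitDemo
{
    // Hypothetical helper name, chosen for this example only.
    public static string[] SplitGroups(string input)
    {
        // Split on every comma that is immediately preceded by '}'.
        return Regex.Split(input, @"(?<=\}),")
            // The braces were not consumed by the split, so trim them from both ends.
            .Select(token => token.Trim('{', '}'))
            .ToArray();
    }

    public static void Main()
    {
        var instance = "{112,This is the first day 23/12/2009},{132,This is the second day 24/12/2009}";
        foreach (var item in SplitGroups(instance))
        {
            Console.WriteLine(item);
        }
    }
}
```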

Answer 3

If you don't want to use regular expressions, the following code will produce the output you need.

        string instance = "{112,This is the first day 23/12/2009},{132,This is the second day 24/12/2009}";

        string[] tokens = instance.Replace("},{", "}{").Split('}', '{');
        foreach (string item in tokens)
        {
            if (string.IsNullOrWhiteSpace(item)) continue;

            Console.WriteLine(item);
        }

        Console.ReadLine();

Answer 4

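Well, if you have a method called ParseString, it is a good thing if it returns something (and calling it ParseTokens instead might not be a bad idea either). If you do that, you arrive at the following code: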

private static IEnumerable<string> ParseTokens(string input)
{
    return input
        // removes the leading {
        .TrimStart('{')
        // removes the trailing }
        .TrimEnd('}')
        // splits on the different token in the middle
        .Split( new string[] { "},{" }, StringSplitOptions.None );
}

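The reason it didn't work for you before is that your understanding of how the Split method works was wrong: in your example it splits on every , in the string, including the ones inside the braces.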
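The difference can be seen side by side in a small sketch. The class name, the method names and the shortened sample input are made up for illustration.

```csharp
using System;

public static class SplitComparison
{
    // Splitting on ',' cuts at every comma, including those inside the braces.
    public static string[] ByComma(string input)
    {
        return input.Split(',');
    }

    // Splitting on the full "},{" separator cuts only between groups.
    public static string[] ByGroupSeparator(string input)
    {
        return input.Split(new[] { "},{" }, StringSplitOptions.None);
    }

    public static void Main()
    {
        var input = "{1,a},{2,b}"; // shortened, made-up sample

        Console.WriteLine(ByComma(input).Length);          // 4 pieces: "{1", "a}", "{2", "b}"
        Console.WriteLine(ByGroupSeparator(input).Length); // 2 pieces: "{1,a" and "2,b}"
    }
}
```

The second variant still leaves the outermost { and } on the first and last piece, which is why the answer trims them before splitting.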

Now if you put it all together, you get something like this (dotnetfiddle):

using System;
using System.Collections.Generic;

public class Program
{
    private static IEnumerable<string> ParseTokens(string input)
    {
        return input
            // removes the leading {
            .TrimStart('{')
            // removes the trailing }
            .TrimEnd('}')
            // splits on the different token in the middle
            .Split( new string[] { "},{" }, StringSplitOptions.None );
    }

    public static void Main()
    {
        var instance = "{112,This is the first day 23/12/2009},{132,This is the second day 24/12/2009}";
        foreach (var item in ParseTokens( instance ) ) {
            Console.WriteLine( item );
        }
    }
}

Answer 5

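Don't get fixated on Split() being the solution! This is an easy thing to parse without it. The regex answers are probably fine too, but I imagine that in terms of raw efficiency, writing "a parser" would do the trick.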

IEnumerable<string> Parse(string input)
{
    var results = new List<string>();
    int startIndex = 0;            
    int currentIndex = 0;

    while (currentIndex < input.Length)
    {
        var currentChar = input[currentIndex];
        if (currentChar == '{')
        {
            startIndex = currentIndex + 1;
        }
        else if (currentChar == '}')
        {
            int endIndex = currentIndex - 1;
            int length = endIndex - startIndex + 1;
            results.Add(input.Substring(startIndex, length));
        }

        currentIndex++;
    }

    return results;
}

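So it isn't short. But it iterates once, and it performs only one allocation per "result". With a little tweaking I could probably make a C# 8 version with Index types that cuts the allocations down further? This is probably good enough already.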

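You could spend a whole day figuring out how to understand the regex, but this is about as simple as it gets:

Scan every character.
If you find {, note that the next character is the start of a result.
If you find }, treat everything from the last recorded "start" up to the index before this character as "a result".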

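This won't catch mismatched braces, and it may produce wrong results for strings like "}}{". You didn't ask for those cases to be handled, but it isn't hard to improve this logic so that it catches them and either complains loudly or recovers.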

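For example, you could reset startIndex to something like -1 whenever a } is found. From there, if you find a { while startIndex != -1, you can deduce that you have found "{{". If you find a } while startIndex == -1, you can deduce that you have found "}}". And if you exit the loop with startIndex >= 0, that is an opening { with no closing }. That leaves a string like "}whoops" as an uncovered case, but it can be handled by initializing startIndex to, say, -2 and checking for that value specifically. Try doing that with a regex and you'll get a headache.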
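A minimal sketch of that stricter variant might look like the following. It is one possible reading of the description, not the answerer's code: the class name, the constant names, the error messages and the choice of FormatException are all assumptions. Because startIndex starts at -2, a leading { is accepted by testing startIndex >= 0 rather than startIndex != -1.

```csharp
using System;
using System.Collections.Generic;

public static class StrictBraceParser
{
    private const int NotStarted = -2; // no brace has been seen yet
    private const int Closed = -1;     // the most recent group was closed by '}'

    public static List<string> Parse(string input)
    {
        var results = new List<string>();
        int startIndex = NotStarted;

        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];
            if (c == '{')
            {
                // A non-negative startIndex means a group is already open: "{{".
                if (startIndex >= 0)
                    throw new FormatException("Unexpected '{' inside an open group at index " + i);
                startIndex = i + 1;
            }
            else if (c == '}')
            {
                if (startIndex == NotStarted)
                    throw new FormatException("'}' before any '{' at index " + i);
                if (startIndex == Closed)
                    throw new FormatException("'}' without a matching '{' at index " + i);
                results.Add(input.Substring(startIndex, i - startIndex));
                startIndex = Closed;
            }
        }

        // A group that was opened but never closed.
        if (startIndex >= 0)
            throw new FormatException("Unclosed '{' at end of input");

        return results;
    }

    public static void Main()
    {
        foreach (var item in Parse("{112,This is the first day 23/12/2009},{132,This is the second day 24/12/2009}"))
        {
            Console.WriteLine(item);
        }

        try
        {
            Parse("}whoops");
        }
        catch (FormatException ex)
        {
            Console.WriteLine("Rejected: " + ex.Message);
        }
    }
}
```

Characters outside any group, such as the separating commas, are ignored here just as in the original loop; whether they should be rejected as well is a design decision the answer leaves open.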

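The main reason I suggest this is that you said "efficiently". icepickle's solution is nice, but Split() makes one allocation per token, and then you perform another allocation for each TrimX() call. That's not "efficient". That's "n + 2 allocations".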
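As for the C# 8 idea floated earlier, one way it could go is to record where each token lies instead of copying it, and to materialize a string only when a caller needs one. This is a guess at what such a version might look like, not the answerer's code. It assumes C# 8 and a runtime that provides System.Range (for example .NET Core 3.0 or later), and the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class RangeParser
{
    // Returns the position of each token instead of allocating a substring for it.
    public static List<Range> ParseRanges(string input)
    {
        var ranges = new List<Range>();
        int startIndex = 0;

        for (int i = 0; i < input.Length; i++)
        {
            if (input[i] == '{')
            {
                startIndex = i + 1;
            }
            else if (input[i] == '}')
            {
                // Half-open range: from startIndex up to, but not including, the '}'.
                ranges.Add(startIndex..i);
            }
        }

        return ranges;
    }

    public static void Main()
    {
        var instance = "{112,This is the first day 23/12/2009},{132,This is the second day 24/12/2009}";
        foreach (Range range in ParseRanges(instance))
        {
            // Indexing a string with a Range creates the substring only here, on demand.
            string token = instance[range];
            Console.WriteLine(token);
        }
    }
}
```

The list itself still allocates as it grows, so this reduces allocations rather than eliminating them. Whether the saving matters depends on how many tokens the caller actually turns into strings.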

c#

split

string-parsing