C# Console Application: Catching Some Cases of White Spaces

Question

I have a C# console application that parses a .txt file. Each line of the file holds 4 values. Here are a few sample lines:

c:\ecpg\myfolder\no_space.cfm           20160803   01:09:54   1574

c:\ecpg\myfolder\file with space.cfm           20160803   01:09:54   1574

c:\myfolder\.project                                             20170221   07:54:10   265

I am using the following to split each line on white space:

while ((line = file.ReadLine()) != null)
{
    // An empty separator array makes Split treat any white space as the delimiter
    string[] parts = line.Split(new char[0], StringSplitOptions.RemoveEmptyEntries);
}

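The problem is that on line 2 the file name contains a space, so parsing fails because I now get 5 values instead of 4. How can I prevent this? Maybe there is some way to detect whether a . (dot) follows the space?

Thanks!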

Answer 1

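Split on the period instead. This gives you two separate strings: the file path up to the period, and the rest of the line. Then split only the second string on spaces. The first element of that second split is your file extension. Note that this assumes the path contains exactly one period: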

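while ((line = file.ReadLine()) != null)
{
    // Split at the first period only: path without extension, then the rest
    string[] parts = line.Split(new[] { '.' }, 2);

    // Drop the empty entries produced by runs of spaces
    string[] secondSplit = parts[1].Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

    // put together the file path
    string filePath = parts[0] + "." + secondSplit[0];

    // Do something here with the rest of the second split: secondSplit
}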

Answer 2

You can use Regex to split your string, which gives you cleaner output. Please check my code:

while ((line = file.ReadLine()) != null)
{
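    // Split on runs of two or more white-space characters; no capturing group,
    // so the separators themselves are not returned in the result
    string[] parts = Regex.Split(line, @"\s{2,}");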
}

I also put it on DotNetFiddle, where you can take a look.

Edit: I have updated the code so that it covers all of your scenarios (New Solution Fiddle):

while ((line = file.ReadLine()) != null)
{
    string partOne = Regex.Match(line, @"[a-z](.*)[a-z]").Value;
    //string[] parts = Regex.Split(line.Replace(partOne, ""), @"(\s+)");
    string[] parts;
    if (!string.IsNullOrEmpty(partOne))
    {
        parts = Regex.Split(line.Replace(partOne, ""), @"(\s+)");
    }
    else
    {
        parts = Regex.Split(line, @"(\s+)");
    }
}

Final code:

List<string> parts = new List<string>();
while ((line = file.ReadLine()) != null)
{
    parts = new List<string>();
    //string partOne = Regex.Match(line, @"[A-Za-z](.*)[A-Za-z]").Value;
    // Updated regex to handle numeric values in part one.
    string partOne = Regex.Match(line, @"[A-Za-z](.*)([A-Za-z]|([A-Za-z]{1}[0-9]))(.*?)\s").Value.Trim();
    parts.Add(partOne);
    string[] finalParts;
    if (!string.IsNullOrEmpty(partOne))
    {
        finalParts = Regex.Split(line.Replace(partOne, ""), @"(\s+)");
    }
    else
    {
        finalParts = Regex.Split(line, @"(\s+)");
    }

    foreach (string part in finalParts)
    {
        if (!string.IsNullOrEmpty(part.Trim()))
        {
            parts.Add(part);
        }
    }

    Console.WriteLine(parts[0] + " " + parts[1] + " " + parts[2] + " " + parts[3]);
}
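
As a simpler alternative to stripping the path out and splitting the remainder, a single pattern anchored at the end of the line can capture all four fields at once. This is only a sketch: it assumes every line ends with an 8-digit date, an HH:mm:ss time, and a numeric size, as in the sample lines. The `LineParser` class, the `Parse` method, and the group names are hypothetical names chosen for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LineParser
{
    // Assumed layout: <path> <yyyyMMdd> <HH:mm:ss> <size>, separated by white space.
    // The lazy "path" group takes everything before the last three fields,
    // so spaces inside the file name are kept.
    private static readonly Regex LinePattern = new Regex(
        @"^(?<path>.+?)\s+(?<date>\d{8})\s+(?<time>\d{2}:\d{2}:\d{2})\s+(?<size>\d+)\s*$");

    // Returns the four fields, or null when the line does not match the assumed layout.
    public static string[] Parse(string line)
    {
        Match m = LinePattern.Match(line);
        if (!m.Success) return null;
        return new[]
        {
            m.Groups["path"].Value,
            m.Groups["date"].Value,
            m.Groups["time"].Value,
            m.Groups["size"].Value
        };
    }

    public static void Main()
    {
        string[] fields = Parse(@"c:\ecpg\myfolder\file with space.cfm           20160803   01:09:54   1574");
        // prints: c:\ecpg\myfolder\file with space.cfm|20160803|01:09:54|1574
        Console.WriteLine(string.Join("|", fields));
    }
}
```

Because the trailing fields are matched by shape rather than by position, a line that does not follow the layout yields null instead of a mis-split array.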

Answer 3

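This approach is manual, but it works. It supports file names with any number of spaces. It works by locating spaces from the end of the string, retrieving the three trailing fields in a loop, and finally taking whatever remains as the file name. If you are parsing large files, there is plenty of room for optimization here.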

while ((line = file.ReadLine()) != null)
{
    string[] parts = new string[4];

    int n = -1;
    for (int idx = 0; idx < 3; idx++)
    {
        n = line.LastIndexOf(' ');
        parts[3-idx] = line.Substring(n + 1);
        line = line.Substring(0, n).TrimEnd();
    }

    parts[0] = line; // filename
}
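
To try the loop above against the sample lines from the question, it can be wrapped in a small helper. The following is a minimal sketch, not a tuned implementation: `RightToLeftSplitter` and `SplitFromRight` are hypothetical names, the fields are assumed to be separated by spaces only (not tabs), and a null return for short lines is an added assumption.

```csharp
using System;

public static class RightToLeftSplitter
{
    // Peels `trailingFields` space-delimited fields off the end of the line;
    // whatever is left becomes element 0 (the file name, spaces included).
    // Returns null if the line has too few fields.
    public static string[] SplitFromRight(string line, int trailingFields)
    {
        string[] parts = new string[trailingFields + 1];
        string rest = line.TrimEnd();

        for (int idx = trailingFields; idx >= 1; idx--)
        {
            int n = rest.LastIndexOf(' ');
            if (n <= 0) return null; // no separator left: malformed line
            parts[idx] = rest.Substring(n + 1);
            rest = rest.Substring(0, n).TrimEnd();
        }

        parts[0] = rest; // file name
        return parts;
    }

    public static void Main()
    {
        string[] samples =
        {
            @"c:\ecpg\myfolder\no_space.cfm           20160803   01:09:54   1574",
            @"c:\ecpg\myfolder\file with space.cfm           20160803   01:09:54   1574",
            @"c:\myfolder\.project                                             20170221   07:54:10   265"
        };

        foreach (string sample in samples)
        {
            string[] parts = SplitFromRight(sample, 3);
            Console.WriteLine(parts == null ? "malformed" : string.Join("|", parts));
        }
    }
}
```

Each sample should come back as four fields, with the second one keeping `file with space.cfm` intact as part of the path.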

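If one or more fields may be missing, you can add a simple pattern check. In your file, the first field is the file name, the second is an 8-digit date, the third is a time of day, and the fourth is (probably) the file size. In that case, this code should be more robust (I have not tried compiling it, so it may contain typos):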

while ((line = file.ReadLine()) != null)
{
    string[] parts = new string[4];

    int n = -1;
    for (int idx = 0; idx < 3; idx++)
    {
        n = line.LastIndexOf(' ');
        if (n == -1 || n == 0) break;
        string part = line.Substring(n + 1);
        if (part.IndexOf(':') > 0) parts[2] = part;
        else if (part.Length == 8) parts[1] = part;
        else parts[3] = part; // assuming you don't have 8-digit filesizes
        line = line.Substring(0, n).TrimEnd();
    }

    parts[0] = line.TrimEnd(); // filename
}
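
The shape checks above (a colon for the time, a length of 8 for the date) could be made stricter by actually parsing the values. The sketch below assumes the date is `yyyyMMdd`, the time is 24-hour `HH:mm:ss`, and the size is a plain non-negative integer; these formats are inferred from the sample lines, and `FieldValidator` and `TryValidate` are hypothetical names.

```csharp
using System;
using System.Globalization;

public static class FieldValidator
{
    // Returns true only when the date and time parse as "yyyyMMdd HH:mm:ss"
    // and the size consists of digits only.
    public static bool TryValidate(string date, string time, string size,
                                   out DateTime timestamp, out long bytes)
    {
        bool okStamp = DateTime.TryParseExact(
            date + " " + time, "yyyyMMdd HH:mm:ss",
            CultureInfo.InvariantCulture, DateTimeStyles.None, out timestamp);

        // NumberStyles.None accepts digits only: no sign, no white space.
        bool okSize = long.TryParse(
            size, NumberStyles.None, CultureInfo.InvariantCulture, out bytes);

        return okStamp && okSize;
    }

    public static void Main()
    {
        DateTime stamp;
        long bytes;
        bool ok = TryValidate("20160803", "01:09:54", "1574", out stamp, out bytes);
        Console.WriteLine(ok + " " + stamp.ToString("s") + " " + bytes);
    }
}
```

A line whose fields fail this check can then be logged or skipped rather than silently producing a wrong file name.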
