How to process this text block by block?

Question

I want to process the data block by block, handling each block separately.

The text looks like this:

[Global]
asd
dsa
akl
ASd

[Test2]
bnmnb
hkhjk
tzutzi
Tzutzi
Tzitzi

[Test3]
5675
46546
464
564
56456
45645654
4565464

[other]
sdfsd
dsf
sdf
dsfs

First I want to get the first block and process it, then the second one, and so on. This is what I have so far:

private void textprocessing(string filename)
{
    using (StreamReader sr1 = new StreamReader(filename))
    {
        string linetemp = "";
        bool found = false;
        int index = 0;

        while ((linetemp=sr1.ReadLine())!=null)
        {
            if (found==true)
            {
                MessageBox.Show(linetemp);
                break;   
            }

            if (linetemp.Contains("["))
            {
                found = true;
            }
            else
            {
                found = false;
            }                                                             
        }                                    
    }          
}

Answer 1

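You can use string.Split() to split the text on '[' and then split each piece on line breaks. A line that contains ']' is the block header, so you check for it and skip it.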

void Main()
{
    string txt = @"[Global]
asd
dsa
akl
ASd

[Test2]
bnmnb
hkhjk
tzutzi
Tzutzi
Tzitzi

[Test3]
5675
46546
464
564
56456
45645654
4565464

[other]
sdfsd
dsf
sdf
dsfs";

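    // Splitting on '[' yields one piece per block; the first piece is empty
    // because the text starts with '['.
    string[] split = txt.Split('[');
    foreach(var s in split)
    {
        // Accept both Windows and Unix line endings.
        var subsplits = s.Split(new string[] { "\r\n", "\n" }, StringSplitOptions.None);
        // subsplits[0] is the header without its leading '[', e.g. "Global]".
        foreach(var ss in subsplits)
        {
            // Skip the header line; print only the block's data lines.
            if(!ss.Contains("]"))
                Console.WriteLine(ss);
        }
    }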
}

This outputs:

asd
dsa
akl
ASd


bnmnb
hkhjk
tzutzi
Tzutzi
Tzitzi


5675
46546
464
564
56456
45645654
4565464


sdfsd
dsf
sdf
dsfs

You can add an extra check that detects empty lines and skips them.
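The extra check for empty lines could look like the following minimal sketch. It keeps the same splitting approach but collects the data lines of each block instead of printing them. The names `BlockSplitter` and `SplitBlocks` are hypothetical and chosen only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BlockSplitter
{
    // Hypothetical helper: returns one list of data lines per block,
    // dropping the header line and any empty lines.
    public static List<List<string>> SplitBlocks(string text)
    {
        var blocks = new List<List<string>>();
        // RemoveEmptyEntries drops the empty piece that appears
        // when the text starts with '['.
        foreach (var piece in text.Split(new[] { '[' }, StringSplitOptions.RemoveEmptyEntries))
        {
            var dataLines = piece
                .Split(new[] { "\r\n", "\n" }, StringSplitOptions.None)
                .Where(line => !line.Contains("]"))              // skip the header line
                .Where(line => !string.IsNullOrWhiteSpace(line)) // the extra check: skip empty lines
                .ToList();
            blocks.Add(dataLines);
        }
        return blocks;
    }

    public static void Main()
    {
        string txt = "[Global]\nasd\ndsa\n\n[Test2]\nbnmnb\nhkhjk\n";
        foreach (var block in SplitBlocks(txt))
        {
            // Each block can now be processed on its own.
            Console.WriteLine(string.Join(", ", block));
        }
    }
}
```

Like the code above, this assumes that data lines never contain ']' and that the text starts with a header line.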

Answer 2

Here is one way to do it:

// requires: using System; using System.Collections.Generic; using System.IO; using System.Linq;
private void ReadFile()
{
    //load all lines and remove the empty ones
    var lines = File.ReadAllLines(@"c:\temp\file.txt")
        .Where(l => l.Trim().Length > 0)
        .ToList();

    //mark indexes where sections start
    var sectionIndexes = Enumerable.Range(0, lines.Count)
        .Where(i => lines[i].StartsWith("[") && lines[i].EndsWith("]"))
        .ToList();
    //add an end marker so that the last section is not dropped by Zip
    sectionIndexes.Add(lines.Count);

    //now make list of tuples. Each tuple contains start of section (Item1) and last line of section (Item2)
    var sections = Enumerable.Zip(sectionIndexes, sectionIndexes.Skip(1), (a, b) => new Tuple<int, int>(a, b - 1)).ToList();

    //for each tuple (each section)
    foreach (var item in sections)
    {
        //pass the section name (line with square brackets) and the lines that follow it
        ProcessSection(lines[item.Item1], lines.Skip(item.Item1 + 1).Take(item.Item2 - item.Item1));
    }
}

private void ProcessSection(string sectionName, IEnumerable<string> lines)
{
    Console.WriteLine("this is section {0} with following lines: {1}", sectionName, string.Join(", ", lines.ToArray()));
}

The output of the ProcessSection method is:

this is section [Global] with following lines: asd, dsa, akl, ASd
this is section [Test2] with following lines: bnmnb, hkhjk, tzutzi, Tzutzi, Tzitzi
this is section [Test3] with following lines: 5675, 46546, 464, 564, 56456, 45645654, 4565464
this is section [other] with following lines: sdfsd, dsf, sdf, dsfs

This is a very quick and dirty solution, but it is sufficient if the files you are reading are small.

If you have any more questions, feel free to ask.
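For larger files, where loading every line up front is less attractive, a single pass that hands out one block at a time is a possible alternative. The following is only a sketch under that assumption and is not taken from either answer. `Block`, `BlockReader`, and `ReadBlocks` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical container for one block: its name and its data lines.
public sealed class Block
{
    public string Name { get; set; }
    public List<string> Lines { get; } = new List<string>();
}

public static class BlockReader
{
    // Reads line by line and yields each block as soon as it is complete.
    public static IEnumerable<Block> ReadBlocks(TextReader reader)
    {
        Block current = null;
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            line = line.Trim();
            if (line.Length == 0) continue; // ignore empty lines

            if (line.Length >= 2 && line.StartsWith("[") && line.EndsWith("]"))
            {
                // A new header closes the previous block, if there is one.
                if (current != null) yield return current;
                current = new Block { Name = line.Substring(1, line.Length - 2) };
            }
            else if (current != null)
            {
                current.Lines.Add(line);
            }
            // Lines before the first header are ignored.
        }
        // Emit the last block, which has no header after it.
        if (current != null) yield return current;
    }

    public static void Main()
    {
        using (var reader = new StringReader("[Global]\nasd\ndsa\n\n[Test2]\nbnmnb\n"))
        {
            foreach (var block in ReadBlocks(reader))
            {
                Console.WriteLine("{0}: {1}", block.Name, string.Join(", ", block.Lines));
            }
        }
    }
}
```

To read from a file, pass a `StreamReader` opened on the file name, as in the question, instead of the `StringReader` used here.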

c#

block

text-files

sequencefile