Splitting a large text file into smaller text files
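I am trying to split a text file of roughly 6 million lines by line count, and each output file should always end (as its last line) with a line that starts with a specific identifier.
What I tried: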
using (System.IO.StreamReader sr = new System.IO.StreamReader(inputfile))
{
    int fileNumber = 0;
    string line = "";
    while (!sr.EndOfStream)
    {
        int count = 0;
        //identifier = sr.ReadLine().Substring(0,2);
        using (System.IO.StreamWriter sw = new System.IO.StreamWriter(inputfile + ++fileNumber + ".TXT"))
        {
            sw.AutoFlush = true;
            while (!sr.EndOfStream && ++count < 1233123)
            {
                line = sr.ReadLine();
                sw.WriteLine(line);
            }
            //having problems starting here not sure how to implement the other condition == "JK"
            line = sr.ReadLine();
            if (count > 1233123 && line.Substring(0,2) == "JK")
            {
                sw.WriteLine(line);
            }
            else
            {
                while (!sr.EndOfStream && line.Substring(0,2) != "JK")
                {
                    line = sr.ReadLine();
                    sw.WriteLine(line);
                }
            }
        }
    }
}
The sample input text looks like this:
AAadsadasdasdasdfsdfsdfs
Bbasfafasfasdfdsfsdfsdff
CCsafsdfasdadfasdfasfsaf
DDasdsfsdfsafdsadfsafasf
JKdfgdsgdsfgsdfgsfgdfgdf
AAfsdfsadfsdfsaadfadasda
BBadfasdfasdfdsfasfasdas
CCadasdsfasdfasfasfasfds
DDsdfsdafasdfsdfdsfsdfsd
EEsadfsfsasafasdfsdfsdfs
FFasfasfadsdfdsadssfsdfs
JKadsadasdasdadsadasdasa
AAadasdasdasdasdasdasdas
BBasdadadadasdasdasdasdd
CCadasdasdasdasdasdasdad
JKsafsdfsdfasfasdfdasfsa
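Basically, what I want is several text files, each with at least 1233123 lines (that is, if line 1233123 does not start with "JK", keep writing to the current file until a "JK" line is found).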
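Check your condition while you read and write the file: is the line count greater than 1233123, and does the line just written start with JK? When both are true, stop writing the current part and continue with the next iteration of the outermost loop, which starts writing the next file.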
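// Replaces the inner using block; sr, line, count and fileNumber are declared as in the question.
using (System.IO.StreamWriter sw = new System.IO.StreamWriter(inputfile + ++fileNumber + ".TXT"))
{
    sw.AutoFlush = true;
    while (!sr.EndOfStream)
    {
        line = sr.ReadLine();
        sw.WriteLine(line);
        // StartsWith avoids the exception Substring(0,2) throws on lines shorter than two characters.
        if (++count > 1233123 && line.StartsWith("JK", System.StringComparison.Ordinal))
        {
            break; // this part is complete; the outer loop opens the next file
        }
    }
}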
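For reference, the fragment above can be folded back into a complete method. This is only a minimal sketch under some assumptions: the class name FileSplitter, the method name Split, and the parameters minLines and marker are hypothetical names introduced here, not part of the original code. It also uses >= rather than >, following the question's "at least 1233123 lines" wording, and drops AutoFlush so the writer can buffer.

```csharp
using System;
using System.IO;

public static class FileSplitter
{
    // Splits inputPath into parts named inputPath + "1.TXT", inputPath + "2.TXT", ...
    // A part is closed right after the first line that starts with `marker`
    // once the part holds at least `minLines` lines. Returns the number of parts written.
    public static int Split(string inputPath, int minLines, string marker)
    {
        int fileNumber = 0;
        using (var reader = new StreamReader(inputPath))
        {
            // Read one line ahead so a new part is only opened when there is data for it.
            string line = reader.ReadLine();
            while (line != null)
            {
                int count = 0;
                fileNumber++;
                using (var writer = new StreamWriter(inputPath + fileNumber + ".TXT"))
                {
                    while (line != null)
                    {
                        writer.WriteLine(line);
                        count++;
                        bool endsPart = count >= minLines
                            && line.StartsWith(marker, StringComparison.Ordinal);
                        line = reader.ReadLine();
                        if (endsPart)
                        {
                            break; // the marker line was the last line of this part
                        }
                    }
                }
            }
        }
        return fileNumber;
    }
}
```

A call such as FileSplitter.Split(inputfile, 1233123, "JK") would match the numbers in the question. On the 16-line sample input with minLines set to 4, this sketch should write three parts of 5, 7 and 4 lines, each ending in a JK line. If the input does not end with a JK line, the last part simply contains whatever is left.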