How can I tell whether a line returned by StreamReader.ReadLine() ended with Environment.NewLine?
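I am trying to read a text file line by line and build a single line out of multiple lines, until the line I read in ends with \r\n. My data looks like this: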
BusID|Comment1|Text\r\n
1010|"Cuautla, Inc. d/b/a 3 Margaritas VIII\n
State Lic. #40428210000 City Lic.#4042821P\n
9/26/14 9/14/14 - 9/13/15 5.00\n
9/20/00 9/14/00 - 9/13/01 5.00 New License"\r\n
1020|"7-Eleven Inc., dba 7-Eleven Store #20638\n
State Lic. #24111110126; City Lic. #2411111126P\n
SEND ISSUED LICENSES TO DALLAS, TX\r\n
I want the data to look like this:
BusID|Comment1|Text\r\n
1010|"Cuautla, Inc. d/b/a 3 Margaritas VIII State Lic. #40428210000 City Lic.#4042821P 9/26/14 9/14/14 - 9/13/15 5.00 9/20/00 9/14/00 - 9/13/01 5.00 New License"\r\n
1020|"7-Eleven Inc., dba 7-Eleven Store #20638 State Lic. #24111110126; City Lic. #2411111126P SEND ISSUED LICENSES TO DALLAS, TX\r\n
My code looks like this:
FileStream fsFileStream = new FileStream(strInputFileName, FileMode.Open,
    FileAccess.Read, FileShare.ReadWrite);
using (StreamReader srStreamRdr = new StreamReader(fsFileStream))
{
    while ((strDataLine = srStreamRdr.ReadLine()) != null && !blnEndOfFile)
    {
        //code evaluation here
    }
}
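I have tried:
if (strDataLine.EndsWith(Environment.NewLine))
{
    blnEndOfLine = true;
}
and
if (strDataLine.Contains(Environment.NewLine))
{
    blnEndOfLine = true;
}
Neither of these finds anything at the end of the string variable. Is there a way to detect the true end of a line so that I can combine these lines into one? Should I be reading the file differently?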
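If what you posted is exactly what the file contains, meaning that the characters \r\n are literally written out in the text, you can unescape them with the following:
strDataLine = strDataLine.Replace("\\r", "\r").Replace("\\n", "\n");
This ensures that you can now compare against Environment.NewLine (which is \r\n on Windows), as in:
if (strDataLine.Replace("\\r", "\r").Replace("\\n", "\n").EndsWith(Environment.NewLine))
{
    blnEndOfLine = true;
}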
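You cannot use the ReadLine method of StreamReader for this, because every kind of line terminator, both \r\n and \n, is removed from the input. The reader returns the line without it, and you will never know whether the removed characters were \r\n or just \n.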
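To see this behaviour in isolation, here is a small sketch (an illustration added here, not code from the question). It feeds a string containing both terminators to a StringReader, which follows the same ReadLine contract as StreamReader. The sample text and the name ReadLineDemo are made up for the demonstration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class ReadLineDemo
{
    // Returns every line that ReadLine produces for the given text.
    public static List<string> ReadAll(string text)
    {
        var result = new List<string>();
        using (var reader = new StringReader(text))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                result.Add(line);
            }
        }
        return result;
    }

    static void Main()
    {
        // The first line ends with \n, the second with \r\n
        foreach (string line in ReadAll("first\nsecond\r\n"))
        {
            // Check whether the returned string still carries a terminator
            bool kept = line.EndsWith("\n", StringComparison.Ordinal)
                     || line.EndsWith("\r", StringComparison.Ordinal);
            Console.WriteLine("{0} -> terminator kept: {1}", line, kept);
        }
    }
}
```

Both lines report False, so by the time the string reaches your code there is nothing left to test with EndsWith or Contains.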
If the file is not very big, you can try loading everything into memory and splitting it into separate lines yourself:
// Needs: using System; using System.IO; using System.Linq;
// Load everything in memory
string fileData = File.ReadAllText(@"D:\temp\myData.txt");
// Split on the \r\n (I don't use Environment.NewLine because it
// respects the OS conventions and this could be wrong in this context)
string[] lines = fileData.Split(new string[] { "\r\n" }, StringSplitOptions.RemoveEmptyEntries);
// Now replace the remaining \n with a space
lines = lines.Select(x => x.Replace("\n", " ")).ToArray();
foreach (string s in lines)
    Console.WriteLine(s);
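EDIT
If your file is really big (like the 3.5GB you mention), you cannot load everything into memory and need to process it in chunks instead. Fortunately, StreamReader provides a method named ReadBlock that lets us write code like this: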
// Needs: using System; using System.Collections.Generic; using System.IO; using System.Linq;
// Where we store the lines loaded from the file
List<string> lines = new List<string>();
// Read in blocks of 10M characters
char[] buffer = new char[1024 * 1024 * 10];
bool lastBlock = false;
string leftOver = string.Empty;
// Start the StreamReader
using (StreamReader reader = new StreamReader(@"D:\temp\localtext.txt"))
{
    // We exit when the last block is reached
    while (!lastBlock)
    {
        // Read up to 10M characters
        int loaded = reader.ReadBlock(buffer, 0, buffer.Length);
        // Exit if we have no more blocks to read (EOF)
        if (loaded == 0) break;
        // If we get fewer characters than the block size then
        // we are on the last block
        lastBlock = (loaded != buffer.Length);
        // Prepare the working string: the remainder from the
        // previous loop plus the block just read
        string current = leftOver + new string(buffer, 0, loaded);
        leftOver = string.Empty;
        if (!lastBlock)
        {
            // Search the last \r\n in the combined text (ordinal comparison,
            // so culture rules cannot interfere with the match)
            int lastNewLinePos = current.LastIndexOf("\r\n", StringComparison.Ordinal);
            int cut = lastNewLinePos < 0 ? 0 : lastNewLinePos + 2;
            // Save the incomplete part for the next loop
            leftOver = current.Substring(cut);
            current = current.Substring(0, cut);
        }
        // Process the complete lines
        AddLines(current, lines);
    }
}
// Anything still in leftOver is a final record that was never processed
if (leftOver.Length > 0) AddLines(leftOver, lines);

void AddLines(string current, List<string> lines)
{
    var splitted = current.Split(new string[] { "\r\n" }, StringSplitOptions.RemoveEmptyEntries);
    lines.AddRange(splitted.Select(x => x.Replace("\n", " ")));
}
This code assumes that records are terminated by \r\n and that a \r\n normally shows up within each block of 10M characters; text with no \r\n in it is carried over to the next block. It needs more testing against your actual data.
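As an alternative to block reads, the same idea can be sketched one character at a time, so that only the current record is held in memory. This is a sketch under stated assumptions, not tested production code: RecordReader and ReadRecords are hypothetical names, only \r\n ends a record, a lone \n becomes a space, and a lone \r is kept as an ordinary character.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

static class RecordReader
{
    // Yields one record per \r\n; a lone \n inside a record becomes a space.
    public static IEnumerable<string> ReadRecords(TextReader reader)
    {
        var record = new StringBuilder();
        bool pendingCr = false;   // true when the previous character was \r
        int next;
        while ((next = reader.Read()) != -1)
        {
            char c = (char)next;
            if (pendingCr)
            {
                pendingCr = false;
                if (c == '\n')
                {
                    // \r\n seen: the record is complete (empty records are skipped)
                    if (record.Length > 0) yield return record.ToString();
                    record.Clear();
                    continue;
                }
                record.Append('\r');   // lone \r: keep it as data
            }
            if (c == '\r') pendingCr = true;
            else if (c == '\n') record.Append(' ');   // soft break inside a record
            else record.Append(c);
        }
        if (pendingCr) record.Append('\r');
        // A final record that is not terminated by \r\n
        if (record.Length > 0) yield return record.ToString();
    }

    static void Main()
    {
        // Made-up sample shaped like the data in the question
        string sample = "BusID|Comment1|Text\r\n1010|\"Cuautla, Inc.\nState Lic. #40428210000\"\r\n";
        foreach (string record in ReadRecords(new StringReader(sample)))
            Console.WriteLine(record);
    }
}
```

For a real file you would pass a StreamReader instead of the StringReader, for example ReadRecords(new StreamReader(strInputFileName)), and write each record out as it arrives rather than collecting them all in a list.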
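You can read all of the text by calling File.ReadAllText(path) and parse it as follows:
// Needs: using System; using System.IO; using System.Linq;
string input = File.ReadAllText(your_file_path);
// Split on \r\n so that each entry is one complete record;
// the lone \n characters stay inside the entries
string[] records = input.Split(new[] { "\r\n" }, StringSplitOptions.RemoveEmptyEntries);
// Skip(1) drops the header row (remove it to keep the header), then
// replace the embedded \n with a space and put the \r\n back
string output = string.Join("\r\n",
    records.Skip(1).Select(x => x.Replace("\n", " "))) + "\r\n";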