C# StreamReader - Break on {CR}{LF} only
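Before running a complex SSIS insert package, I am trying to count the number of lines in a text file (to compare against a control file).
Currently I am using a StreamReader, which also breaks lines on the {LF} characters embedded inside records, whereas SSIS (correctly) breaks only on {CR}{LF}, so the counts don't tally.
Does anyone know an alternative way to do this, so that I can count the lines in the file based only on the {CR}{LF} line break?
Thanks in advance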
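{CR}{LF} is the terminator you want here; you can't really say which one is "correct".
Because ReadLine strips the line ending, you cannot tell which terminator ended each line.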
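To make the mismatch described in the question concrete, here is a small sketch (the class `ReadLineDemo` and method `CountWithReadLine` are hypothetical names used only for illustration). `StringReader.ReadLine`, like `StreamReader.ReadLine`, ends a line at `\n`, `\r` or `\r\n` and drops the terminator, so a record containing an embedded `{LF}` is counted as two lines.

```csharp
using System.IO;

public static class ReadLineDemo
{
    // Counts lines the way ReadLine sees them: "\n", "\r" and "\r\n" all end a line.
    public static int CountWithReadLine(string text)
    {
        int count = 0;
        using (var reader = new StringReader(text))
        {
            // ReadLine strips the terminator, so the caller cannot tell which one was hit
            while (reader.ReadLine() != null)
                count++;
        }
        return count;
    }
}
```

For the input `"row1\r\nrow2 part1\nrow2 part2\r\n"` this returns 3, while SSIS, splitting on `{CR}{LF}` only, sees 2 rows.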
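Use the StreamReader.Read() method and look for 13 ('\r') immediately followed by 10 ('\n'). Read() returns each character as an int, or -1 at the end of the stream.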
Here is a very lazy way to do it... note that it reads the whole file into memory.
var cnt = File.ReadAllText("yourfile.txt")
.Split(new[] { "\r\n" }, StringSplitOptions.None)
.Length;
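One caveat with the `Split` approach: it returns 1 for an empty file and produces an extra empty element when the file ends with a trailing `\r\n`, so the result can be one higher than the SSIS row count. A sketch that adjusts for both cases, still assuming the file fits in memory (`SplitCount.CountCrLfLines` is a hypothetical helper name):

```csharp
using System;

public static class SplitCount
{
    // Counts CRLF-terminated records in an in-memory string.
    // Split alone returns 1 for "" and adds an empty trailing element when the
    // text ends with "\r\n"; both cases are corrected here.
    public static int CountCrLfLines(string text)
    {
        if (text.Length == 0)
            return 0;

        int parts = text.Split(new[] { "\r\n" }, StringSplitOptions.None).Length;

        // A trailing "\r\n" terminates the last record rather than starting a new one
        return text.EndsWith("\r\n", StringComparison.Ordinal) ? parts - 1 : parts;
    }
}
```

Usage: `int cnt = SplitCount.CountCrLfLines(File.ReadAllText("yourfile.txt"));`. `StringSplitOptions.RemoveEmptyEntries` is deliberately not used, because it would also drop genuinely empty rows in the middle of the file.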
Iterate over the file and count the number of CRLF pairs.
A very simple implementation:
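using System.IO;
using System.Text;

// Counts lines terminated by CR+LF only; a lone LF inside a record is not a line break.
public static int CountLines(Stream stream, Encoding encoding)
{
    int cur, prev = -1, lines = 0;
    bool endsWithCrLf = false;
    // leaveOpen: true so disposing the reader does not close the caller's stream
    using (var sr = new StreamReader(stream, encoding, false, 4096, true))
    {
        while ((cur = sr.Read()) != -1)
        {
            // '\r' (13) immediately followed by '\n' (10) ends a line
            endsWithCrLf = prev == '\r' && cur == '\n';
            if (endsWithCrLf)
                lines++;
            prev = cur;
        }
    }
    // Empty stream: 0 lines. A last line without a trailing CRLF still counts,
    // but a trailing CRLF must not add an extra (empty) line.
    if (prev != -1 && !endsWithCrLf)
        lines++;
    return lines;
}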
Usage example:
using(var s = File.OpenRead(@"<your_file_path>"))
Console.WriteLine("Found {0} lines", CountLines(s, Encoding.Default));
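This is really a "find a substring in a string" task, so a more general algorithm could be used.
Here is an extension method that reads only lines delimited by {Cr}{Lf}, not {Lf}. You can count them: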
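As a sketch of that more general approach, assuming non-overlapping matches of a fixed delimiter string, the reader below keeps a ring buffer of the last `delimiter.Length` characters and compares it with the delimiter after each character is read. The names `DelimiterCounter` and `CountOccurrences` are hypothetical; for long delimiters a KMP-style matcher would avoid the per-character comparison.

```csharp
using System;
using System.IO;

public static class DelimiterCounter
{
    // Counts non-overlapping occurrences of `delimiter`, reading one char at a time.
    public static int CountOccurrences(TextReader reader, string delimiter)
    {
        if (string.IsNullOrEmpty(delimiter))
            throw new ArgumentException("Delimiter must be non-empty.", nameof(delimiter));

        int n = delimiter.Length;
        var window = new char[n]; // ring buffer with the last n characters read
        int head = 0;             // next slot to write; also the oldest slot once the window is full
        int filled = 0;           // characters in the window since the last match
        int count = 0;
        int c;
        while ((c = reader.Read()) != -1)
        {
            window[head] = (char)c;
            head = (head + 1) % n;
            if (filled < n) filled++;
            if (filled == n && WindowEquals(window, head, delimiter))
            {
                count++;
                filled = 0; // restart so matches do not overlap
            }
        }
        return count;
    }

    // Compares the ring buffer, read from its oldest slot, with the delimiter.
    private static bool WindowEquals(char[] window, int oldest, string delimiter)
    {
        for (int i = 0; i < delimiter.Length; i++)
        {
            if (window[(oldest + i) % window.Length] != delimiter[i])
                return false;
        }
        return true;
    }
}
```

With `"\r\n"` as the delimiter this returns the number of CRLF terminators; add one when the file is non-empty and does not end with `\r\n` to get a line count that matches SSIS.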
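using (var reader = new StreamReader(@"D:\Test.txt"))
{
    var count = reader.ReadLinesCrLf().Count(); // Count() needs using System.Linq;
}
You can also use it to actually read the lines, which is sometimes useful, because the normal StreamReader.ReadLine breaks on {Cr}{Lf}, on {Lf}, and on a lone {Cr}. It works with any TextReader and reads the data as a stream, so file size is not an issue.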
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Text;

public static class TextReaderExtensions
{
    // Yields lines terminated by "\r\n" only; a lone '\n' (or '\r') stays part of the line.
    public static IEnumerable<string> ReadLinesCrLf(this TextReader reader, int bufferSize = 4096)
    {
        // holds a partial line that spans more than one read buffer
        StringBuilder lineBuffer = null;
        // read buffer
        char[] buffer = new char[bufferSize];
        int charsRead;
        // true when the previous character was a carriage return ('\r')
        var previousIsCr = false;
        while ((charsRead = reader.Read(buffer, 0, bufferSize)) != 0)
        {
            int bufferIndex = 0;
            // start of the text in the current buffer that has not been returned yet
            int writeIdx = 0;
            do
            {
                var currentChar = buffer[bufferIndex];
                switch (currentChar)
                {
                    case '\n':
                        if (previousIsCr)
                        {
                            if (lineBuffer == null)
                            {
                                // whole line is in the current buffer; writeIdx can be > 0 when
                                // several lines share the buffer. "- 1" drops the trailing '\r'.
                                yield return new string(buffer, writeIdx, bufferIndex - writeIdx - 1);
                            }
                            else
                            {
                                // line started in an earlier buffer: append the rest, then drop the '\r'
                                Debug.Assert(writeIdx == 0, "Write index should be 0 when lineBuffer != null");
                                lineBuffer.Append(buffer, writeIdx, bufferIndex - writeIdx);
                                Debug.Assert(lineBuffer[lineBuffer.Length - 1] == '\r', "Last character in lineBuffer should be a carriage return");
                                lineBuffer.Length--;
                                yield return lineBuffer.ToString();
                                lineBuffer = null;
                            }
                            // shift write index to the next character that will be read
                            writeIdx = bufferIndex + 1;
                        }
                        previousIsCr = false;
                        break;
                    case '\r':
                        previousIsCr = true;
                        break;
                    default:
                        previousIsCr = false;
                        break;
                }
                bufferIndex++;
            } while (bufferIndex < charsRead);
            // carry an unfinished line over to the next read
            if (writeIdx < bufferIndex)
            {
                if (lineBuffer == null) lineBuffer = new StringBuilder();
                lineBuffer.Append(buffer, writeIdx, bufferIndex - writeIdx);
            }
        }
        // return the last line if the input did not end with "\r\n"
        if (lineBuffer != null && lineBuffer.Length > 0) yield return lineBuffer.ToString();
    }
}
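If the buffer bookkeeping in `ReadLinesCrLf` is more than you need, a simpler variant can read one character at a time and rely on `StreamReader`'s own internal buffering. This is a sketch rather than a tuned replacement; the names `SimpleCrLfReader` and `ReadLinesCrLfSimple` are hypothetical. It holds back each `\r` until it knows whether a `\n` follows, so a lone `\r` or `\n` stays inside the line.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class SimpleCrLfReader
{
    // Yields lines terminated by "\r\n" only, reading one character at a time.
    public static IEnumerable<string> ReadLinesCrLfSimple(this TextReader reader)
    {
        var line = new StringBuilder();
        bool pendingCr = false; // previous char was '\r' and has not been written yet
        int c;
        while ((c = reader.Read()) != -1)
        {
            if (pendingCr)
            {
                pendingCr = false;
                if (c == '\n')
                {
                    // "\r\n" found: emit the line without its terminator
                    yield return line.ToString();
                    line.Clear();
                    continue;
                }
                line.Append('\r'); // the held-back '\r' was ordinary content
            }
            if (c == '\r')
                pendingCr = true;
            else
                line.Append((char)c);
        }
        if (pendingCr) line.Append('\r'); // a '\r' at the very end is content
        if (line.Length > 0) yield return line.ToString();
    }
}
```

Counting works the same way: `reader.ReadLinesCrLfSimple().Count()` with `using System.Linq;`.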