快速获取大文件总行数的方法
Fast way to get total line number of a large file
我正在处理大型文本文件(大于 100MB)。我需要尽可能快的总行数。我目前正在使用下面的代码(更新:添加了 try-finally):
var
SR: TStreamReader;
totallines: int64;
str: string;
begin
SR:=TStreamReader.Create(myfilename, TEncoding.UTF8);
try
totallines:=0;
while not SR.EndOfStream do
begin
str:=SR.ReadLine;
inc(totallines);
end;
finally
SR.Free;
end;
end;
有没有更快的方法来获取总行数?
答案很简单:不。您的算法是最快的,但实现不是。您必须阅读整个文件并计算行数。至少如果行不是固定大小。
您读取文件的方式可能会影响全局性能。
在尽可能大的二进制缓冲区(字节数组)中逐块读取文件。然后计算缓冲区中的行数并使用同一缓冲区中的块循环。
Program LineCount;
{$APPTYPE CONSOLE}
{$WEAKLINKRTTI ON}
{$RTTI EXPLICIT METHODS([]) PROPERTIES([]) FIELDS([])}
{$SetPEFlags 1}
{ Compile with XE8 or above... }
USES
SysUtils,
BufferedFileStream;
VAR
LineCnt: Int64;
Ch: Char;
BFS: TReadOnlyCachedFileStream;
function Komma(const S: string; const C: Char = ','): string;
{ About 4 times faster than Comma... }
var
I: Integer; // loops through separator position
begin
Result := S;
I := Length(S) - 2;
while I > 1 do
begin
Insert(C, Result, I);
I := I - 3;
end;
end; {Komma}
BEGIN
writeln('LineCount - Copyright (C) 2020 by Walter L. Chester.');
writeln('Counts lines in the given textfile.');
if ParamCount <> 1 then
begin
writeln('USAGE: LineCount <filename>');
writeln;
writeln('No file size limit! Counts lines: takes 4 minutes on a 16GB file.');
Halt;
end;
if not FileExists(ParamStr(1)) then
begin
writeln('File not found!');
halt;
end;
writeln('Counting lines in file...');
BFS := TReadOnlyCachedFileStream.Create(ParamStr(1), fmOpenRead);
try
LineCnt := 0;
while BFS.Read(ch,1) = 1 do
begin
if ch = #13 then
Inc(LineCnt);
if (LineCnt mod 1000000) = 0 then
write('.');
end;
writeln;
writeln('Total Lines: ' + Komma(LineCnt.ToString));
finally
BFS.Free;
end;
END.
我正在处理大型文本文件(大于 100MB)。我需要尽可能快的总行数。我目前正在使用下面的代码(更新:添加了 try-finally):
var
SR: TStreamReader;
totallines: int64;
str: string;
begin
SR:=TStreamReader.Create(myfilename, TEncoding.UTF8);
try
totallines:=0;
while not SR.EndOfStream do
begin
str:=SR.ReadLine;
inc(totallines);
end;
finally
SR.Free;
end;
end;
有没有更快的方法来获取总行数?
答案很简单:不。您的算法是最快的,但实现不是。您必须阅读整个文件并计算行数。至少如果行不是固定大小。
您读取文件的方式可能会影响全局性能。 在尽可能大的二进制缓冲区(字节数组)中逐块读取文件。然后计算缓冲区中的行数并使用同一缓冲区中的块循环。
Program LineCount;
{$APPTYPE CONSOLE}
{$WEAKLINKRTTI ON}
{$RTTI EXPLICIT METHODS([]) PROPERTIES([]) FIELDS([])}
{$SetPEFlags 1}
{ Compile with XE8 or above... }
USES
SysUtils,
BufferedFileStream;
VAR
LineCnt: Int64;
Ch: Char;
BFS: TReadOnlyCachedFileStream;
function Komma(const S: string; const C: Char = ','): string;
{ About 4 times faster than Comma... }
var
I: Integer; // loops through separator position
begin
Result := S;
I := Length(S) - 2;
while I > 1 do
begin
Insert(C, Result, I);
I := I - 3;
end;
end; {Komma}
BEGIN
writeln('LineCount - Copyright (C) 2020 by Walter L. Chester.');
writeln('Counts lines in the given textfile.');
if ParamCount <> 1 then
begin
writeln('USAGE: LineCount <filename>');
writeln;
writeln('No file size limit! Counts lines: takes 4 minutes on a 16GB file.');
Halt;
end;
if not FileExists(ParamStr(1)) then
begin
writeln('File not found!');
halt;
end;
writeln('Counting lines in file...');
BFS := TReadOnlyCachedFileStream.Create(ParamStr(1), fmOpenRead);
try
LineCnt := 0;
while BFS.Read(ch,1) = 1 do
begin
if ch = #13 then
Inc(LineCnt);
if (LineCnt mod 1000000) = 0 then
write('.');
end;
writeln;
writeln('Total Lines: ' + Komma(LineCnt.ToString));
finally
BFS.Free;
end;
END.