Fast way to get the total number of lines in a large file

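I am working with large text files (larger than 100MB) and need the total line count as quickly as possible. I am currently using the code below (update: added try-finally):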

var
  SR: TStreamReader;
  totallines: int64;
  str: string;
begin
  SR:=TStreamReader.Create(myfilename, TEncoding.UTF8);
  try
    totallines:=0;
    while not SR.EndOfStream do
    begin
      str:=SR.ReadLine;
      inc(totallines);
    end;
  finally
    SR.Free;
  end;
end;

Is there a faster way to get the total line count?

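Short answer: no. Your algorithm is the fastest one available, but your implementation of it is not. You have to read the entire file and count the line breaks, at least when the lines are not of fixed size.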

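How you read the file can affect overall performance. Read the file block by block into a binary buffer (an array of bytes) that is as large as is practical. Count the line breaks in the buffer, then load the next block into the same buffer and repeat until the end of the file.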
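As a rough sketch of the block-by-block approach, something like the following should work. It assumes a plain TFileStream, a 1 MiB buffer (an arbitrary size, worth tuning), and that lines end with LF (#10), which covers both CRLF and LF-only files. The names CountLines and BufSize are made up for this illustration.

```pascal
program CountLinesChunked;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

const
  BufSize = 1 shl 20; // 1 MiB per read; an assumed value, not a measured optimum

// Hypothetical helper: counts LF bytes by scanning the file in large blocks.
function CountLines(const FileName: string): Int64;
var
  FS: TFileStream;
  Buf: TBytes;
  BytesRead, I: Integer;
begin
  Result := 0;
  SetLength(Buf, BufSize);
  FS := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    repeat
      // Read up to BufSize raw bytes; no decoding, no string allocation.
      BytesRead := FS.Read(Buf[0], BufSize);
      for I := 0 to BytesRead - 1 do
        if Buf[I] = 10 then // LF
          Inc(Result);
    until BytesRead = 0;
  finally
    FS.Free;
  end;
end;

begin
  if ParamCount = 1 then
    Writeln(CountLines(ParamStr(1)))
  else
    Writeln('USAGE:  CountLinesChunked <filename>');
end.
```

Because it works on raw bytes, this skips the UTF-8 decoding and the per-line string allocation that TStreamReader.ReadLine performs. Note that a last line with no trailing line break is not counted; add one to the result if the file is non-empty and its final byte is not LF and you want that line included.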

Program LineCount;

{$APPTYPE CONSOLE}
{$WEAKLINKRTTI ON}
{$RTTI EXPLICIT METHODS([]) PROPERTIES([]) FIELDS([])}
{$SetPEFlags 1}

{ Compile with XE8 or above... }

USES
  SysUtils,
  BufferedFileStream; // not an RTL unit: supplies TReadOnlyCachedFileStream and must be added to the project

VAR
  LineCnt: Int64;
  Ch: AnsiChar; // one byte per Read call; Char is a 2-byte WideChar in XE8
  BFS: TReadOnlyCachedFileStream;

function Komma(const S: string; const C: Char = ','): string;
{ About 4 times faster than Comma... }
var
  I: Integer; // loops through separator position
begin
  Result := S;
  I := Length(S) - 2;
  while I > 1 do
  begin
    Insert(C, Result, I);
    I := I - 3;
  end;
end; {Komma}

BEGIN
  writeln('LineCount - Copyright (C) 2020 by Walter L. Chester.');
  writeln('Counts lines in the given textfile.');
  if ParamCount <> 1 then
    begin
      writeln('USAGE:  LineCount <filename>');
      writeln;
      writeln('No file size limit!  Counts lines: takes 4 minutes on a 16GB file.');
      Halt;
    end;
  if not FileExists(ParamStr(1)) then
    begin
      writeln('File not found!');
      halt;
    end;
  writeln('Counting lines in file...');
  BFS := TReadOnlyCachedFileStream.Create(ParamStr(1), fmOpenRead);
  try
    LineCnt := 0;
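    while BFS.Read(ch,1) = 1 do
      begin
        if ch = #10 then // count LF, which ends lines in both CRLF and LF-only files
          begin
            Inc(LineCnt);
            // print a progress dot once per million lines, not once per byte read
            if (LineCnt mod 1000000) = 0 then
              write('.');
          end;
      end;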
    writeln;
    writeln('Total Lines: ' + Komma(LineCnt.ToString));
  finally
    BFS.Free;
  end;
END.