Using awk to print the number of lines in a file in the BEGIN section

Question

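I am trying to write an awk script that tells the user how many lines are in the file before it does anything else. I know how to do this in the END section, but I have not been able to do it in the BEGIN section. I have searched SE and Google, but found only half a dozen ways to do it in the END section or as part of a bash script, not how to do it before any processing takes place. I was hoping for something like the following: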

#!/usr/bin/awk -f

BEGIN{
        print "There are a total of " **TOTAL LINES** " lines in this file.\n"
     }
{

        if($0==4587){print "Found record on line number "NR; exit 0;}
}

but I have been unable to work out how to do this, if it is possible at all. Thanks.

Answer 1

You can read the file twice:

awk 'NR!=1 && FNR==1 {print NR-1} <some more code here>' file{,}

In your example:

awk 'NR!=1 && FNR==1 {print "There are a total of "NR-1" lines in this file.\n"} $0==4587 {print "Found record on line number "NR; exit 0;}' file{,}

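You can use file file instead of file{,} (the brace expansion simply names the file twice).
NR!=1 && FNR==1 is true only on the first line of the second pass over the file.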
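
To see why that condition fires only at the start of the second pass, it can help to trace NR and FNR directly. The following is a minimal sketch, not part of the original answer; the function names trace_two_pass and count_at_second_pass are made up for illustration, and the file is passed twice explicitly so that no brace expansion is needed.

```shell
#!/bin/sh
# Print FILENAME, NR and FNR for every record while awk reads the same file twice.
trace_two_pass() {
    awk '{ print FILENAME, "NR=" NR, "FNR=" FNR }' "$1" "$1"
}

# Print the line count once, on the first record of the second pass.
# NR keeps counting across both passes, FNR restarts at 1 for each pass.
count_at_second_pass() {
    awk 'NR != 1 && FNR == 1 { print NR - 1 }' "$1" "$1"
}
```

For a three-line file, the fourth trace line shows NR=4 and FNR=1, which is the only record where the condition holds. For an empty file the condition never holds, so nothing is printed.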

With an awk script containing:

#!/usr/bin/awk -f
NR!=1 && FNR==1 {
    print "There are a total of "NR-1" lines in this file.\n"
    } 
$0==4587 {
    print "Found record on line number "NR; exit 0
    }

Invoke it with:

awk -f myscript file{,}

Answer 2

To do this robustly, and for multiple files, you need something like:

$ cat tst.awk
BEGINFILE {
    numLines = 0
    while ( (getline line < FILENAME) > 0 ) {
        numLines++
    }
    print "----\nThere are a total of", numLines, "lines in", FILENAME
}
$0==4587 { print "Found record on line number", FNR, "of", FILENAME; nextfile }
$
$ cat file1
a
4587
c
$
$ cat file2
$
$ cat file3
d
e
f
4587
$
$ awk -f tst.awk file1 file2 file3
----
There are a total of 3 lines in file1
Found record on line number 2 of file1
----
There are a total of 0 lines in file2
----
There are a total of 4 lines in file3
Found record on line number 4 of file3

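The above uses GNU awk for BEGINFILE. Any other solution is hard to implement in a way that also handles empty files: you would need an array to keep track of which files are being parsed, and you would have to print the information in the FNR==1 and END sections after skipping past the empty files.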

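getline comes with caveats and should not be used lightly, see http://awk.info/?tip/getline, but this is one of its appropriate and robust uses. You can also test for unreadable files in BEGINFILE by testing ERRNO and skipping the file (see the gawk manual); that situation would cause other scripts to abort.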
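
The ERRNO check mentioned above could look roughly like this. It is only a sketch and assumes GNU awk is installed as gawk; the wrapper name count_readable and the message wording are invented for the example. In gawk, ERRNO is non-empty inside BEGINFILE when the file could not be opened, and calling nextfile there skips the file instead of aborting.

```shell
#!/bin/sh
# Requires GNU awk: BEGINFILE and ERRNO are gawk extensions.
count_readable() {
    gawk '
    BEGINFILE {
        # ERRNO is non-empty when gawk could not open the current file.
        if (ERRNO != "") {
            print "skipping " FILENAME ": " ERRNO > "/dev/stderr"
            nextfile
        }
        numLines = 0
        while ((getline line < FILENAME) > 0) {
            numLines++
        }
        # Close the extra read handle so a file listed twice is counted again.
        close(FILENAME)
        print FILENAME ": " numLines
    }
    ' "$@"
}
```

Called as count_readable file1 missing file3, this reports counts for the readable files and writes a note about the unreadable one to standard error.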

Answer 3

BEGIN {
s="cat your_file.txt|wc -l"; 
s | getline file_size;
close(s);
print file_size 
}

This puts the number of lines in the file named your_file.txt into the awk variable file_size and prints it.

If your file name is dynamic, you can pass it on the command line and change the script to use that variable.

For example, my.awk:

BEGIN {
s="cat "VAR"|wc -l"; 
s | getline file_size;
close(s);
print file_size 
}

You can then call it like this: awk -v VAR="your_file.txt" -f my.awk
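
A variant of the same idea, sketched here rather than taken from the answer, counts the lines inside BEGIN with getline reading from the file directly, so no cat or wc process is started. It assumes the file name is the first operand (ARGV[1]); the function name count_in_begin is hypothetical.

```shell
#!/bin/sh
count_in_begin() {
    awk '
    BEGIN {
        file = ARGV[1]
        n = 0
        # getline returns 1 for each line read, 0 at end of file, -1 on error.
        while ((status = (getline line < file)) > 0) {
            n++
        }
        if (status < 0) {
            exit 1
        }
        # Close the handle so the main loop starts from a clean state.
        close(file)
        printf "There are a total of %d lines in this file.\n", n
    }
    $0 == 4587 { print "Found record on line number " NR; exit 0 }
    ' "$1"
}
```

One difference from wc -l worth knowing: awk counts a final line that lacks a trailing newline as a record, whereas wc -l counts newline characters.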

Answer 4

If you use GNU awk and need a robust, generic solution that accommodates multiple, possibly empty input files, use Ed Morton's solution.

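This answer uses portable (POSIX-compliant) code. Within the constraints noted it is robust, but Ed's GNU awk solution is both simpler and more robust.
Tip of the hat to Ed Morton for his help.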

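With a single input file, it is simpler to handle the line counting with a shell command in the BEGIN block, which has the following advantages: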

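- On invocation, the file name does not have to be specified twice, unlike in the accepted answer.
  - Also note that the accepted answer does not work as intended (as of this writing); the correct form is (see the comments on that answer for an explanation):
    awk 'NR==FNR {next} FNR==1 {print NR-1} $0==4587 {print "Found record on line number "FNR; exit 0}' file{,}
- The solution also works with an empty input file.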
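
For comparison, the corrected two-pass form can be wrapped as follows. This is an illustrative sketch, with two_pass_search as a made-up name. The first pass does nothing but advance NR, so the search cannot exit before the count has been printed, and the match is reported with FNR so that the line number is relative to the file rather than cumulative across both passes.

```shell
#!/bin/sh
two_pass_search() {
    awk '
    # First pass: skip every record; NR still advances.
    NR == FNR { next }
    # First record of the second pass: NR - 1 is the total line count.
    FNR == 1 { printf "There are a total of %d lines in this file.\n", NR - 1 }
    # Second pass only: search, reporting the per-file line number.
    $0 == 4587 { print "Found record on line number " FNR; exit 0 }
    ' "$1" "$1"
}
```

With an empty input file there is no second-pass record, so this form prints nothing at all.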

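In terms of performance, this approach is either only slightly slower than reading the file twice in awk or even a little faster, depending on the awk implementation used: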

awk '
  BEGIN {
     # Execute a shell command to count the lines and read
     # result into an awk variable via <cmd> | getline <varname>.
     # If the file cannot be read, abort. (The shell has already printed an error msg.)
    cmd="wc -l < \"" ARGV[1] "\""; if ((cmd | getline count) < 1) exit 1; close(cmd)
    printf "There are a total of %s lines in this file.\n\n", count
  }
  $0==4587 { print "Found record on line number " NR; exit 0 }
' file

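The above assumes:

- The file name is passed on the command line as the first operand (non-option argument), which is accessed as ARGV[1].
- The file name does not contain embedded " characters.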
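
If the restriction on embedded " characters is a problem, one possible way around it is to single-quote the file name for the shell before building the command. The sketch below is an assumption-laden illustration, not part of the original answer: shell_quote and count_quoted are invented names, and the single-quote character is handed to awk in the variable q so that it never has to appear inside the single-quoted program text.

```shell
#!/bin/sh
count_quoted() {
    awk -v q="'" '
    # Wrap s in single quotes for the shell. Each embedded single quote is
    # replaced by: closing quote, backslash-escaped quote, reopening quote.
    function shell_quote(s) {
        gsub(q, q "\\\\" q q, s)
        return q s q
    }
    BEGIN {
        cmd = "wc -l < " shell_quote(ARGV[1])
        if ((cmd | getline count) < 1) exit 1
        close(cmd)
        # %d also strips any leading blanks some wc implementations emit.
        printf "There are a total of %d lines in this file.\n", count
    }
    ' "$1"
}
```

Inside single quotes the shell treats every other character literally, so spaces, double quotes and dollar signs in the file name need no further handling.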

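The solutions below handle multiple files and make similar assumptions:

- All operands passed are file names. That is, every argument after the program must be a file name, not a variable assignment such as var=value.
- No file name contains embedded " characters.
- If any input file does not exist or cannot be read, no processing takes place.

Generalizing the approach to multiple files is not hard, but the following solution does not print the line count for empty files: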

awk '
  BEGIN {
     # Loop over all input files and store their line counts in an array.
    for (i=1; i<ARGC; ++i) {
      cmd="wc -l < \"" ARGV[i] "\""; if ((cmd | getline count) < 1) exit 1; close(cmd)
      counts[ARGV[i]] = count
    }
  }
   # At the beginning of every (non-empty) file, print the line count.
  FNR==1 { printf "There are a total of %s lines in file %s.\n\n", counts[FILENAME], FILENAME }
  # $0==4587 { printf "%s: Found record on line number %d\n", FILENAME, FNR; exit 0 }
' file1 file2 # ...

If you also want to print the line count for empty files, things get a little trickier:

awk '
  BEGIN {
     # Loop over all input files and store their line counts in an array.
    for (i=1; i<ARGC; ++i) {
      cmd="wc -l < \"" ARGV[i] "\""; if ((cmd | getline count) < 1) exit 1; close(cmd)
      counts[ARGV[i]] = count
    }
    fileCount = ARGC - 1
    fmtStringCount = "There are a total of %s lines in file %s.\n\n"
  }
   # At the beginning of every (non-empty) file, print the line count.
  FNR==1 {
   ++fileIndex
    # If there were intervening empty files, print their counts too.
   while (ARGV[fileIndex] != FILENAME) {
       printf fmtStringCount, 0, ARGV[fileIndex++]
   }
   printf fmtStringCount, counts[FILENAME], FILENAME
  }
   # Process input lines
  $0==4587 { printf "%s: Found record on line number %d\n", FILENAME, FNR; exit 0 }
   # If there are any remaining empty files at the end, print their counts too.
  END {
    while (fileIndex < fileCount) { printf fmtStringCount, 0, ARGV[++fileIndex] }
  }
' file1 file2 # ...
