How to skip a directory in awk?
Say I have the following structure of files and directories:
$ tree
.
├── a
├── b
└── dir
    └── c
1 directory, 3 files
That is, two files, a and b, and a directory dir, which contains another file, c.
I want to process all the files with awk (GNU Awk 4.1.1, to be exact), so I do this:
$ gawk '{print FILENAME; nextfile}' * */*
a
b
awk: cmd. line:1: warning: command line argument `dir' is a directory: skipped
dir/c
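Everything works, but * also expands to the directory dir, and awk tries to process it.
So I wonder: is there a native way for awk to check whether a given argument is a directory and, if so, skip it? That is, without using system().
I made it work by calling the external system() in BEGINFILE: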
$ gawk 'BEGINFILE{print FILENAME; if (system(" [ ! -d " FILENAME " ]")) {print FILENAME, "is a dir, skipping"; nextfile}} ENDFILE{print FILENAME, FNR}' * */*
a
a 10
a.wk
a.wk 3
b
b 10
dir
dir is a dir, skipping
dir/c
dir/c 10
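Also note the counterintuitive fact about if (system(" [ ! -d " FILENAME " ]")) {print FILENAME, "is a dir, skipping"; nextfile}: one would expect system() to return 1 when the test is true, but it returns the command's exit code, which is 0 on success.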
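The inversion follows from shell conventions: a test that succeeds exits with status 0, and awk treats 0 as false. A minimal sketch of this, assuming a POSIX shell with mktemp available; the names tmpd and plain are illustrative only and not part of the setup above:

```shell
tmpd=$(mktemp -d)            # scratch directory, illustrative only
: > "$tmpd/plain"            # an empty regular file inside it

# system() hands back the exit status of `[ -d ... ]`:
# 0 means the test succeeded (the path is a directory), non-zero means it failed.
result=$(awk -v d="$tmpd" -v f="$tmpd/plain" 'BEGIN {
    print "dir:",  system("[ -d \"" d "\" ]")
    print "file:", system("[ -d \"" f "\" ]")
}')
printf '%s\n' "$result"
rm -rf "$tmpd"
```

The path is wrapped in double quotes inside the command string so that names containing spaces survive the shell; the one-liner above passes FILENAME unquoted and would break on such names.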
I read in A.5 Extensions in gawk Not in POSIX awk:
- Directories on the command line produce a warning and are skipped (see Command-line directories)
And then the linked page says:
4.11 Directories on the Command Line
According to the POSIX standard, files named on the awk command line
must be text files; it is a fatal error if they are not. Most versions
of awk treat a directory on the command line as a fatal error.
By default, gawk produces a warning for a directory on the command
line, but otherwise ignores it. This makes it easier to use shell
wildcards with your awk program:
$ gawk -f whizprog.awk * Directories could kill this program
If either of the --posix or --traditional options is given, then gawk
reverts to treating a directory on the command line as a fatal error.
See Extension Sample Readdir, for a way to treat directories as usable
data from an awk program.
And indeed this is the case: the same command as before fails when run with --posix:
$ gawk --posix 'BEGINFILE{print FILENAME; if (system(" [ ! -d " FILENAME " ]")) {print FILENAME, "is a dir, skipping"; nextfile}} ENDFILE{print FILENAME, NR}' * */*
gawk: cmd. line:1: fatal: cannot open file `dir' for reading (Is a directory)
I checked the section 16.7.6 Reading Directories linked above, where they talk about readdir:
The readdir extension adds an input parser for directories. The usage
is as follows:
@load "readdir"
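But I don't know how to call it, nor how to use it from the command line.
I would simply avoid passing directories to awk, since even POSIX says that all filename arguments must be text files.
You can use find to traverse the directory: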
find PATH -type f -exec awk 'program' {} +
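If the shell glob should stay as it is, the same filtering can be done in the shell before awk ever sees the arguments. A rough sketch, assuming a POSIX shell; the scratch layout simply mirrors the a, b, dir/c tree from the question, and the variable names are illustrative:

```shell
tmpd=$(mktemp -d)
mkdir "$tmpd/dir"
echo one > "$tmpd/a"; echo two > "$tmpd/b"; echo three > "$tmpd/dir/c"

seen=$(
    cd "$tmpd" || exit 1
    # Rebuild the positional parameters so they hold regular files only.
    set --
    for f in * */*; do
        if [ -f "$f" ]; then set -- "$@" "$f"; fi
    done
    # Print each file name once, on its first record.
    awk 'FNR == 1 { print FILENAME }' "$@"
)
printf '%s\n' "$seen"
rm -rf "$tmpd"
```

Unlike find, this does not recurse beyond what the glob patterns match, which may or may not be what is wanted.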
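If you want to protect your script from other people mistakenly passing it a directory (or anything else that is not a readable text file), you can do this: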
$ ls -F tmp
bar dir/ foo
$ cat tmp/foo
line 1
$ cat tmp/bar
line 1
line 2
$ cat tmp/dir
cat: tmp/dir: Is a directory
$ cat tst.awk
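BEGIN {
    # Try to read one record from each argument before the main input loop.
    for (i=1;i<ARGC;i++) {
        if ( (getline line < ARGV[i]) <= 0 ) {
            print "Skipping:", ARGV[i], ERRNO
            # Deleted ARGV elements are not opened as input files.
            delete ARGV[i]
        }
        close(ARGV[i])
    }
}
{ print FILENAME, $0 }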
$ awk -f tst.awk tmp/*
Skipping: tmp/dir Is a directory
tmp/bar line 1
tmp/bar line 2
tmp/foo line 1
$ awk --posix -f tst.awk tmp/*
Skipping: tmp/dir
tmp/bar line 1
tmp/bar line 2
tmp/foo line 1
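Per POSIX, getline returns -1 if/when it fails in an attempt to retrieve a record from a file (e.g. the file is unreadable, does not exist, or is a directory). You only need GNU awk if you care which of those failures occurred, which it reports through the value of ERRNO.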
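The return values that the `<= 0` test relies on can be checked directly. This is only a sketch, assuming a POSIX shell and awk; the file names full, empty, and missing are made up for the demonstration:

```shell
tmpd=$(mktemp -d)
echo "line 1" > "$tmpd/full"     # one record
: > "$tmpd/empty"                # exists but has no records
                                 # "missing" is deliberately never created

codes=$(awk -v dir="$tmpd" 'BEGIN {
    n = split("full empty missing", names, " ")
    for (i = 1; i <= n; i++) {
        path = dir "/" names[i]
        rc = (getline line < path)   # 1 = record read, 0 = end of file, -1 = error
        print names[i], rc
        close(path)
    }
}')
printf '%s\n' "$codes"
rm -rf "$tmpd"
```

Note that an empty file yields 0, so the `<= 0` test in tst.awk skips empty files as well as unreadable ones; that is harmless here because an empty file has no records to print.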