Shell: how to determine that no lines in a text file exceed 2048 bytes in length

Question

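I realize it is very unlikely that a single line in a text file would organically exceed 2048 bytes. Still, I think it is very valuable to know how to make sure that is not the case.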

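Edit: I just want to add that I'm asking because I'm writing a script to verify whether a file is a text file as defined by POSIX. One of the requirements is that no line in a text file may be longer than {LINE_MAX} bytes (including the newline character). On Ubuntu and FreeBSD, that value is 2048.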

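On GNU/Linux you don't have to worry about this limit, because the GNU tools limit line length only by available memory. FreeBSD, however, does impose this limit, and since I have recently been putting a lot of effort into learning FreeBSD, I consider it something important for me to handle.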

Edit: I think I was wrong about FreeBSD. I was able to use grep to process lines longer than 2048 bytes.

Answer 1

This counts bytes literally, matching any line that has at least 2049 bytes before its newline:

LC_ALL=C grep -E '^.{2049}' some.txt

例如：

$ printf é | LC_ALL=C grep -E '^.{2}'
é
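Two caveats on the pattern above. POSIX counts the newline toward {LINE_MAX}, so a 2048-byte limit is already exceeded when a line has 2048 bytes before its newline, one fewer than '^.{2049}' requires. POSIX also only guarantees interval counts up to RE_DUP_MAX, which may be as low as 255 (check with getconf RE_DUP_MAX), so {2049} may be rejected on some systems. Here is a minimal sketch that sidesteps both with awk; the function name check_line_max is hypothetical, and it assumes your awk does not itself cap record length.

```shell
#!/bin/sh
# check_line_max FILE [MAX]  (hypothetical helper name)
# Exits 0 if no line of FILE, counting its terminating newline, is longer
# than MAX bytes. MAX defaults to the system's LINE_MAX via getconf.
# Offending lines are reported as FILE:LINENO: LENGTH bytes.
check_line_max() {
  file=$1
  max=${2:-$(getconf LINE_MAX)}
  # LC_ALL=C makes length() count bytes rather than characters.
  LC_ALL=C awk -v max="$max" '
    { len = length($0) + 1 }    # +1 for the newline that ends the line
    len > max {
      printf "%s:%d: %d bytes\n", FILENAME, NR, len
      bad = 1
    }
    END { exit bad }            # exit status 1 if any line was too long
  ' "$file"
}
```

For example, check_line_max some.txt && echo "all lines fit" uses the system limit, while check_line_max some.txt 100 checks against an arbitrary 100-byte limit.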

If you mean characters, set a UTF-8 locale instead (or leave the variable unset and rely on your environment's default):

$ printf é | LC_ALL=en_US.utf8 grep -E '^.{2}'
$ echo $?
1

If you mean graphemes (user-perceived characters), use PCRE's \X, which matches one extended grapheme cluster:

printf  | grep -Px '\X{2}'
$ echo $?
1

Answer 2

You can see how many lines are too long (more than 2048 bytes, not counting the newline):

cut -b 2049- < inputfile | LC_ALL=C grep -c '.'
# To count characters rather than bytes, use "-c" (note: GNU cut currently treats -c like -b)
cut -c 2049- < inputfile | grep -c '.'
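If you also want to know which lines are too long, note that cut writes exactly one output line per input line, so the line numbers from grep -n point back into the original file. A small sketch along those lines; the function name long_line_numbers is hypothetical, and the limit again excludes the newline, as in the commands above.

```shell
#!/bin/sh
# long_line_numbers FILE [MAXLEN]  (hypothetical helper name)
# Prints the number of each line in FILE that has more than MAXLEN bytes
# before its newline. MAXLEN defaults to 2048.
long_line_numbers() {
  start=$(( ${2:-2048} + 1 ))
  # Keep only the bytes past the limit. Short lines become empty lines,
  # so grep -n numbering still matches FILE. LC_ALL=C lets "." match any
  # byte, even when cut has split a multibyte character.
  cut -b "${start}-" < "$1" | LC_ALL=C grep -n '.' | cut -d: -f1
}
```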

You can use it in a function:

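checkfile() {
   # Usage: checkfile FILE [MAXLEN]
   # Succeeds (exit 0) if no line in FILE has more than MAXLEN bytes,
   # not counting the newline; MAXLEN defaults to 2048.
   if [ $# -eq 2 ]; then
      overflow=$(( $2 + 1 ))
   else
      overflow=2049
   fi
   # Fail if any line still has bytes at position ${overflow} or beyond
   ! cut -b "${overflow}-" < "$1" | LC_ALL=C grep -q '.'
}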

# Run test
testfile=/tmp/overflow.txt
echo "1234567890" > "${testfile}" # length 10, not counting '\n'
for boundary in 5 10 20; do
   echo "Check with maxlen ${boundary}"
   checkfile "${testfile}" ${boundary}
   if [ $? -eq 0 ]; then
      echo File OK
   else
      echo Overflow
   fi
   # Example using ||. Note that the ';' before the closing '}' is required.
   checkfile "${testfile}" ${boundary} || { echo "Your comment"; echo "can call exit now"; }
   # checkfile "${testfile}" ${boundary} || { echo "${testfile} has long lines"; exit 1; }
done
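Since the question's goal is validating the POSIX text-file definition as a whole, the pieces can be combined. This is a rough sketch, not a complete validator: is_posix_text_file is a hypothetical name, it takes the limit from getconf LINE_MAX and counts the newline as POSIX does, and it does not check that the bytes form valid characters in the current locale. It also assumes the local tr, cut and grep cope with long lines themselves, which the question's second edit suggests is true at least for grep.

```shell
#!/bin/sh
# is_posix_text_file FILE  (hypothetical helper name)
# Rough check: no NUL bytes, a terminating newline on the last line (an
# empty file passes), and no line longer than LINE_MAX bytes including
# its newline. Character-encoding validity is NOT checked.
is_posix_text_file() {
  file=$1
  line_max=$(getconf LINE_MAX)

  # 1. Deleting NUL bytes must leave the file unchanged.
  if ! LC_ALL=C tr -d '\000' < "$file" | cmp -s - "$file"; then
    echo "$file: contains NUL bytes" >&2
    return 1
  fi

  # 2. The last byte is either absent (empty file) or a newline, and
  #    command substitution strips a trailing newline, leaving "".
  if [ -n "$(tail -c 1 "$file")" ]; then
    echo "$file: last line lacks a terminating newline" >&2
    return 1
  fi

  # 3. A byte at position LINE_MAX (newline not counted) means the line
  #    plus its newline is longer than LINE_MAX bytes.
  if cut -b "${line_max}-" < "$file" | LC_ALL=C grep -q '.'; then
    echo "$file: a line exceeds LINE_MAX (${line_max}) bytes" >&2
    return 1
  fi
  return 0
}
```

The checks run in this order so that NUL bytes are rejected before tail's output is captured by command substitution, which cannot hold NUL bytes.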


shell

posix

text-files