How to print the whole line that contains a specified byte offset in a file?

Question

I have the following example input.txt file:

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do
eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut
enim ad minim veniam, quis nostrud exercitation ullamco laboris
nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor
in reprehenderit in voluptate velit esse cillum dolore eu fugiat
nulla pariatur. Excepteur sint occaecat cupidatat non proident,
sunt in culpa qui officia deserunt mollit anim id est laborum.

Now I can easily grep for a word and get its byte offset:

$ grep -ob incididunt /dev/null input.txt 
input.txt:80:incididunt

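Unfortunately, the information about the line content and about the searched word is lost. I only know the file name and the byte offset 80. I want to print the whole line that contains that byte offset in the file.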

Ideally, I would have a script.sh that takes two parameters, a file name and a byte offset, and outputs the matching line:

$ ./script.sh input.txt 80
eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut

Another example:

For file=input.txt and byte offset=130, the output should be:

enim ad minim veniam, quis nostrud exercitation ullamco laboris

For file=input.txt and any byte offset between 195 and 253, the output should be:

nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor

For file=input.txt and byte offset=400, the output should be:

sunt in culpa qui officia deserunt mollit anim id est laborum.

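What I tried:

I can print from the byte offset to the end of the line with GNU sed, but this misses the eiusmod tempor part. I can't think of a way to go "back" in the file and get the part from the preceding newline up to that byte offset.

$ sed -z 's/.\{80\}\([^\n]*\).*/\1\n/' input.txt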
incididunt ut labore et dolore magna aliqua. Ut
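
For comparison, the "forward" half that the sed command covers can also be sketched with tail and head. This is only an illustration, not a full solution: the function name rest_of_line_at_offset is made up here, and the offset is assumed to be zero-based, as reported by grep -b.

```shell
#!/bin/bash
# rest_of_line_at_offset FILE OFFSET (hypothetical helper name)
# Print everything from the zero-based byte OFFSET up to the end of that line.
rest_of_line_at_offset() {
    local file=$1 offset=$2
    # tail -c +N starts output at the N-th byte counting from 1,
    # so the zero-based offset has to be shifted by one.
    tail -c +"$((offset + 1))" "$file" | head -n 1
}
```

Called as rest_of_line_at_offset input.txt 80, it prints the same kind of partial line as the sed command, so the part before the offset is still missing.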

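I could read the file character by character, remember the last newline, and print from the last newline to the next one. This won't work with the shell's read, because it omits newlines. I think I could make it work with dd, but surely there must be a simpler solution.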

set -- input.txt 80
exec 10<"$1"
pos=0
lastnewlinepos=0
for ((i=0;i<"$2";++i)); do
        IFS= read -r -u 10 -N 1 c
        pos=$((pos+1))
        # this will not work..., read omits newlines
        if [ "$c" = $'\n' ]; then
                lastnewlinepos="$pos"
        fi
done
# as I know the last newline before the offset, it's ok to use this now
sed -z 's/.\{'"$lastnewlinepos"'\}\([^\n]*\).*/\1\n/' "$1"

How can I print the whole line that "contains" a byte offset inside a file, using bash and *nix-specific tools?

Answer 1

Try the following. You can adjust the input/output to your needs, but this prints the actual offset of the word and the line that contains it:

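#!/bin/bash
SEARCH_TERM="$1"
SEARCH_FILE="$2"
# Byte offset of the first occurrence of the search term
OFFSET_OF_WORD="$(grep -ob -- "$SEARCH_TERM" "$SEARCH_FILE" | head -n 1 | cut -d':' -f1)"

lastNewLinePos=0
lineNumber=0
# grep -b '$' prints the byte offset at which each line starts;
# the file size is appended so that the last line is covered too
for newLinePos in $(grep -b '$' "$SEARCH_FILE" | cut -d':' -f1) $(wc -c < "$SEARCH_FILE")
do
    if (( $OFFSET_OF_WORD >= lastNewLinePos && $OFFSET_OF_WORD < $newLinePos )); then
        echo "Offset: $OFFSET_OF_WORD"
        echo "Line: $(sed -n "${lineNumber}p" "$SEARCH_FILE")"
        break
    fi
    lastNewLinePos=$newLinePos
    let lineNumber++
done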

Edit: Tested with your given input and executed as

./getLineByOffset.sh incididunt input.txt

Edit 2: If you only know the offset and not the actual search term

#!/bin/bash
OFFSET_OF_WORD="$1"
SEARCH_FILE="$2"

lastNewLinePos=0
lineNumber=0
# grep -b '$' prints the byte offset at which each line starts;
# the file size is appended so that the last line is covered too
for newLinePos in $(grep -b '$' "$SEARCH_FILE" | cut -d':' -f1) $(wc -c < "$SEARCH_FILE")
do
    if (( $OFFSET_OF_WORD >= lastNewLinePos && $OFFSET_OF_WORD < $newLinePos )); then
        echo "Offset: $OFFSET_OF_WORD"
        echo "Line: $(sed -n "${lineNumber}p" "$SEARCH_FILE")"
        break
    fi
    lastNewLinePos=$newLinePos
    let lineNumber++
done
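
The loop above boils down to mapping a byte offset to a line number. As a rough alternative sketch (not the script above, and not tested beyond small files), the same mapping can be obtained by counting the newline bytes that precede the offset. The function names are made up for illustration, GNU head is assumed (some other head implementations reject a byte count of 0), and the offset is assumed to be zero-based.

```shell
#!/bin/bash
# line_number_at_offset FILE OFFSET (hypothetical helper name)
# Print the 1-based number of the line that contains the zero-based byte OFFSET.
line_number_at_offset() {
    local file=$1 offset=$2 newlines
    # Keep only the newline bytes found in the first OFFSET bytes and count them.
    newlines=$(head -c "$offset" "$file" | tr -cd '\n' | wc -c)
    echo "$((newlines + 1))"
}

# print_line_at_offset FILE OFFSET (hypothetical helper name)
# Print the whole line that contains the zero-based byte OFFSET.
print_line_at_offset() {
    local file=$1 offset=$2
    sed -n "$(line_number_at_offset "$file" "$offset")p" "$file"
}
```

A newline byte is counted as part of the line it terminates. If the offset lies past the end of the file, the computed line number does not exist and sed prints nothing.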

Answer 2

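With GNU awk, keep the number of bytes read so far in a variable and, as soon as it exceeds your byte offset, print the current line and exit. For example:

$ awk -b '{ nb += length + 1 } nb > 80 { print; exit }' file
eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut

The keyword length is shorthand for length($0), which returns the length of the current line in bytes (thanks to -b). We need to add 1 to it because awk strips the line terminator. After that addition, nb is the number of bytes up to and including the current line's terminator, so a zero-based offset such as the one grep -b reports lies in the current line exactly when nb > offset.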
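
To get something shaped like the script.sh from the question, the one-liner can be wrapped in a function that takes the file and the offset as arguments. The following is a minimal sketch under a few assumptions: the name line_at_offset is made up, the offset is zero-based (as printed by grep -b), every line ends in a single newline byte, and LC_ALL=C is used in place of -b so that length counts bytes in awk implementations other than GNU awk as well.

```shell
#!/bin/bash
# line_at_offset FILE OFFSET (hypothetical helper name)
# Print the line of FILE that contains the zero-based byte OFFSET.
# Returns a non-zero status if OFFSET lies beyond the end of the file.
line_at_offset() {
    local file=$1 offset=$2
    LC_ALL=C awk -v off="$offset" '
        # nb = bytes consumed so far, including this line and its newline
        { nb += length($0) + 1 }
        # the first line whose end lies past the offset is the one we want
        nb > off { print; found = 1; exit }
        END { exit !found }
    ' "$file"
}
```

With the function in a script whose last line is line_at_offset "$@", it could be invoked as ./script.sh input.txt 80. Because awk stops at the first matching line, the rest of the file is not read.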
