How to cut a range of lines defined by variables

Question

I have this output from a Python crawler:

[+] Site to crawl: http://www.example.com
[+] Start time: 2020-05-24 07:21:27.169033
[+] Output file: www.example.com.crawler

[+] Crawling
   [-] http://www.example.com
   [-] http://www.example.com/
   [-] http://www.example.com/icons/ubuntu-logo.png
   [-] http://www.example.com/manual
    [i] 404 Not Found
[+] Total urls crawled: 4

[+] Directories found:
   [-] http://www.example.com/icons/
[+] Total directories: 1

[+] Directory with indexing

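I want to use awk or any other tool to cut out the lines between "Crawling" and "Total urls crawled". Basically, I want to assign the NR of the first keyword, "Crawling", to one variable and the NR of the second delimiter, "Total urls crawled", to another variable, and then extract the range between the two. I tried something like this: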

awk 'NR>$(Crawling) && NR<$(urls)' file.txt

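But nothing worked. The best result I got was a cut from line Crawling+1 to the end of the file, which really doesn't help. So how can I do this, and how can I cut a range of lines in awk using variables?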

awk

Answer 1

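If I understand your requirement correctly, you want to pass shell variables into the awk code and search for those strings. If so, try the following.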

awk -v crawl="Crawling" -v url="Total urls crawled" '
$0 ~ url{
  found=""
  next
}
$0 ~ crawl{
  found=1
  next
}
found
'  Input_file

Explanation: a detailed, line-by-line explanation of the above.

awk -v crawl="Crawling" -v url="Total urls crawled" '   ##Starting awk program and setting crawl and url values of variables here.
$0 ~ url{                      ##Checking if line is matched to url variable then do following.
  found=""                     ##Nullify the variable found here.
  next                         ##next will skip further statements from here.
}
$0 ~ crawl{                    ##Checking if line is matched to crawl variable then do following.
  found=1                      ##Setting found value to 1 here.
  next                         ##next will skip further statements from here.
}
found                          ##Checking condition if found is SET(NOT NULL) then print current line.
'  Input_file                  ##Mentioning Input_file name here.
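
If you really do want to work with line numbers, as the question describes, here is a minimal sketch of that approach. It assumes a small sample file created with mktemp that mirrors the crawler output, and the variable names start, end, s and e are only illustrative. The point is that shell variables are not expanded inside the single-quoted awk program, so they have to be passed in with -v.

```shell
# Build a small sample file (hypothetical data modelled on the crawler output).
file=$(mktemp)
cat > "$file" <<'EOF'
[+] Output file: www.example.com.crawler
[+] Crawling
   [-] http://www.example.com
   [-] http://www.example.com/manual
    [i] 404 Not Found
[+] Total urls crawled: 2
EOF

# Find the line number (NR) of the first line matching each keyword.
start=$(awk '/Crawling/ { print NR; exit }' "$file")
end=$(awk '/Total urls crawled/ { print NR; exit }' "$file")

# Pass the shell values into awk with -v, then print only the lines
# strictly between the two line numbers.
result=$(awk -v s="$start" -v e="$end" 'NR > s && NR < e' "$file")
printf '%s\n' "$result"

rm -f "$file"
```

This reads the file three times, so the single-pass flag approach above is usually preferable. The line-number version is mainly useful when you need the numbers for something else as well.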

Answer 2

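The clause "...or any other tool" prompts me to point out that a scripting language can be used in command-line mode to do this. Here is how it can be done with Ruby, where 't' is the name of the file containing the text from which the specified lines are to be extracted. The following would be entered at the shell.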

ruby -W0 -e 'puts STDIN.readlines.select { |line| true if line.match?(/\bCrawling\b/)..line.match?(/\bTotal urls crawled\b/) }[1..-2]' < t

This displays the following (puts prints each selected line on its own line):

   [-] http://www.example.com
   [-] http://www.example.com/
   [-] http://www.example.com/icons/ubuntu-logo.png
   [-] http://www.example.com/manual
    [i] 404 Not Found

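The following operations are performed.

STDIN.readlines and < t read the lines of t into an array
select selects the lines for which its block calculation returns true
[1..-2] extracts all of the selected lines except the first and last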

select's block calculation,

true if line.match?(/\bCrawling\b/)..line.match?(/\bTotal urls crawled\b/)

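employs the flip-flop operator. The block returns nil (which Ruby treats as false) until a line matching /\bCrawling\b/ is read, namely "[+] Crawling". The block then returns true, and continues to return true until it encounters a line matching /\bTotal urls crawled\b/, namely "[+] Total urls crawled: 4". The block returns true for that line, but returns nil for each subsequent line until, and if, it encounters another line matching /\bCrawling\b/, in which case the process repeats. Hence, "flip-flop".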
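
For comparison, awk has a construct that behaves much like the flip-flop: the range pattern, written as two patterns separated by a comma. The sketch below is only an illustration under assumptions: the sample data is made up to resemble the crawler output, and the inner condition that drops the two boundary lines plays the role of [1..-2] in the Ruby version.

```shell
# Build a small sample file (hypothetical data modelled on the crawler output).
file=$(mktemp)
cat > "$file" <<'EOF'
[+] Output file: www.example.com.crawler
[+] Crawling
   [-] http://www.example.com
   [-] http://www.example.com/manual
    [i] 404 Not Found
[+] Total urls crawled: 2
[+] Total directories: 1
EOF

# The range pattern switches on at a line matching the first regex and
# switches off after a line matching the second one (both inclusive).
# The inner test skips the two boundary lines themselves.
inner=$(awk '/Crawling/,/Total urls crawled/ {
  if ($0 !~ /Crawling|Total urls crawled/) print
}' "$file")
printf '%s\n' "$inner"

rm -f "$file"
```

As with the flip-flop, the range would switch on again if a later line matched the first pattern.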

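The "-W0" in the command line suppresses warning messages. Without it one may see the warning "flip-flop is deprecated" (depending on the version of Ruby being used). After the decision was made to deprecate the (rarely used) flip-flop operator, Rubyists took to the streets with pitchforks and torches in protest. The Ruby monks saw the error of their ways and reversed their decision.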
