wget to parse a webpage in shell

Question

I am trying to extract URLs from a webpage using wget. I tried this:

 wget -r -l2 --reject=gif -O out.html www.google.com | sed -n 's/.*href="\([^"]*\).*/\1/p'

It reports that the download finished:

  Downloaded: 18,472 bytes in 1 files

but it does not display the web links. If I run the two steps separately:

  wget -r -l2 --reject=gif -O out.html www.google.com 
  sed -n 's/.*href="\([^"]*\).*/\1/p' < out.html

the output is

  http://www.google.com/intl/en/options/            
  /intl/en/policies/terms/

It does not show all the links, such as:

http://www.google.com
http://maps.google.com
https://play.google.com
http://www.youtube.com
http://news.google.com
https://mail.google.com
https://drive.google.com
http://www.google.com
http://www.google.com
http://www.google.com
https://www.google.com
https://plus.google.com

In addition, I want to get the links from the second level and deeper. Can anyone offer a solution for this?

Thanks in advance.

Answer 1

If you don't want to use grep, you can try:

sed -n "/href/ s/.*href=['\"]\([^'\"]*\)['\"].*/\1/gp"
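
One thing to keep in mind with the sed approach: the leading `.*` is greedy, so on a line containing several href attributes only the last one is printed. That may be why some links are missing from the output in the question. The following is a minimal sketch of the grep-based alternative, assuming a grep that supports `-o` (GNU and BSD grep do). The sample HTML and the helper name `extract_links` are made up for illustration.

```shell
# Hypothetical sample input: two links on one line, one of them single-quoted.
html='<a href="http://a.example/">A</a> <a href='"'"'/b/'"'"'>B</a>'

# extract_links is an illustrative helper name, not part of any tool.
# grep -o prints every match on its own line; sed then strips the
# href= prefix and the surrounding quotes.
extract_links() {
  grep -o "href=[\"'][^\"']*[\"']" | sed "s/^href=[\"']//; s/[\"']\$//"
}

printf '%s\n' "$html" | extract_links
```

Because grep -o emits each match separately, both links on the sample line are printed, one per line.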

Answer 2

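The -O file option captures wget's output and writes it to the specified file, so no output reaches sed through the pipe. You can use -O - to direct wget's output to standard output instead.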
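
As a rough sketch of that fix, the pipeline from the question can be wrapped like this. The function name `page_links` is hypothetical, `-q` is added only to silence wget's progress messages, and the recursive options are left out to keep the example to a single page.

```shell
# page_links is an illustrative name, not an existing command.
# -q     suppresses wget's status output
# -O -   writes the downloaded document to standard output,
#        so sed actually receives the HTML through the pipe
page_links() {
  wget -q -O - "$1" | sed -n 's/.*href="\([^"]*\).*/\1/p'
}
```

It could then be called as `page_links www.google.com`. Note that this sed expression still prints at most one link per input line.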
