Get page titles from a list of URLs
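I have a list of URLs and need to get the page title for each one, saved in another list. wget or curl seems like the right tool, but I don't know exactly how to do it. Can you help? Thanks.
Do you mean something like this?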
wget_title_from_filelist.sh
#!/bin/bash
while read -r URL; do
echo -n "$URL --> "
wget -q -O - "$URL" | \
tr "\n" " " | \
sed 's|.*<title>\([^<]*\).*</head>.*|\1|;s|^\s*||;s|\s*$||'
echo
done
filelist.txt
https://stackoverflow.com
https://cnn.com
https://reddit.com
https://archive.org
Usage
./wget_title_from_filelist.sh < filelist.txt
Output
https://stackoverflow.com --> Stack Overflow - Where Developers Learn, Share, & Build Careers
https://cnn.com --> CNN International - Breaking News, US News, World News and Video
https://reddit.com --> reddit: the front page of the internet
https://archive.org --> Internet Archive: Digital Library of Free & Borrowable Books, Movies, Music & Wayback Machine
Explanation
tr "\n" " " # remove \n, create one line of input for sed
sed 's|.*<title>\([^<]*\).*</head>.*|\1|; # find <title> in <head>, keep only the captured title text
s|^\s*||; # remove leading spaces
s|\s*$||' # remove trailing spaces
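Since the question also mentions curl and saving the titles into a second list, here is a rough sketch of that variant. The function names extract_title and titles_from_list are made up for illustration, and the 10-second timeout is an arbitrary choice. It assumes the title contains no nested tags and that the first <title> on the page is the one you want, so treat it as a starting point rather than a robust HTML parser.

```shell
#!/bin/bash
# Sketch only: extract_title and titles_from_list are hypothetical helper names.

# Read HTML on stdin and print the text of the first <title> element.
extract_title() {
  tr '\n' ' ' |                          # join the page into a single line
    grep -io '<title[^>]*>[^<]*' |       # every "<title ...>text" match, one per line
    head -n 1 |                          # keep only the first match
    sed 's|^<[^>]*>||;s|^[[:space:]]*||;s|[[:space:]]*$||'  # drop the tag, trim spaces
}

# Read one URL per line on stdin and print one title per line on stdout.
# A page without a title yields an empty line, so both lists stay aligned.
titles_from_list() {
  local url title
  while read -r url; do
    [ -z "$url" ] && continue            # skip blank lines
    # -s: silent, -L: follow redirects, --max-time: give up after 10 seconds
    title=$(curl -sL --max-time 10 "$url" | extract_title)
    printf '%s\n' "$title"
  done
}
```

After sourcing or pasting the functions, something like `titles_from_list < filelist.txt > titles.txt` should write one title per line to titles.txt, in the same order as the URLs in filelist.txt.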