Awk Regex: match any first character

Question

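I'm trying to write a more efficient "check if URL exists" function and I'm almost done; the only obstacle is the regex.

I'm looking for a regex that matches the very first character of the output, prints it, and exits. For example, the code below fetches the source of a YouTube page; as soon as the output reaches the title tags, awk matches them and the wget command is terminated.

The idea is borrowed from here: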

https://unix.stackexchange.com/questions/103252/how-do-i-get-a-websites-title-using-command-line

Performance/Efficiency

Here, out of laziness, we have perl read the whole content in memory before starting to look for the <title> tag. Given that the title is found in the <head> section that is in the first few bytes of the file, that's not optimal. A better approach, if GNU awk is available on your system, could be:
wget -qO- 'http://www.youtube.com/watch?v=Dd7dQh8u4Hc' | \
gawk -v IGNORECASE=1 -v RS='</title' 'RT{gsub(/.*<title[^>]*>/,"");print;exit}' 
That way, awk stops reading after the first </title, and by exiting, causes wget to stop downloading.
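The record-separator trick in the quoted command can be tried without touching the network. The sketch below wraps the same gawk program in a function and feeds it a made-up page; the function name extract_title and the sample HTML are assumptions for illustration, not part of the quoted answer.

```shell
# extract_title: hypothetical wrapper around the quoted gawk program.
# RS='</title' makes each record end at the closing title tag, and RT holds
# the text that matched RS, so the rule only fires when the tag was seen.
extract_title() {
    gawk -v IGNORECASE=1 -v RS='</title' \
        'RT { gsub(/.*<title[^>]*>/, ""); print; exit }'
}

# Made-up sample page standing in for the wget output.
sample='<html><head>
<TITLE lang="en">Example page</TITLE>
</head><body>never read</body></html>'

# Skip the demo quietly when gawk is not installed.
if command -v gawk >/dev/null 2>&1; then
    printf '%s\n' "$sample" | extract_title   # prints: Example page
fi
```

Because awk exits after the first record, nothing after the closing title tag is ever processed, which is what lets the real pipeline cut the download short.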

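My logic is this: if the URL exists, wget will output the page source. I don't want to waste time downloading the whole source, so as soon as the first character of the source arrives, print it and exit.

I will then store the output of wget and gawk: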

first_character_of_source_code=$(wget|awk magic)
if [[ $first_character_of_source_code != '' ]]; then
    echo "URL exists!"
else
    echo "URL doesn't exist!"
fi

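Also, for my "check if URL exists" function I tried the curl solution suggested in the answers to "How do I determine if a web page exists with shell scripting?". It mostly works, but sites like Quora return 403 Forbidden. Yes, I did add a user agent. The wget plus gawk solution, which returns the page source, is better suited to determining whether the URL exists.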

Answer 1

Thanks to @karakfa's suggestion, I found a solution.

Match the first character of the output, print it, and exit:

echo "Yes, a down vote, just what I needed" | awk '{print $1; exit}' FS=""
# It will print
Y
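Splitting a line into one field per character with an empty FS is an extension found in gawk and mawk, not behavior POSIX guarantees. A minimal sketch that gets the same result with substr, which every awk provides, might look like this; the helper name first_char is hypothetical.

```shell
# first_char: hypothetical helper; prints the first character of the first
# input line, then exits so the upstream command can stop early.
first_char() {
    awk '{ print substr($0, 1, 1); exit }'
}

echo "Yes, a down vote, just what I needed" | first_char   # prints: Y
```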

Full source code of my script check_URL.sh (works perfectly):

# Variables
URL="$*"
user_agent="Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36"

# Main program
first_character_of_source_code=$(wget -e robots=off --user-agent="$user_agent" -qO- "$URL" | \
awk '{print $1; exit}' FS="")

if [[ $first_character_of_source_code != '' ]]; then
    echo "URL exists!"
    exit 0
else
    echo "URL doesn't exist!"
    exit 1
fi
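One edge case worth knowing about: awk works line by line, so a page whose first line is empty produces an empty string and would be reported as missing. A possible variation, sketched below, checks for a first byte instead of a first field. The helper name has_output is hypothetical, and head -c is assumed to be available (it is in GNU and BSD head, but is not required by POSIX).

```shell
# has_output: hypothetical helper; succeeds if stdin delivers at least one byte.
# The command substitution is unquoted on purpose so that any padding some
# wc implementations add around the count is trimmed by word splitting.
has_output() {
    [ $(head -c 1 | wc -c) -gt 0 ]
}

# Made-up stand-ins for wget output: a page that starts with a blank line,
# and no output at all.
if printf '\n<html></html>\n' | has_output; then echo "URL exists!"; fi
if ! : | has_output; then echo "URL doesn't exist!"; fi
```

In check_URL.sh this would take the place of the awk stage, with the wget pipeline feeding has_output directly inside the if condition.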

Answer 2

If you're not that keen on using awk, you can use grep.

It gets the job done quickly and easily:

if wget -qO - https://whosebug.com/ | grep -q ""
then
  echo "wget returned at least one character."
fi
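The reason this works: an empty pattern matches every line, including an empty one, and -q makes grep exit with success at the first match without printing anything. The sketch below shows both outcomes with local stand-ins for the wget output, so it can be run offline; the sample strings are made up.

```shell
# Stand-in for a successful download: at least one line arrives.
if printf '<!DOCTYPE html>\n' | grep -q ""; then
    echo "got output"
fi

# Stand-in for a failed download: nothing arrives, so grep finds no line
# to match and exits with status 1.
if ! : | grep -q ""; then
    echo "no output"
fi
```

Note that the if statement tests the exit status of grep, the last command in the pipeline, not that of wget.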

regex

bash

awk

gawk