Extract path between single quotes using grep

Question

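I am downloading a file with wget and saving the log messages (see below) for later use. The most important part is the line Saving to: ‘/path/somefile.gz’.

I figured out how to extract this line using grep Saving.
Now my question is: how can I extract only the path between the single quotes? ‘/path/somefile.gz’ => /path/somefile.gz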

HTTP request sent, awaiting response... 200 OK
Length: 15391 (15K) [application/octet-stream]
Saving to: ‘/path/somefile.gz’

     0K .......... .....                                      100% 79,7M=0s

2020-07-06  - ‘/path/somefile.gz’ saved [15391/15391]


Total wall clock time: 0,1s
Downloaded: 1 files, 15K in 0s (79,7 MB/s)

Edit

Is there a way to do the extraction within a pipeline of this form?

wget -m --no-parent -nd https://someurl/somefile.gz -P ~/src/  2>&1 |
grep Saving |
tee ~/src/log.txt

Thanks in advance!

Answer 1

Sample output from wget:

$ cat wget.out
HTTP request sent, awaiting response... 200 OK
Length: 15391 (15K) [application/octet-stream]
Saving to: '/path/somefile.gz'

     0K .......... .....                                      100% 79,7M=0s

2020-07-06  - '/path/somefile.gz' saved [15391/15391]


Total wall clock time: 0,1s
Downloaded: 1 files, 15K in 0s (79,7 MB/s)

One awk solution to extract the desired path/file:

$ awk -F"'" '                        # define input delimiter as single quote
/Saving to:/   { print $2 }        # if line contains string "Saving to:" then print 2nd input field
' wget.out                           # our input
/path/somefile.gz                    # our output

To save the result in a variable:

$ wget_path=$(awk -F"'" '/Saving to:/ {print $2}' wget.out)
$ echo "${wget_path}"
/path/somefile.gz

Following up on the OP's edit to the question, pipe the output of wget into the awk solution:

wget -m --no-parent -nd https://someurl/somefile.gz -P ~/src/ 2>&1 | awk -F"'" '/Saving to:/ {print $2}' | tee ~/src/log.txt
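
The pipeline above writes only the extracted path to log.txt. If the full wget log should be kept as well, one possible arrangement is to put tee before the filter. The following is a minimal sketch, not part of the original answer: the function name save_log_and_extract_path is hypothetical, and it assumes ASCII single quotes around the path as in the sample output above.

```shell
# Hypothetical helper: copy the complete log arriving on stdin to the file
# named by $1, then print only the path from the "Saving to:" line.
# Assumes the path is wrapped in ASCII single quotes.
save_log_and_extract_path() {
    tee "$1" | awk -F"'" '/Saving to:/ { print $2 }'
}

# Possible usage (same wget call as above):
#   wget_path=$(wget -m --no-parent -nd https://someurl/somefile.gz -P ~/src/ 2>&1 |
#               save_log_and_extract_path ~/src/log.txt)
```

With this ordering the log file holds every line wget printed, while the variable receives just the path.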

Answer 2

Since the question asks for a solution using grep, a single GNU grep command that extracts the path could be:

grep -Po "^Saving to: .\K[^']*"

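This assumes that your grep supports Perl-compatible regular expressions via -P (not every grep does). In the pattern, . matches the opening quote and \K discards everything matched so far, so -o prints only the path.

Of course, it can also be used in a pipeline: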

wget_command | grep -Po "^Saving to: .\K[^']*" | tee log.txt
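
Where grep lacks -P, the same result can be approximated with grep -o followed by cut. This is a sketch under assumptions rather than the answer's own method: the function name extract_path_no_pcre is hypothetical, grep -o is a widely available extension rather than a POSIX option, and the path is assumed to be wrapped in ASCII single quotes.

```shell
# Hypothetical helper: extract the path without relying on grep -P.
# grep -o prints only the matched text ("Saving to: '" plus the path);
# cut then keeps the second field, i.e. what follows the first single quote.
extract_path_no_pcre() {
    grep -o "^Saving to: '[^']*" | cut -d"'" -f2
}

# Possible usage:
#   wget_command | extract_path_no_pcre | tee log.txt
```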

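Note that I used the ASCII single quote (') to anchor the end of the path in the pattern, whereas the sample input in the question uses the Unicode characters Left Single Quotation Mark (U+2018) (‘) and Right Single Quotation Mark (U+2019) (’). If the real output contains those characters, simply replace [^'] in the pattern above with [^’].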
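
If it is not known in advance which quote style the log will contain, both can be accepted at once. The sketch below swaps grep for sed, so it is a different technique from the one in this answer. The function name extract_saved_path is hypothetical, and it assumes the closing quote is the last character on the "Saving to:" line.

```shell
# Hypothetical helper: print the path from a "Saving to:" line on stdin,
# whether it is wrapped in ASCII quotes ('...') or typographic quotes (‘...’).
# Each quote style gets its own expression instead of a bracket expression,
# so the multibyte characters are matched as whole sequences.
extract_saved_path() {
    sed -n \
        -e "s/^Saving to: '\(.*\)'\$/\1/p" \
        -e "s/^Saving to: ‘\(.*\)’\$/\1/p"
}

# Possible usage:
#   wget_command | extract_saved_path | tee log.txt
```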
