Weird behavior of sed's backreference
We have the following line of text:
| ![](/img/2016/12/020.jakis-tam-text1.png#medium) | ![](/img/2016/12/021.jakis-tam-text2.png#medium) | ![](/img/2016/12/022.jakis-tam-text3.png#medium) |
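As you can see, the line consists of just three similar phrases, which can be matched and changed (individually) with the following sed expression:
sed -n 's@| !\[.*\](\(\/img\/\)\([0-9]*\/[0-9]*\/[0-9]*\)\.\(.*\)\.\(.*\)) |@| ![\3](\1\2.\3.\4) |@p'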
If we had only one phrase (instead of the given three), the result would look like this:
$ echo '| ![](/img/2016/12/020.jakis-tam-text1.png#medium) |' | sed -n 's@| !\[.*\](\(\/img\/\)\([0-9]*\/[0-9]*\/[0-9]*\)\.\(.*\)\.\(.*\)) |@| ![\3](\1\2.\3.\4) |@p'
| ![jakis-tam-text1](/img/2016/12/020.jakis-tam-text1.png#medium) |
But when we have two or three phrases, the result always refers to the last matched phrase.
Here is an example with two matches:
$ echo '| ![](/img/2016/12/020.jakis-tam-text1.png#medium) | ![](/img/2016/12/021.jakis-tam-text2.png#medium) |' | sed -n 's@| !\[.*\](\(\/img\/\)\([0-9]*\/[0-9]*\/[0-9]*\)\.\(.*\)\.\(.*\)) |@| ![\3](\1\2.\3.\4) |@p'
| ![jakis-tam-text2](/img/2016/12/021.jakis-tam-text2.png#medium) |
Here is an example with three matches:
$ echo '| ![](/img/2016/12/020.jakis-tam-text1.png#medium) | ![](/img/2016/12/021.jakis-tam-text2.png#medium) | ![](/img/2016/12/022.jakis-tam-text3.png#medium) |' | sed -n 's@| !\[.*\](\(\/img\/\)\([0-9]*\/[0-9]*\/[0-9]*\)\.\(.*\)\.\(.*\)) |@| ![\3](\1\2.\3.\4) |@p'
| ![jakis-tam-text3](/img/2016/12/022.jakis-tam-text3.png#medium) |
Why does this happen?
Is there a way to force sed to print the result for only the first match?
Expected behavior? I thought the command below would print something like this (only the first match):
$ echo '| ![](/img/2016/12/020.jakis-tam-text1.png#medium) | ![](/img/2016/12/021.jakis-tam-text2.png#medium) | ![](/img/2016/12/022.jakis-tam-text3.png#medium) |' | sed -n 's@| !\[.*\](\(\/img\/\)\([0-9]*\/[0-9]*\/[0-9]*\)\.\(.*\)\.\(.*\)) |@| ![\3](\1\2.\3.\4) |@p'
| ![jakis-tam-text1](/img/2016/12/020.jakis-tam-text1.png#medium) |
Or this (all matches):
$ echo '| ![](/img/2016/12/020.jakis-tam-text1.png#medium) | ![](/img/2016/12/021.jakis-tam-text2.png#medium) | ![](/img/2016/12/022.jakis-tam-text3.png#medium) |' | sed -n 's@| !\[.*\](\(\/img\/\)\([0-9]*\/[0-9]*\/[0-9]*\)\.\(.*\)\.\(.*\)) |@| ![\3](\1\2.\3.\4) |@p'
| ![jakis-tam-text1](/img/2016/12/020.jakis-tam-text1.png#medium) | ![jakis-tam-text2](/img/2016/12/021.jakis-tam-text2.png#medium) | ![jakis-tam-text3](/img/2016/12/022.jakis-tam-text3.png#medium) |
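What happens is that | !\[.*\] matches the longest possible string: because .* is greedy, it runs from the first phrase all the way to the start of the last phrase. If you want to match only the first phrase, you have to be more specific. For example:
sed 's@| !\[\]\(([^.]*\.\([^.]*\)\.[^)]*)\) |.*@| ![\2]\1 |@'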
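The difference between the greedy pattern and the more specific one can be seen on a smaller line. This is only a minimal sketch: the sample text in `line` and the `<\1>` replacement are made up for illustration, and the patterns are simplified versions of the ones in this thread.

```shell
#!/bin/sh
# Hypothetical two-phrase line with the same shape as the sample above.
line='| ![](/img/a.one.png) | ![](/img/b.two.png) |'

# Greedy: .* between the brackets extends to the LAST "]" that still lets
# the rest of the pattern match, so \1 is captured from the last phrase.
greedy=$(printf '%s\n' "$line" | sed 's@| !\[.*\](\([^)]*\)) |@<\1>@')

# Specific: \[\] allows nothing between the brackets, so the match is
# pinned to the first phrase; the trailing .* discards the rest of the line.
specific=$(printf '%s\n' "$line" | sed 's@| !\[\](\([^)]*\)) |.*@<\1>@')

printf 'greedy:   %s\n' "$greedy"
printf 'specific: %s\n' "$specific"
```

The greedy run reports the path from the second phrase, while the specific one reports the path from the first.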
I don't fully understand the question, but you can try this sed command:
$ sed 's#\([^[]*.\)\([^\.]*.\([^\.]*\)[^)]*.\)#\1\3\2#' input_file
This prints all 3 phrases, but substitutes only in the first match:
$ sed 's#\([^[]*.\)\([^\.]*.\([^\.]*\)[^)]*.\)#\1\3\2#' input_file
| ![jakis-tam-text1](/img/2016/12/020.jakis-tam-text1.png#medium) | ![](/img/2016/12/021.jakis-tam-text2.png#medium) | ![](/img/2016/12/022.jakis-tam-text3.png#medium) |
To target all 3, add the g flag:
$ sed 's#\([^[]*.\)\([^\.]*.\([^\.]*\)[^)]*.\)#\1\3\2#g' input_file
| ![jakis-tam-text1](/img/2016/12/020.jakis-tam-text1.png#medium) | ![jakis-tam-text2](/img/2016/12/021.jakis-tam-text2.png#medium) | ![jakis-tam-text3](/img/2016/12/022.jakis-tam-text3.png#medium) |
You can also target only a specific match, for example #2:
$ sed 's#\([^[]*.\)\([^\.]*.\([^\.]*\)[^)]*.\)#\1\3\2#2' input_file
| ![](/img/2016/12/020.jakis-tam-text1.png#medium) | ![jakis-tam-text2](/img/2016/12/021.jakis-tam-text2.png#medium) | ![](/img/2016/12/022.jakis-tam-text3.png#medium) |
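The numeric flag can be tried against each occurrence in turn. The sketch below assumes the replacement is `\1\3\2` (prefix, captured name, rest of the phrase), which is what the outputs shown here imply; the variable names `line` and `pattern` are only illustrative.

```shell
#!/bin/sh
# The sample line from the question.
line='| ![](/img/2016/12/020.jakis-tam-text1.png#medium) | ![](/img/2016/12/021.jakis-tam-text2.png#medium) | ![](/img/2016/12/022.jakis-tam-text3.png#medium) |'

# \1 = everything up to and including "[", \2 = the rest of the phrase
# through ")", \3 = the text between the first and second dot.
pattern='\([^[]*.\)\([^\.]*.\([^\.]*\)[^)]*.\)'

# Assumed replacement \1\3\2 inserts the captured name between "[" and "]".
# The trailing number selects which occurrence on the line is replaced.
for n in 1 2 3; do
  printf '%s\n' "$line" | sed "s#${pattern}#\1\3\2#${n}"
done
```

Each output line has the alt text filled in for exactly one phrase, while the other two phrases are left as they were.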