Weird behavior of sed's backreference

We have the following line of text:

| ![](/img/2016/12/020.jakis-tam-text1.png#medium) | ![](/img/2016/12/021.jakis-tam-text2.png#medium) | ![](/img/2016/12/022.jakis-tam-text3.png#medium) |

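As you can see, the line consists of just three similar phrases, each of which can be matched and changed (individually) with the following sed expression: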

sed -n 's@| !\[.*\](\(\/img\/\)\([0-9]*\/[0-9]*\/[0-9]*\)\.\(.*\)\.\(.*\)) |@| ![\3](\1\2.\3.\4) |@p'

If we had just one phrase (instead of the given three), the result would look like this:

$ echo '| ![](/img/2016/12/020.jakis-tam-text1.png#medium) |' | sed -n 's@| !\[.*\](\(\/img\/\)\([0-9]*\/[0-9]*\/[0-9]*\)\.\(.*\)\.\(.*\)) |@| ![\3](\1\2.\3.\4) |@p'
| ![jakis-tam-text1](/img/2016/12/020.jakis-tam-text1.png#medium) |

But when we have two or three phrases, the result always refers to the last matched phrase.

Here is an example with two matches:

$ echo '| ![](/img/2016/12/020.jakis-tam-text1.png#medium) | ![](/img/2016/12/021.jakis-tam-text2.png#medium) |' | sed -n 's@| !\[.*\](\(\/img\/\)\([0-9]*\/[0-9]*\/[0-9]*\)\.\(.*\)\.\(.*\)) |@| ![\3](\1\2.\3.\4) |@p'
| ![jakis-tam-text2](/img/2016/12/021.jakis-tam-text2.png#medium) |

Here is an example with three matches:

$ echo '| ![](/img/2016/12/020.jakis-tam-text1.png#medium) | ![](/img/2016/12/021.jakis-tam-text2.png#medium) | ![](/img/2016/12/022.jakis-tam-text3.png#medium) |' | sed -n 's@| !\[.*\](\(\/img\/\)\([0-9]*\/[0-9]*\/[0-9]*\)\.\(.*\)\.\(.*\)) |@| ![\3](\1\2.\3.\4) |@p'
| ![jakis-tam-text3](/img/2016/12/022.jakis-tam-text3.png#medium) |

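Why does this happen?

Is there a way to force sed to print only the result for the first match?

Expected behavior? I thought the following command would print something like this (just the first match):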

$ echo '| ![](/img/2016/12/020.jakis-tam-text1.png#medium) | ![](/img/2016/12/021.jakis-tam-text2.png#medium) | ![](/img/2016/12/022.jakis-tam-text3.png#medium) |' | sed -n 's@| !\[.*\](\(\/img\/\)\([0-9]*\/[0-9]*\/[0-9]*\)\.\(.*\)\.\(.*\)) |@| ![\3](\1\2.\3.\4) |@p'
| ![jakis-tam-text1](/img/2016/12/020.jakis-tam-text1.png#medium) |

or this (all matches):

$ echo '| ![](/img/2016/12/020.jakis-tam-text1.png#medium) | ![](/img/2016/12/021.jakis-tam-text2.png#medium) | ![](/img/2016/12/022.jakis-tam-text3.png#medium) |' | sed -n 's@| !\[.*\](\(\/img\/\)\([0-9]*\/[0-9]*\/[0-9]*\)\.\(.*\)\.\(.*\)) |@| ![\3](\1\2.\3.\4) |@p'
| ![jakis-tam-text1](/img/2016/12/020.jakis-tam-text1.png#medium) | ![jakis-tam-text2](/img/2016/12/021.jakis-tam-text2.png#medium) | ![jakis-tam-text3](/img/2016/12/022.jakis-tam-text3.png#medium) |

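What happens is that the .* in | !\[.*\] is greedy, so it takes the longest possible match. That is, it runs from the start of the first phrase up to the ] at the beginning of the last phrase, and the capture groups are then filled from the last phrase. If you only want to match the first phrase, you have to be more specific. For example: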

sed 's@| !\[\]\(([^.]*\.\([^.]*\)\.[^)]*)\) |.*@| ![\2]\1 |@'
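
To see the greedy match in isolation, here is a minimal sketch that replaces only the bracket part with a marker. The variable names and the <MATCH> marker are made up for illustration; the input is the two-phrase line from the question.

```shell
# Two-phrase input taken from the question.
line='| ![](/img/2016/12/020.jakis-tam-text1.png#medium) | ![](/img/2016/12/021.jakis-tam-text2.png#medium) |'

# Greedy: \[.*\] runs from the first "[" to the LAST "]" on the line,
# swallowing the whole first phrase.
greedy=$(printf '%s\n' "$line" | sed 's@\[.*\]@<MATCH>@')

# Bounded: [^]]* cannot cross a "]", so the match stops at the first one.
bounded=$(printf '%s\n' "$line" | sed 's@\[[^]]*\]@<MATCH>@')

printf '%s\n' "$greedy" "$bounded"
```

The first line printed keeps only the second phrase's path after the marker, while the second line leaves everything after the first [] untouched.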

I don't fully understand the question, but you can try this sed

$ sed 's#\([^[]*.\)\([^\.]*.\([^\.]*\)[^)]*.\)#\1\3\2#' input_file

This prints all 3 phrases, but the substitution is applied only to the first match

$ sed 's#\([^[]*.\)\([^\.]*.\([^\.]*\)[^)]*.\)#\1\3\2#' input_file
| ![jakis-tam-text1](/img/2016/12/020.jakis-tam-text1.png#medium) | ![](/img/2016/12/021.jakis-tam-text2.png#medium) | ![](/img/2016/12/022.jakis-tam-text3.png#medium) |

To target all 3, you can add the g flag

$ sed 's#\([^[]*.\)\([^\.]*.\([^\.]*\)[^)]*.\)#\1\3\2#g' input_file
| ![jakis-tam-text1](/img/2016/12/020.jakis-tam-text1.png#medium) | ![jakis-tam-text2](/img/2016/12/021.jakis-tam-text2.png#medium) | ![jakis-tam-text3](/img/2016/12/022.jakis-tam-text3.png#medium) |

You can also target only one occurrence, for example #2

$ sed 's#\([^[]*.\)\([^\.]*.\([^\.]*\)[^)]*.\)#\1\3\2#2' input_file
| ![](/img/2016/12/020.jakis-tam-text1.png#medium) | ![jakis-tam-text2](/img/2016/12/021.jakis-tam-text2.png#medium) | ![](/img/2016/12/022.jakis-tam-text3.png#medium) |
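
As a further sketch, the expression from the question can be kept almost as it is if every .* is swapped for a negated bracket expression and the surrounding pipes are left out of the pattern. This is an illustrative variant, not something either answer proposed. Dropping the pipes matters because neighbouring phrases share a " | " separator: once one match has consumed it, the next phrase can no longer match with the g flag. The variable names are made up.

```shell
# Three-phrase input taken from the question.
line='| ![](/img/2016/12/020.jakis-tam-text1.png#medium) | ![](/img/2016/12/021.jakis-tam-text2.png#medium) | ![](/img/2016/12/022.jakis-tam-text3.png#medium) |'

# [^]]*, [^.]* and [^)]* cannot run past the end of their own phrase,
# so each match stays inside a single ![](...) item.
expr='s@!\[[^]]*\](\(/img/\)\([0-9]*/[0-9]*/[0-9]*\)\.\([^.]*\)\.\([^)]*\))@![\3](\1\2.\3.\4)@'

first=$(printf '%s\n' "$line" | sed "$expr")    # first phrase only
all=$(printf '%s\n' "$line" | sed "${expr}g")   # every phrase

printf '%s\n' "$first" "$all"
```

Without a flag only the first phrase receives its alt text; with g all three do, which corresponds to the two outputs the question expected.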