Regex replace hyphens in text excluding URLs, tags and e-mails

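I'm trying to replace hyphens in text with non-breaking hyphens, but I need to exclude all URLs, e-mail addresses and tags. Here is some of the text I'm trying to edit: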

Some text with a <a href="https://some-domain.com/section-name" class="some-class">link</a> but also plain URL like http://another-domain.com and an e-mail info@some-domain.com and e-shop and some relative URL like /test-url/on-this-website.

I came up with this regex: (^|\s+)[^@|^\/]+(\s+|$)

But it is no use in preg_replace: it doesn't match the hyphens themselves, it matches whole stretches of text that contain them.

The result should be:

Some text with a <a href="https://some-domain.com/section-name" class="some-class">link</a> but also plain URL like http://another-domain.com and an e&#8209;mail info@some-domain.com and e&#8209;shop and some relative URL like /test-url/on-this-website.

Has anyone done something similar?

There are a few problems with your regex...

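  • You can't use | as an OR operator inside a character class; there it is a literal |
  • Your regex is greedy
  • You can't use more than one NOT operator (^) in a character class; only a leading ^ negates, any later ^ is a literal
  • You don't need to match more than one whitespace character at the start and end
  • Your character class swallows spaces, because whitespace is not among the excluded characters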
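
The points above can be checked directly. This is a minimal sketch; the test strings and the short sample sentence are made up for illustration and are not from the question.

```php
<?php
// The character class from the question: it excludes the four literal
// characters @, |, ^ and / (the inner | and ^ are not operators here).
$class = '/^[^@|^\/]+$/';

$results = [
    'e-shop'    => preg_match($class, 'e-shop'),    // 1: nothing excluded is present
    'a|b'       => preg_match($class, 'a|b'),       // 0: "|" is excluded as a literal
    'a^b'       => preg_match($class, 'a^b'),       // 0: the second "^" is a literal as well
    'two words' => preg_match($class, 'two words'), // 1: whitespace is allowed by the class
];
print_r($results);

// Because the class accepts spaces and is greedy, the full pattern grabs
// everything from the start up to the last whitespace before the first "/".
preg_match('/(^|\s+)[^@|^\/]+(\s+|$)/', 'an e-shop and http://x.com', $m);
var_dump($m[0]); // the matched text is "an e-shop and ", not just the hyphen
```

That last match is why preg_replace would replace a whole run of words rather than the hyphen alone.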

I think you're overthinking it; you can rephrase your task as: "replace hyphens in words"

(\s\w+)-(\w+\s)
(\s\w+)           : Capture group matching a whitespace character followed by 1 or more of the characters [a-zA-Z0-9_]
       -          : Match a hyphen
        (\w+\s)   : Capture group matching 1 or more of the characters [a-zA-Z0-9_] followed by a whitespace character

However, you could also use a broader character class, for example:

(\s[^@\/\s]+)-([^@\/\s]+\s)
(\s[^@\/\s]+)                : Capture group matching a whitespace character followed by 1 or more characters which aren't @, /, or whitespace
             -               : Match a hyphen
              ([^@\/\s]+\s)  : Capture group matching 1 or more characters which aren't @, /, or whitespace, followed by a whitespace character

$string = "Some text with a link but also plain URL like http://another-domain.com and an e-mail info@some-domain.com and e-shop and some relative URL like /test-url/on-this-website.";

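// $1 and $2 put the captured text on either side of the hyphen back into the result;
// without them the whole match (both words and the surrounding whitespace) would be replaced.
echo preg_replace('/(\s\w+)-(\w+\s)/', '$1&#8209;$2', $string), PHP_EOL;

echo preg_replace('/(\s[^@\/\s]+)-([^@\/\s]+\s)/', '$1&#8209;$2', $string), PHP_EOL;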
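
To see where the two patterns differ, here is a small comparison. The sample sentence is hypothetical (it is not from the question) and was chosen so that punctuation follows a hyphenated word directly.

```php
<?php
// Hypothetical sample sentence with a comma right after a hyphenated word.
$sample = 'Visit our e-shop, or send an e-mail today.';

$narrow = preg_replace('/(\s\w+)-(\w+\s)/', '$1&#8209;$2', $sample);
$broad  = preg_replace('/(\s[^@\/\s]+)-([^@\/\s]+\s)/', '$1&#8209;$2', $sample);

echo $narrow, PHP_EOL; // "e-shop," is left alone: \w+ stops at the comma, so \s cannot match
echo $broad, PHP_EOL;  // both hyphens are replaced: the comma is allowed by [^@\/\s]
```

So the \w version is stricter, while the broader class also covers words that carry punctuation or non-ASCII letters.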

Note: you may need to change the leading and trailing whitespace matches so that they also accept the start and end of the string.
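
One possible way to handle the start and end of the string is to replace the captured whitespace with lookarounds. This is only a sketch: the function name nonBreakingHyphens is made up, and the approach assumes a token counts as a "word" only when it is delimited by whitespace and consists solely of word characters and hyphens. Since lookarounds consume nothing, adjacent hyphenated words and words with several hyphens are handled as well.

```php
<?php
// Hypothetical helper: replace hyphens only inside whitespace-delimited
// tokens made of word characters joined by hyphens.
function nonBreakingHyphens(string $text): string
{
    // (?<!\S)      : the token starts at the beginning of the string or after whitespace
    // \w+(?:-\w+)+ : word characters joined by one or more hyphens
    // (?!\S)       : the token ends at the end of the string or before whitespace
    $result = preg_replace_callback(
        '/(?<!\S)\w+(?:-\w+)+(?!\S)/',
        function (array $m): string {
            return str_replace('-', '&#8209;', $m[0]);
        },
        $text
    );

    // preg_replace_callback returns null on a PCRE error; fall back to the input.
    return $result ?? $text;
}

echo nonBreakingHyphens('e-shop and e-mail info@some-domain.com /test-url/on-this-website'), PHP_EOL;
```

Limitations of this sketch: a word followed directly by punctuation (such as "e-shop.") is skipped, and a hyphenated word inside a quoted attribute value that contains spaces would still be replaced, so text with complex markup is probably better handled with an HTML parser.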