Find (replace) the last space in any HTML headings within a block of HTML

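I'm trying to come up with a regex that I can use to replace the last space character in each heading with a non-breaking space (to control widows) within a block of HTML.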

So far I have this:

const regex = /(<h.>.+?)\s+((\S|<[^>]+>)*)\n|$/gi
const replaced = text.replace(regex, '$1&nbsp;$2')

In regex101 it looks like it works fine, but when run in JavaScript it adds an extra &nbsp; at the end of the string.

An example block of HTML might look like this:

<h2>This is a test heading</h2>
<p>Here is some text</p>
<div>
  <h3>Here is a another heading</h3>
  <p>Some more paragraph text which shouldn't match</p>
</div>

It should be replaced with:

<h2>This is a test&nbsp;heading</h2>
<p>Here is some text</p>
<div>
  <h3>Here is a another&nbsp;heading</h3>
  <p>Some more paragraph text which shouldn't match</p>
</div>

A link to regex101 shows the pattern working.

Below is a snippet showing the non-working behavior in JavaScript:

let text = "<h2>This is a test heading</h2>"
const regex = /(<h.>.+?)\s+((\S|<h.>)*)\n|$/gi
let replaced = text.replace(regex, '$1&nbsp;$2')
console.log(replaced);

text = `<h2>This is a test heading</h2>
<p>Here is some text</p>
<div>
  <h3>Here is a another heading</h3>
  <p>Some more paragraph text which shouldn't match</p>
  <p>Why is there a non breaking space at the very end?</p>
</div>`
replaced = text.replace(regex, '$1&nbsp;$2')
console.log(replaced);

You can use

var regex = /(<(h\d+)>[^<]*?)\s+([^\s<]*?<\/\2>)/gi;

and replace with '$1&nbsp;$3'.

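Details

  • (<(h\d+)>[^<]*?) - Group 1 ($1): a <, then (h\d+) captures an h and 1+ digits into Group 2, then a > is matched, then any 0 or more characters other than <, as few as possible
  • \s+ - 1+ whitespace characters
  • ([^\s<]*?<\/\2>) - Group 3 ($3): any 0 or more characters other than whitespace and <, as few as possible, then the corresponding closing tag: </, the same value as captured in Group 2 (\2 is an in-pattern backreference), and then >.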

JS demo:

var text = "<h2>This is a test heading</h2>\n<p>Here is some text</p>\n<div>\n  <h3>Here is a another heading</h3>\n  <p>Some more paragraph text which shouldn't match</p>\n</div>";
var regex = /(<(h\d+)>[^<]*?)\s+([^\s<]*?<\/\2>)/gi;
var replaced = text.replace(regex, '$1&nbsp;$3');
console.log(replaced);
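
As for why the pattern in the question appends a stray &nbsp;: alternation has the lowest precedence in a regex, so `...\n|$` reads as "the whole heading pattern followed by \n" OR "end of string". The bare `$` branch matches the empty string at the very end, where both capture groups are unset, so `$1` and `$2` expand to nothing and only `&nbsp;` is inserted. A minimal sketch reproducing this with the regex from the question (the variable names are illustrative):

```javascript
// Regex from the question. The top-level | splits it into two branches:
//   (<h.>.+?)\s+((\S|<h.>)*)\n    OR    $
const questionRegex = /(<h.>.+?)\s+((\S|<h.>)*)\n|$/gi;
const heading = "<h2>This is a test heading</h2>";

// The input has no newline, so only the bare `$` branch can match.
const found = [...heading.matchAll(questionRegex)].map((m) => ({
  index: m.index, // position of the match
  text: m[0],     // matched text (empty for the `$` branch)
  group1: m[1],   // undefined, because group 1 did not participate
}));
console.log(found);

// Unset groups are replaced with empty strings, leaving just "&nbsp;".
const broken = heading.replace(questionRegex, "$1&nbsp;$2");
console.log(broken); // <h2>This is a test heading</h2>&nbsp;
```

Wrapping the alternation in a group, or dropping the `|$` branch as the pattern above does, avoids the empty match at the end of the string.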

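Here, we would start with a simple expression that captures the undesired space, along with any other spaces that might come before the last word, using this capturing group (\s+):

<(h[1-6])>(.+)(\s+)([^\s]+)<\/\1>

If we wish to add more constraints to our expression, we can certainly do so.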

Demo

Test

const regex = /<(h[1-6])>(.+)(\s+)([^\s]+)<\/\1>/gim;
const str = `<h2>This is a test heading</h2>
<p>Here is some text</p>
<div>
  <h3>Here is a another heading</h3>
  <p>Some more paragraph text which shouldn't match</p>
</div>
<h2>This is a test   heading</h2>
<p>Here is some text</p>
<div>
  <h3>Here is a another    heading</h3>
  <p>Some more paragraph text which shouldn't match</p>
</div>`;
const subst = `<$1>$2&nbsp;$4<\/$1>`;

// The substituted value will be contained in the result variable
const result = str.replace(regex, subst);

console.log(result);
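
One thing to be aware of with this pattern: because (.+) is greedy, it swallows any extra spaces before the last word, so (\s+) ends up holding a single space and the extra ones remain in the output. If collapsing the whole run is wanted, making that group lazy is one possible adjustment. The sketch below is not part of the answer above; `greedyRegex`, `lazyRegex`, `template`, and `sample` are illustrative names, and the lazy pattern is an assumed variant.

```javascript
const sample = "<h3>Here is a another    heading</h3>";

// Pattern as in the answer: greedy (.+) keeps three of the four spaces in $2.
const greedyRegex = /<(h[1-6])>(.+)(\s+)([^\s]+)<\/\1>/gi;
// Assumed variant: lazy (.+?) leaves the whole whitespace run to (\s+),
// and [^\s<]+ stops the last word at the closing tag.
const lazyRegex = /<(h[1-6])>(.+?)(\s+)([^\s<]+)<\/\1>/gi;

// $3 (the whitespace run) is dropped and replaced by a single &nbsp;.
const template = "<$1>$2&nbsp;$4</$1>";

const greedyResult = sample.replace(greedyRegex, template);
const lazyResult = sample.replace(lazyRegex, template);

console.log(JSON.stringify(greedyResult)); // "<h3>Here is a another   &nbsp;heading</h3>"
console.log(JSON.stringify(lazyResult));   // "<h3>Here is a another&nbsp;heading</h3>"
```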

RegEx

If this expression isn't what you want and you wish to modify it, please visit this link at regex101.com.

RegEx Circuit

jex.im visualizes regular expressions.

A variation of the accepted answer that handles tag attributes:

// str is the HTML string from the test above
const regex = /<(h[1-6])(.*?)>(.+)(\s+)([^\s]+)<\/\1>/gim;
const subst = `<$1$2>$3&nbsp;$5<\/$1>`;
const result = str.replace(regex, subst);

This allows for more flexibility on the opening tag.
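
A quick check of the attribute-tolerant pattern on a heading that carries attributes may help. This is only a sketch: the sample markup and the helper name `fixHeadingWidows` are made up for illustration.

```javascript
// Hypothetical helper wrapping the attribute-tolerant pattern.
function fixHeadingWidows(html) {
  // $1 = tag name, $2 = attributes (possibly empty),
  // $3 = text before the last space, $4 = the last space, $5 = the last word
  const regex = /<(h[1-6])(.*?)>(.+)(\s+)([^\s]+)<\/\1>/gim;
  return html.replace(regex, "<$1$2>$3&nbsp;$5</$1>");
}

const withAttrs =
  '<h2 class="title" id="intro">This is a test heading</h2>\n<p>Here is some text</p>';
const fixedAttrs = fixHeadingWidows(withAttrs);
console.log(fixedAttrs);
// <h2 class="title" id="intro">This is a test&nbsp;heading</h2>
// <p>Here is some text</p>

// Headings without attributes still work, since $2 is then empty.
const fixedPlain = fixHeadingWidows("<h3>Here is a another heading</h3>");
console.log(fixedPlain); // <h3>Here is a another&nbsp;heading</h3>
```

If attribute values are known never to contain >, using [^>]* in place of (.*?) would keep the attribute group from running past the opening tag; that is an assumption about the input rather than something the answer states.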