Find (replace) the last space in any HTML headings within block of HTML

Question

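I'm trying to come up with a regex that I can use to replace the last space character with a non-breaking space (to control widows) in any HTML heading within a block of HTML.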

So far I have this:

const regex = /(<h.>.+?)\s+((\S|<[^>]+>)*)\n|$/gi
const replaced = text.replace(regex, '$1&nbsp;$2')

It looks to work fine in regex101, but when run in JavaScript it adds an extra &nbsp; to the end of the string.

An example block of HTML might look like this:

<h2>This is a test heading</h2>
<p>Here is some text</p>
<div>
  <h3>Here is a another heading</h3>
  <p>Some more paragraph text which shouldn't match</p>
</div>

Which should be replaced with:

<h2>This is a test&nbsp;heading</h2>
<p>Here is some text</p>
<div>
  <h3>Here is a another&nbsp;heading</h3>
  <p>Some more paragraph text which shouldn't match</p>
</div>

A link to regex101 showing the working pattern.

Here is a snippet showing the non-working behaviour in JavaScript:

let text = "<h2>This is a test heading</h2>"
const regex = /(<h.>.+?)\s+((\S|<h.>)*)\n|$/gi
let replaced = text.replace(regex, '$1&nbsp;$2')
console.log(replaced);

text = `<h2>This is a test heading</h2>
<p>Here is some text</p>
<div>
  <h3>Here is a another heading</h3>
  <p>Some more paragraph text which shouldn't match</p>
  <p>Why is there a non breaking space at the very end?</p>
</div>`
replaced = text.replace(regex, '$1&nbsp;$2')
console.log(replaced);

Answer 1

You can use

var regex = /(<(h\d+)>[^<]*?)\s+([^\s<]*?<\/\2>)/gi;

and replace with '$1&nbsp;$3'.

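Details

(<(h\d+)>[^<]*?) - Group 1 ($1): <, then (h\d+) captures an h and 1+ digits into Group 2, then > is matched, and then any 0 or more chars other than <, as few as possible
\s+ - 1+ whitespace chars
([^\s<]*?<\/\2>) - Group 3 ($3): any 0 or more chars other than whitespace and <, as few as possible, and then the corresponding closing tag: </, the same value as in Group 2 (\2 is an in-pattern backreference), and then >.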

JS demo:

var text = "<h2>This is a test heading</h2>\n<p>Here is some text</p>\n<div>\n  <h3>Here is a another heading</h3>\n  <p>Some more paragraph text which shouldn't match</p>\n</div>";
var regex = /(<(h\d+)>[^<]*?)\s+([^\s<]*?<\/\2>)/gi;
var replaced = text.replace(regex, '$1&nbsp;$3');
console.log(replaced);
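
As a side note on why the pattern in the question appends a stray &nbsp;: alternation has the lowest precedence in a regex, so the trailing \n|$ splits the whole pattern into two alternatives, and the second one is a bare $ that matches the empty string at the end of the input. Below is a minimal sketch of that behaviour, assuming the question's replacement string was '$1&nbsp;$2'. The variable names are illustrative only, and the grouped pattern is just the smallest change that removes the stray match, not a substitute for the expression above.

```javascript
// Illustrative only: variable names are hypothetical, not from the question.
const heading = "<h2>This is a test heading</h2>";

// As written, the pattern means: (heading ... \n)  OR  (bare end of string).
const asWritten = /(<h.>.+?)\s+((\S|<h.>)*)\n|$/gi;

// With no newline in the input, only the bare `$` alternative can match.
// It matches the empty string at the end, where $1 and $2 are empty,
// so the replacement reduces to a lone "&nbsp;".
const broken = heading.replace(asWritten, "$1&nbsp;$2");

// Grouping the alternation keeps `\n|$` together as "newline or end of string".
// Group 4 is re-emitted so a matched newline is not lost.
const grouped = /(<h.>.+?)\s+((\S|<h.>)*)(\n|$)/gi;
const fixed = heading.replace(grouped, "$1&nbsp;$2$4");

console.log(broken); // <h2>This is a test heading</h2>&nbsp;
console.log(fixed);  // <h2>This is a test&nbsp;heading</h2>
```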

Answer 2

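Here, we would start with a simple expression that captures the undesired space, along with any other spaces that might come before the last word, using this capturing group (\s+):

<(h[1-6])>(.+)(\s+)([^\s]+)<\/\1>

If we wish to add more constraints to our expression, we can certainly do so.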

Test

const regex = /<(h[1-6])>(.+)(\s+)([^\s]+)<\/\1>/gim;
const str = `<h2>This is a test heading</h2>
<p>Here is some text</p>
<div>
  <h3>Here is a another heading</h3>
  <p>Some more paragraph text which shouldn't match</p>
</div>
<h2>This is a test   heading</h2>
<p>Here is some text</p>
<div>
  <h3>Here is a another    heading</h3>
  <p>Some more paragraph text which shouldn't match</p>
</div>`;
const subst = `<$1>$2&nbsp;$4<\/$1>`;

// The substituted value will be contained in the result variable
const result = str.replace(regex, subst);

console.log(result);
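
One detail worth checking in the test above: because (.+) is greedy, it keeps all but one of the whitespace characters before the last word, so (\s+) only ever captures a single character. The sketch below compares that with a lazy (.+?) variant. The lazy variant is an assumption added for illustration and is not part of the original expression, and the names are hypothetical.

```javascript
// Illustrative input with a run of three spaces before the last word.
const spaced = "<h2>This is a test   heading</h2>";

// Greedy (.+), as in the expression above: group 3 ends up with one space only.
const greedy = /<(h[1-6])>(.+)(\s+)([^\s]+)<\/\1>/gim;
// Hypothetical lazy variant (not from the answer): group 3 takes the whole run.
const lazy = /<(h[1-6])>(.+?)(\s+)([^\s]+)<\/\1>/gim;

const template = "<$1>$2&nbsp;$4</$1>";
const greedyOut = spaced.replace(greedy, template);
const lazyOut = spaced.replace(lazy, template);

console.log(JSON.stringify(greedyOut)); // two ordinary spaces remain before &nbsp;
console.log(JSON.stringify(lazyOut));   // the whole run is replaced by &nbsp;
```

Which behaviour is preferable depends on whether the extra spaces should survive. In rendered HTML they collapse anyway, so the difference mostly matters when the markup itself is inspected.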

RegEx

If this expression wasn't desired and you wish to modify it, please visit regex101.com.

Answer 3

A variation of the accepted answer that handles tag attributes:

const regex = /<(h[1-6])(.*?)>(.+)(\s+)([^\s]+)<\/\1>/gim;
const subst = `<$1$2>$3&nbsp;$5<\/$1>`
const result = str.replace(regex, subst);

This allows greater flexibility on the opening tag.
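
A self-contained sketch of how this might behave on headings that carry attributes. The sample markup, including the class and id values, is made up for illustration, and the names are hypothetical.

```javascript
// Hypothetical sample input; the class and id attributes are invented for this example.
const sample = `<h2 class="title">This is a test heading</h2>
<p>Here is some text</p>
<h3 id="sub">Here is a another heading</h3>`;

// Group 1: tag name, group 2: anything between the tag name and `>`,
// group 3: heading text, group 4: the last whitespace char, group 5: the last word.
const attrRegex = /<(h[1-6])(.*?)>(.+)(\s+)([^\s]+)<\/\1>/gim;
const attrResult = sample.replace(attrRegex, "<$1$2>$3&nbsp;$5</$1>");

console.log(attrResult);
// <h2 class="title">This is a test&nbsp;heading</h2>
// <p>Here is some text</p>
// <h3 id="sub">Here is a another&nbsp;heading</h3>
```

Note that (.*?) accepts any characters after the tag name, so the pattern is deliberately permissive. It is a convenience for tidy markup rather than a general HTML parser.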
