Remove HTML tags around text that includes a new line

Question

I have tags like the following in an .html file:

<td>
<P CLASS="abc">
hello</P>
</td>

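I want to remove the <P> tags around the text, and also remove the line break after the opening <P> tag, so that I am left with only the following: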

<td>
hello
</td>

The files are on a Linux server, so I'd be interested in any Linux-based approach. I can also open the files in Notepad++, which supports regular expressions in Find/Replace.

Answer 1

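Try this regular expression:

<p[^>]*>[\n\t\r]*|<\/p[^>]*>

In Notepad++, open Find/Replace, set Search Mode to "Regular expression", and leave "Match case" unchecked (the pattern uses a lowercase p, while the tags in the file are uppercase). Search for the pattern above and replace it with nothing (an empty string). Live demo: https://regex101.com/r/vU9vZ1/2

Update for the follow-up requirement: search for (<td>[\s\S]*?)<P[^>]*>[\n\t\r]*([^>]+)<\/P> and replace with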

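Since the question also asks for a Linux-side approach, here is a minimal Python sketch that applies the first pattern from this answer outside Notepad++. It is only an illustration, not part of the original answer: the names P_TAG, strip_p_tags, and clean_file are hypothetical, and the sketch assumes the files are small enough to read into memory and are UTF-8 encoded.

```python
import re
from pathlib import Path

# First pattern from the answer: an opening <p ...> tag together with any
# newlines/tabs/carriage returns right after it, or a closing </p> tag.
# IGNORECASE is needed because the sample markup uses uppercase <P>.
P_TAG = re.compile(r"<p[^>]*>[\n\t\r]*|</p[^>]*>", re.IGNORECASE)


def strip_p_tags(html: str) -> str:
    """Return html with the <p> tags (and the line break after the opening tag) removed."""
    return P_TAG.sub("", html)


def clean_file(path: str) -> None:
    """Rewrite one file in place (hypothetical helper; assumes UTF-8 text)."""
    p = Path(path)
    p.write_text(strip_p_tags(p.read_text(encoding="utf-8")), encoding="utf-8")


# The sample from the question.
sample = '<td>\n<P CLASS="abc">\nhello</P>\n</td>\n'
print(strip_p_tags(sample))
```

One caveat worth keeping in mind: because the pattern is <p[^>]*>, it also matches other tags whose names start with p, such as <pre> or <param>. If the files contain those, the pattern would need to be tightened before running it over a whole directory.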
