How to remove all HTML tags, including '&nbsp;', from a string?

I'm using CKEDITOR in one of my modules. It stores the data with HTML tags, like this:

<p>Lorem Ipsum&amp;nbsp;is simply dummy text of the printing and 
typesetting industry.Lorem Ipsum has 
been the industry&amp;#39;s standard 
dummy text ever since the 1500s, when 
an unknown printer took a galley of 
type and scrambled it to make a type 
specimen book. It has survived not 
only five centuries, but also the leap 
into electronic typesetting,remaining 
essentially unchanged. It was 
popularised in the 1960s with the 
release of Letraset sheets containing 
Lorem Ipsum passages, and more 
recently with desktop publishing 
software like Aldus PageMaker 
including versions of Lorem Ipsum.
</p>\n\n<p>&nbsp;
</p>\n\n<p>TItle&nbsp;</p>\n

I tried to convert it to plain text with this regular expression:

str.replace(/(<([^>]+)>)/ig ,'');

But I'm not getting the expected output.

I want this output:

'Lorem Ipsum is simply dummy text of the printing and typesetting industry.Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting,remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.TItle.'

Note: this regex removes all HTML tags, but it leaves "\n" and "&nbsp;" behind. So please help me... how do I also remove "\n" and "&nbsp;" from the string?

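The text looks double-escaped, more or less: first turn every &amp; into &, so that the HTML entities can be recognized properly. Then jQuery's .text() will give you the plain-text version of the HTML markup.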

const input = `<p>Lorem Ipsum&amp;nbsp;is simply dummy text of the printing and typesetting industry.Lorem Ipsum has been the industry&amp;#39;s standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting,remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.</p>\n\n<p>&nbsp;</p>\n\n<p>TItle&nbsp;</p>\n`;
const inputWithProperEntities = input.replaceAll('&amp;', '&');
console.log($(inputWithProperEntities).text());
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>

\n is not an HTML tag; it is the representation of a newline character. If you also want to remove all newlines, then:

const input = `<p>Lorem Ipsum&amp;nbsp;is simply dummy text of the printing and typesetting industry.Lorem Ipsum has been the industry&amp;#39;s standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting,remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.</p>\n\n<p>&nbsp;</p>\n\n<p>TItle&nbsp;</p>\n`;
const inputWithProperEntities = input.replaceAll('&amp;', '&');
console.log($(inputWithProperEntities).text().replaceAll('\n', ''));
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
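
If jQuery or a DOM is not available (for example in Node.js), the same steps can be roughly sketched with plain string operations. This is only a minimal sketch, not part of the answer above: `htmlToPlainText` is a hypothetical helper name, the entity table is an assumption that covers only a handful of named entities, and a regex is not a real HTML parser, so it should only be trusted on simple editor output like the sample. Unlike .text(), which keeps &nbsp; as a non-breaking space character, this sketch maps it to a regular space.

```javascript
// Hypothetical helper (not from the original answer): converts simple,
// double-escaped editor HTML to plain text using only string operations.
function htmlToPlainText(html) {
  // Assumption: only these named entities occur; extend the table as needed.
  // &nbsp; is mapped to a regular space here rather than U+00A0.
  const named = new Map([
    ['nbsp', ' '], ['amp', '&'], ['lt', '<'],
    ['gt', '>'], ['quot', '"'], ['apos', "'"],
  ]);

  return html
    // Undo the double escaping first, so "&amp;nbsp;" becomes "&nbsp;".
    .replace(/&amp;/g, '&')
    // Strip tags (same idea as the regex in the question).
    .replace(/<[^>]+>/g, '')
    // Decode decimal numeric entities such as "&#39;".
    .replace(/&#(\d+);/g, (_, code) => String.fromCodePoint(Number(code)))
    // Decode the known named entities; leave unknown ones untouched.
    .replace(/&([a-z]+);/gi, (match, name) => {
      const key = name.toLowerCase();
      return named.has(key) ? named.get(key) : match;
    })
    // Drop newline characters, then trim leading/trailing whitespace.
    .replace(/\r?\n/g, '')
    .trim();
}

const sample = `<p>Lorem Ipsum&amp;nbsp;is the industry&amp;#39;s standard.</p>\n\n<p>&nbsp;</p>\n\n<p>TItle&nbsp;</p>\n`;
console.log(htmlToPlainText(sample));
// → "Lorem Ipsum is the industry's standard. TItle"
```

The order matters: the &amp; replacement has to run before the entity decoding, otherwise "&amp;nbsp;" would come out as the literal text "&nbsp;".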