How can I remove the enclosing tags around a piece of HTML?

Question

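I am using the customfilter module to create a custom Drupal filter for text written in asciidoc syntax. I wrap the text in [asciidoc][/asciidoc] tags, and when I run it through the asciidoctor command, the output comes wrapped in <div class="paragraph"><p> tags.

This is the result of formatting a passage with HTML links using the [asciidoc] tags (source first, then the output).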

On the markup side Drupal's contrib `markdown` filter has been somewhat iffy,
and so has the `bbcode` filter. Looking around for other more compact documenting
systems led me to the https://asciidoc.org[Asciidoc] utility and its more
advanced brother https://asciidoctor.org[Asciidoctor]. In combination with another
 Drupal module called https://drupal.org/project/customfilter[customfilter] which
makes it easy to create your own filters, I think I have hit on a combination
of modules which allow me as much freedom and fine control on my pages as I want.

<div class="paragraph">
<p>On the markup side Drupal&#8217;s contrib <code>markdown</code> filter has been somewhat iffy,
and so has the <code>bbcode</code> filter. Looking around for other more compact documenting
systems led me to the <a href="https://asciidoc.org">Asciidoc</a> utility and its more
advanced brother <a href="https://asciidoctor.org">Asciidoctor</a>. In combination with another
 Drupal module called <a href="https://drupal.org/project/customfilter">customfilter</a> which
makes it easy to create your own filters, I think I have hit on a combination
of modules which allow me as much freedom and fine control on my pages as I want.</p>
</div>

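Is there a PHP function that takes an HTML string and a set of enclosing tags as strings, and returns the inner HTML they contain? Or perhaps a regular expression that matches the part between the tags?

This is the desired output: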

On the markup side Drupal&#8217;s contrib <code>markdown</code> filter has been somewhat iffy,
and so has the <code>bbcode</code> filter. Looking around for other more compact documenting
systems led me to the <a href="https://asciidoc.org">Asciidoc</a> utility and its more
advanced brother <a href="https://asciidoctor.org">Asciidoctor</a>. In combination with another
 Drupal module called <a href="https://drupal.org/project/customfilter">customfilter</a> which
makes it easy to create your own filters, I think I have hit on a combination
of modules which allow me as much freedom and fine control on my pages as I want.

I also asked a related question about whether asciidoc can be configured to avoid wrapping the output in <div class="paragraph"><p>...</p></div>: Does asciidoctor have a setting to remove the <paragraph> and <p> tags from the source it outputs?

Answer 1

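In plain PHP you can use DOMDocument, which I don't recommend here because it is slow and its errors are hard to track down. For the same reason I won't explain that class any further. Here is just the link to the official documentation: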

PHP DomDocument

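Note: I personally prefer DOMDocument when processing large texts. For example, I used to read entire pages and extract every element one by one, which is nearly impossible with regular expressions. In that case I used DOMDocument.

Back to your question. Your example suggests you are not parsing large chunks, so I suggest using a regex.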
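
For comparison, the DOMDocument route mentioned above could look roughly like the sketch below. The function name innerHtmlOfParagraph is made up for illustration, and the XPath query assumes the wrapper's class attribute is exactly "paragraph". Because the markup is parsed and then serialized again, the result may not be byte-identical to the input (entities, for instance, can come back in a different form).

```php
<?php
// Hypothetical helper (not from the original answer): returns the inner HTML
// of the first <p> directly inside <div class="paragraph">, or null if absent.
function innerHtmlOfParagraph(string $html): ?string
{
    $doc = new DOMDocument();

    // Collect libxml parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Assumption: the class attribute is exactly "paragraph".
    $nodes = $xpath->query('//div[@class="paragraph"]/p');
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }

    // Serialize each child of the <p> so the <p> tag itself is left out.
    $inner = '';
    foreach ($nodes->item(0)->childNodes as $child) {
        $inner .= $doc->saveHTML($child);
    }
    return $inner;
}
```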

preg_match_all('/<p>(?P<content>.*?)<\/p>/s', $text, $ref);
var_dump($ref['content']);

The regex above gives you the contents of every p tag.

You can play with it and build a new one like this:

// \s* tolerates any whitespace between the tags; the s flag lets . match newlines.
preg_match_all('/<div class="paragraph">\s*<p>(?P<content>.*?)<\/p>\s*<\/div>/s', $text, $ref);

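This gives you everything between the wrapping tags. Note that the pattern matches the opening div tag literally, so a div with different or additional attributes will not match.

See also the link below for more on regular expressions: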
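
To get the unwrapped HTML back as a single string, which is what the question asks for, the same idea can be turned into a replacement. This is only a sketch: the function name unwrapParagraphs is hypothetical, and it assumes the wrapper is always exactly <div class="paragraph"> followed by a <p>, as in the asciidoctor output shown in the question.

```php
<?php
// Hypothetical helper: replaces every <div class="paragraph"><p>...</p></div>
// block with its inner HTML and leaves everything else untouched.
function unwrapParagraphs(string $html): string
{
    // ~ is used as the delimiter so the slashes in closing tags need no escaping.
    // The s flag lets . match newlines; .*? stops at the first closing </p>.
    $pattern = '~<div class="paragraph">\s*<p>(.*?)</p>\s*</div>~s';

    // preg_replace returns null on error; fall back to the original input then.
    return preg_replace($pattern, '$1', $html) ?? $html;
}
```

Input that does not contain the wrapper is returned unchanged, so the function can be applied to any filter output without checking first.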

Regex Tutorial

Good luck.


php

regex

html-content-extraction