Removing elements with no children DOM PHP
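I want to remove all empty <a> tags from a string.
So this should be removed:
<a href="http://www.google.com"></a>
and this should not:
<a href="http://www.google.com">Not empty</a>
However, this:
<a href="http://www.google.com"><img src="puppy.jpg" alt="Not empty"></a>
is also being removed.
Edit:
Basically, the images are being removed because they appear to have an empty node value. I want to keep the images. Why does nodeValue return an empty value when there is an image between the <a> tags?
Here is my attempt: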
<?php
$content_before='
so:
<a href="http://www.google.com"></a>
and not:
<a href="http://www.google.com">Not empty</a>
However:
<a href="http://www.google.com"><img src="puppy.jpg" alt="Not empty"></a>
';
$dom=new domDocument;
@$dom->loadHTML($content_before);
$dom->preserveWhiteSpace = true;
$anchors=$dom->getElementsByTagName('a');
foreach($anchors as $a)
{
    $as[] = $a;
}
foreach($as as $a)
{
    $nodevalue=$a->nodeValue;
    $nodevalue=trim($nodevalue);
    if(empty($nodevalue)&&is_object($a))
    {
        #remove links without nodevalues
        $a->parentNode->removeChild($a);
    }
}
$content=$dom->saveHTML();
echo 'before:<br><textarea>'.$content_before.'</textarea>';
echo 'after<br><textarea>'.$content.'</textarea>';
#what $content becomes:
$content='
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><p>so:
and not:
<a href="http://www.google.com">Not empty</a>
However:
</p></body></html>';
#What I want it to be:
$content_after='
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><p>so:
and not:
<a href="http://www.google.com">Not empty</a>
However:
<a href="http://www.google.com"><img src="puppy.jpg" alt="Not empty"></a>
</p></body></html>';
?>
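Another approach is to use an XPath query to select every <a> element that has no child elements and no non-whitespace text, then remove those elements by looping over the result in reverse: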
$dom = new DomDocument;
@$dom->loadHTML($content_before);
$dom->preserveWhiteSpace = true;
$xpath = new DOMXpath($dom);
$empty_anchors = $xpath->evaluate('//a[not(*) and not(text()[normalize-space()])]');
$i = $empty_anchors->length - 1;
while ($i > -1) {
    $element = $empty_anchors->item($i);
    $element->parentNode->removeChild($element);
    $i--;
}
echo $dom->saveHTML();
You can check whether firstChild exists; just change the foreach loop to:
foreach($as as $a)
{
    if($a->firstChild === NULL && is_object($a))
    {
        #remove links that have no child nodes at all
        $a->parentNode->removeChild($a);
    }
}
firstChild: The first child of this node. If there is no such node, this returns NULL.
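On the question of why nodeValue comes back empty for the image link: for an element node, nodeValue (like textContent) is the concatenated text of its descendants, and an <img> contributes no text, so the trimmed value is an empty string. The firstChild check keeps the image link, but it would also keep an anchor that holds only whitespace, since that whitespace is a text node. The following is a minimal sketch that combines both checks in plain DOM calls; the function name removeEmptyAnchors and the shortened href values are hypothetical and used only for illustration.

```php
<?php
// Hypothetical helper (not from the original thread): removes <a> elements
// that contain neither a child element nor any non-whitespace text.
function removeEmptyAnchors(string $html): string
{
    $dom = new DOMDocument();
    // Suppress parser warnings for fragment input, as the snippets above do.
    @$dom->loadHTML($html);

    // Copy the live node list into an array so removals do not skip nodes.
    $anchors = iterator_to_array($dom->getElementsByTagName('a'));

    foreach ($anchors as $a) {
        // Look for at least one child element, e.g. an <img>.
        $hasElementChild = false;
        foreach ($a->childNodes as $child) {
            if ($child->nodeType === XML_ELEMENT_NODE) {
                $hasElementChild = true;
                break;
            }
        }

        // textContent holds descendant text only; an <img> adds nothing to it.
        $hasText = trim($a->textContent) !== '';

        if (!$hasElementChild && !$hasText) {
            $a->parentNode->removeChild($a);
        }
    }

    return $dom->saveHTML();
}

$result = removeEmptyAnchors(
    '<p><a href="#empty"></a><a href="#blank">   </a>'
    . '<a href="#text">Not empty</a>'
    . '<a href="#img"><img src="puppy.jpg" alt="Not empty"></a></p>'
);
echo $result;
```

With this input, the anchors pointing at #empty and #blank should be dropped while the text link and the image link are kept, which matches the condition used in the XPath answer.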