Why are the HTML tags not removed even after using the strip_tags function in the following scenario?

I have an array named $aMessages. It is actually quite large, but for your reference I am printing only its first three elements below:

Array
(
    [0] => Array
        (
            [message_id] => 240
            [thread_id] => 43
            [user_id] => 244
            [text] => test msg<div class="mail_attach_image"><a class="group1" href="http://52.1.47.143/file/attachment/2015/04/49c79e88b24a8fff8104909fce19aa3f.png" ><img src="http://52.1.47.143/file/attachment/2015/04/49c79e88b24a8fff8104909fce19aa3f.png"  /></a><br><a class="mail_attach_image_link_dwl"  href="http://52.1.47.143/feed/download/year_ 2015/month_04/file_49c79e88b24a8fff8104909fce19aa3f.png" >Download</a></div>
            [time_stamp] => 1429695832
            [total_attachment] => 0
            [is_mobile] => 0
            [has_forward] => 0
            [profile_page_id] => 0
            [user_server_id] => 0
            [user_name] => profile-244
            [full_name] => CampusKnot .
            [gender] => 1
            [user_image] => 2015/03/ae6f1665efc29eb3360d392bbcd183b7%s.jpg
            [is_invisible] => 0
            [user_group_id] => 7
            [language_id] =>
            [forwards] => Array
                (
                )

        )

    [1] => Array
        (
            [message_id] => 241
            [thread_id] => 43
            [user_id] => 901
            [text] => hi
            [time_stamp] => 1429695875
            [total_attachment] => 0
            [is_mobile] => 0
            [has_forward] => 0
            [profile_page_id] => 0
            [user_server_id] => 1
            [user_name] => profile-901
            [full_name] => Student Campusknot
            [gender] => 2
            [user_image] => 2014/11/b23e023750785c8b5e61ace4d6a202fa%s.png
            [is_invisible] => 0
            [user_group_id] => 6
            [language_id] =>
            [forwards] => Array
                (
                )

        )

    [2] => Array
        (
            [message_id] => 243
            [thread_id] => 43
            [user_id] => 244
            [text] => textmessage
            [time_stamp] => 1429710052
            [total_attachment] => 0
            [is_mobile] => 0
            [has_forward] => 0
            [profile_page_id] => 0
            [user_server_id] => 0
            [user_name] => profile-244
            [full_name] => CampusKnot .
            [gender] => 1
            [user_image] => 2015/03/ae6f1665efc29eb3360d392bbcd183b7%s.jpg
            [is_invisible] => 0
            [user_group_id] => 7
            [language_id] =>
            [forwards] => Array
                (
                )

        )
)

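If you look closely at the ['text'] key of the first element, it contains some HTML code. I want to remove this HTML and keep only the text value (in this case only the value "test msg" should remain, and all the HTML should be removed).

So basically I want to check whether the ['text'] value of each element contains HTML code.

If it does, the HTML should be removed and only the plain text kept.

To do this I tried the following code, but nothing changed: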

foreach($aMessages as $key => $value) {
  $value['text'] = strip_tags($value['text']);
}

Can someone help me with this?

Thanks in advance.

Try:

foreach($aMessages as $key => $value) {
    $aMessages[$key]['text'] = strip_tags($value['text']);
}

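foreach iterates over a copy of the array, so changing $value inside the loop does not change the original array. Either change the value in the original array, or assign $value by reference.

Quoting http://php.net/manual/en/control-structures.foreach.php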

In order to be able to directly modify array elements within the loop precede $value with &. In that case the value will be assigned by reference.
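
As a minimal sketch of the by-reference variant described in that quote (the two-element $aMessages below is a shortened, made-up stand-in for the array in the question):

```php
<?php
// Shortened stand-in for the $aMessages array from the question.
$aMessages = [
    ['message_id' => 240, 'text' => 'test msg<br>'],
    ['message_id' => 241, 'text' => 'hi'],
];

// &$value makes $value an alias of the current element,
// so assigning to it changes $aMessages itself.
foreach ($aMessages as &$value) {
    $value['text'] = strip_tags($value['text']);
}
// Break the reference, otherwise $value keeps pointing at the last element.
unset($value);

echo $aMessages[0]['text']; // test msg
```

The unset() after the loop matters: without it, a later assignment to $value would silently overwrite the last array element.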

See also How does PHP 'foreach' actually work?

Regarding your comment

My issue is the string between HTML anchor tags is not getting ignored.

Yes, strip_tags does what its name says. It strips the tags, but not their content.
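
A small illustration of that behaviour, using a shortened, made-up version of the message text from the question:

```php
<?php
// strip_tags() removes the markup but keeps the text that was inside the elements.
$html  = 'test msg<div><a href="#">Download</a></div>';
$plain = strip_tags($html);

echo $plain; // test msgDownload
```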

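To drop the content as well, you either have to parse the markup properly or simply cut off everything after the first <. The first approach requires more code. The latter is less reliable, because the text may contain a less-than sign that is not part of a tag.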

A good trade-off between reliability and amount of code is to compare the original string against the stripped string and then return only the substring from the start up to the first differing character, e.g.

$text = substr($string, 0, strspn($string ^ strip_tags($string), "\0"));
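
To spell out how that one-liner works, here is a sketch that wraps it in a helper. The function name text_before_first_tag is made up for illustration, and the sample input is a shortened version of the message from the question:

```php
<?php
// Hypothetical helper name; it only wraps the one-liner above.
function text_before_first_tag(string $string): string
{
    // XOR-ing two strings byte by byte yields "\0" wherever the bytes are equal.
    $diff = $string ^ strip_tags($string);

    // strspn() counts the leading "\0" bytes, i.e. the length of the common prefix.
    return substr($string, 0, strspn($diff, "\0"));
}

echo text_before_first_tag('test msg<div><a href="#">Download</a></div>'), "\n"; // test msg
echo text_before_first_tag('hi'), "\n";                                          // hi
```

Text without any tags comes back unchanged, because the original and the stripped string are then identical.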

Note that none of these approaches takes into account that there may be text outside the tags, e.g. textMsg<b>foo</b>bar<i>baz</i>end will only yield "textMsg". If you want "textMsg bar end", use DOM like this:

$string = 'textMsg<b>foo</b>bar<i>baz</i>end';

libxml_use_internal_errors(true);
$dom = new DOMDocument;
$dom->loadHTML('<div id="root">' . $string . '</div>');
$xpath = new DOMXPath($dom);
$combinedDirectTextNodes = [];
foreach ($xpath->evaluate('id("root")/text()') as $text) {
    $combinedDirectTextNodes[] = $text->nodeValue;
}
libxml_use_internal_errors(false);

echo implode(' ', $combinedDirectTextNodes); // textMsg bar end

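If I'm not mistaken, this will work for you if all you care about is getting test msg. Otherwise use strip_tags, which removes all the tags but leaves the rest of the data in place.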

foreach ($aMessages as $key => $value) {
    $pos = strpos($value['text'], '<'); // false when the text contains no '<'
    $aMessages[$key]['text'] = ($pos === false) ? $value['text'] : substr($value['text'], 0, $pos);
}
print_r($aMessages); // (abbreviated) Array ( [0] => Array ( [message_id] => 240 [thread_id] => 43 [user_id] => 244 [text] => test msg [time_stamp] => 1429695832 [total_attachment] => 0 [is_mobile] => 0 ) [1] => Array ( [message_id] => 241 [thread_id] => 43 [user_id] => 901 [text] => hi [time_stamp] => 1429695875 [total_attachment] => 0 [is_mobile] => 0 ) [2] => Array ( [message_id] => 243 [thread_id] => 43 [user_id] => 244 [text] => textmessage [time_stamp] => 1429710052 [total_attachment] => 0 [is_mobile] => 0 ) )
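  1. You should modify the original array inside the foreach loop, not the copy.
  2. You cannot use strip_tags() to remove tags together with the content inside them. strip_tags() removes only the tags.

Try this: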

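function strip_tags_content($text, $tags = '', $invert = FALSE) {

  // Extract the tag names from a list such as '<b><i>'.
  preg_match_all('/<(.+?)[\s]*\/?[\s]*>/si', trim($tags), $tags);
  $tags = array_unique($tags[1]);

  if(is_array($tags) AND count($tags) > 0) {
    if($invert == FALSE) {
      // Remove every element, including its content, except the listed tags.
      return preg_replace('@<(?!(?:'. implode('|', $tags) .')\b)(\w+)\b.*?>.*?</\1>@si', '', $text);
    }
    else {
      // Remove only the listed tags, including their content.
      return preg_replace('@<('. implode('|', $tags) .')\b.*?>.*?</\1>@si', '', $text);
    }
  }
  elseif($invert == FALSE) {
    // No tag list given: remove every element together with its content.
    return preg_replace('@<(\w+)\b.*?>.*?</\1>@si', '', $text);
  }
  return $text;
}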

foreach($aMessages as $key => $value) {
  // Write back into the original array; assigning to $value would only change the copy.
  $aMessages[$key]['text'] = strip_tags_content($value['text']);
}