Trim Not Working with Array from MySQL fetched String

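What I'm trying to do is take a block of HTML, strip out all the HTML tags, and then put each line of text into a PHP array.

I'm only testing it with a single block (hence the WHERE ID = '2409' in the MySQL query).

The HTML for ID 2409 looks like this: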

<table class="description-table">
<tbody>
<tr><td>Saepe Encomia 2.aD NEC Mirum Populo Soluni Iis 8679-1370 Status Error Sed 9.9</td></tr>
<tr><td>Description</td></tr>
<tr><td></td>
<td><br>
<br><p></p><p></p>
<strong><br></strong> <strong><br></strong> <strong>Donec Rem </strong><br>
<br>
<strong>Animam Urgebat<br>
<br></strong> <strong><br>
<br>
Rerum Sed 8613 - 3669 8358 & 6699<br>
<br>
1.mE (magNA) QUO Ad Nominum Statum Massa<br>
ab SEM Autem Reddet Habitu Sit<br>
<br></strong> <strong> PRAEDAM ACCUMSAN PERSONARUM DENEGARE AC DUORUM</strong> <strong><br></strong> <strong><br></strong> <strong>Lius typi sit nec quo adversis cras ministri oppressa, versus class hic rem quos colubros ullo commune!economy!</strong><strong><br></strong><strong>                                                           ad Quisque Modeste</strong><strong>                                                           ac Rem Wisi</strong><strong>                                                           ex Hac Congue mus Leo</strong><strong>                                                           ab 7/92" Alias</strong><strong>                                                           ad 2/73" Adverso & Erat</strong><strong>                                                           me Personom Eget</strong><strong>                                                           ad Viribus Fuga Fuga</strong><strong>                                                           ab Louor-Sit Molles</strong><strong class="c2">                                                           3x Block-Off Plates</strong><strong class="c2">                                                           ad Facunda</strong><strong class="c2">                                                           ab Personas Diam<br>
NUNC<br>
ex Teniet te Palmam Eaque<br>
me Teniet in Versus Urna<br></strong> <strong><br></strong><br>
<strong class="c3">**CONDEMNENDUS REM CUM MAGNORUM**</strong><strong></strong><br>
</td>
</table>

Here is my PHP script, which is meant to parse it:

//connect to mysqli

$results = $mysqli->query("SELECT ID, post_content
FROM wp_posts
WHERE ID = '2409';");

while($row = $results->fetch_array()) {
    $htmlarray2 = preg_split('/<.+?>/', $row['post_content']);
    $htmlarray = array_values(array_filter(array_map('trim', $htmlarray2)));
    echo '<pre>';
        print_r($htmlarray);
    echo '</pre>';
    . . . 
}

This produces output like this:

Array
(
[0] => Saepe Encomia 2.aD NEC Mirum Populo Soluni Iis 8679-1370 Status Error Sed 9.9
[1] => Donec Rem 
[2] => Animam Urgebat
[3] => Rerum Sed 8613 - 3669 8358 & 6699
[4] => 1.mE (magNA) QUO Ad Nominum Statum Massa
[5] => ab SEM Autem Reddet Habitu Sit
[6] =>  PRAEDAM ACCUMSAN PERSONARUM DENEGARE AC DUORUM
[7] => Lius typi sit nec quo adversis cras ministri oppressa, versus class hic rem quos colubros ullo commune!
[8] =>                                                            ad Quisque Modeste
[9] =>                                                            ac Rem Wisi
[10] =>                                                            ex Hac Congue mus Leo
[11] =>                                                            ab 7/92" Alias
[12] =>                                                            ad 2/73" Adverso & Erat
[13] =>                                                            me Personom Eget
[14] =>                                                            ad Viribus Fuga Fuga
[15] =>                                                            ea Totam Poenam
[16] =>                                                            ab Louor-Sit Molles
[17] =>                                                            ad Facunda
[18] =>                                                            ab Personas Diam
[19] => NUNC
[20] => ex Teniet te Palmam Eaque
[21] => me Teniet in Versus Urna
[22] => **CONDEMNENDUS REM CUM MAGNORUM**
)

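That's fine, but now I'm having trouble removing the whitespace before and after the strings in the array.

Let's take element 8 of the array as an example: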
. . .
$arrayvalue = $htmlarray2['8'];

This echoes like so:

                                                       ad Quisque Modeste

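Now, what I want to do is obviously trim every element of the array, but for testing I'm just using this one variable, $arrayvalue.

My problem is that trim() isn't working on this MySQL-fetched variable. That is, adding trim($arrayvalue); has no effect, and the value echoes exactly as above.

I know this has something to do with the array coming from the query, because if I test the variable normally in a PHP script of its own: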
$string = '                                                            ad Quisque Modeste  ';
echo trim($string);

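it works fine, and the echo output is simply ad Quisque Modeste, without the whitespace before and after the string.

Why doesn't trim() work inside my while loop? What is the trick to stripping the leading and trailing whitespace from the elements?

EDIT: Here is my complete while loop, as requested. It differs a bit from the example above (I've been making lots of changes trying to solve this myself, so it keeps changing), but this is everything I have right now: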

while($row = $results->fetch_array()) {
    $id = $row['ID'];
    echo 'ID: ' . $id;
    echo '<br  />';

    //replace &nbsp; with white space
    $converted = strtr($row['post_content'],array_flip(get_html_translation_table(HTML_ENTITIES, ENT_QUOTES))); 
    trim($converted, chr(0xC2).chr(0xA0));

    //remove html elements
    $htmlarray = preg_split('/<.+?>/', $converted);

    // remove empty array elements and re-index array
    $htmlarray2 = array_values(array_filter(array_map('trim', $htmlarray)));

    // test by getting single value from array
    $arrayvalue = $htmlarray2['9'];

    // my attempt to trim string in while loop
    trim($arrayvalue);

    // doesn't trim
    echo '<hr>' . $arrayvalue . '<hr>';

    // put this here so I can see the full array
    echo '<pre>';
        print_r($htmlarray2);
    echo '</pre>';
}

As requested, here is the result of var_export($row['post_content']);
'<table class="product-description-table">
<tbody>
<tr>
<td class="item" colspan="3">Saepe Encomia 2.aD NEC Mirum Populo Soluni Iis 8679-1370 Status Error Sed 9.9</td>
</tr>
<tr>
<td class="title" colspan="3"></td>
</tr>
<tr>
<td class="content"><br>
<br>
<p class="c1"></p>
<p class="c1"></p>
<strong><br></strong> <strong><br></strong> <strong>Donec Rem&nbsp;</strong><br>
<br>
<strong>Animam Urgebat<br>
<br></strong> <strong><br>
<br>
Rerum Sed 8613 - 3669 8358 & 6699<br>
<br>
1.mE (magNA) QUO Ad Nominum Statum Massa<br>
ab SEM Autem Reddet Habitu Sit<br>
<br></strong> <strong>&nbsp;PRAEDAM ACCUMSAN PERSONARUM DENEGARE AC DUORUM</strong> <strong><br></strong> <strong><br></strong> <strong>Lius typi sit nec quo adversis cras ministri oppressa, versus class hic rem quos colubros ullo commune!economy!</strong><strong><br></strong><strong>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;ad Quisque Modeste</strong><strong>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;ac Rem Wisi</strong><strong>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;ex Hac Congue mus Leo</strong><strong>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;ab 7/92" Alias</strong><strong>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;ad 2/73" Adverso & Erat</strong><strong>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;me Personom Eget</strong><strong>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;ad Viribus Fuga Fuga</strong><strong>&nbsp; 
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;ab Louor-Sit Molles</strong><strong class="c2">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;3x Block-Off Plates</strong><strong class="c2">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;ad Facunda</strong><strong class="c2">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;ab Personas Diam<br>
NUNC<br>
ex Teniet te Palmam Eaque<br>
me Teniet in Versus Urna<br></strong> <strong><br></strong><br>
<strong class="c3">**CONDEMNENDUS REM CUM MAGNORUM**</strong><strong>&nbsp;</strong><br></td>
<td class="product-content-border"></td>
</tr>
<tr>
<td class="gallery" colspan="3">
<table>
<tbody>
<tr>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
</tr>
</tbody>
</table>
</td>
</tr>
<tr>
<td></td>
</tr>
<tr>
<td class="spacer" colspan="3"></td>
</tr>
<tr>
<td class="product-content-border"></td>
</tr>
</tbody>
</table>
<br>
<br>
<br>
<p class="c4"></p>'

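FINAL EDIT :):

I posted a solution below. I won't accept my own answer.

If anyone familiar with regular expressions can help explain what was behind all this misery, and why this regex, /[\s]+/mu, or rather $clean_htmlarray = preg_replace('/[\s]+/mu', ' ', $htmlarray);, solves the problem, I'd be glad to accept that explanation as the correct answer.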

trim() does not work in place. You want this:

$arrayvalue = trim($arrayvalue);

Indeed. trim() returns the trimmed string; it does not modify the variable in place.
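To make the point above concrete, here is a minimal sketch. The sample value is hypothetical and padded with ordinary ASCII spaces only, so it illustrates the discarded return value and nothing else; it does not reproduce the asker's actual data.

```php
<?php
// Hypothetical sample value, padded with plain ASCII spaces.
$arrayvalue = "   ad Quisque Modeste  ";

// The return value is thrown away, so $arrayvalue is left untouched.
trim($arrayvalue);
var_dump($arrayvalue === "   ad Quisque Modeste  "); // bool(true)

// Assign the result back to keep the trimmed string.
$arrayvalue = trim($arrayvalue);
var_dump($arrayvalue === "ad Quisque Modeste");      // bool(true)
```

The same applies to the call with a custom character list in the question's loop: its result also has to be assigned to have any effect.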

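I found a solution.

I'm not quite sure how it works; I'm pretty unfamiliar with regular expressions.

But the solution I found (maybe someone can explain it?) is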

$clean_htmlarray = preg_replace('/[\s]+/mu', ' ', $htmlarray);

The whole working script (not including the MySQL stuff):

$converted = html_entity_decode( $row['post_content'], ENT_QUOTES);
$converted = trim($converted, chr(0xC2).chr(0xA0));

$htmlarray = preg_split('/<.+?>/', $converted);

$clean_htmlarray = preg_replace('/[\s]+/mu', ' ', $htmlarray);

$htmlarray2 = array_filter(array_map('trim', $clean_htmlarray));

$clean_htmlarray2 = array_values($htmlarray2);

echo '<pre>';
print_r($clean_htmlarray2);
echo '</pre>';

The output is

Array
(
    [0] => Saepe Encomia 2.aD NEC Mirum Populo Soluni Iis 8679-1370 Status Error Sed 9.9
    [1] => Description
    [2] => Donec Rem
    [3] => Animam Urgebat
    [4] => Rerum Sed 8613 - 3669 8358 & 6699
    [5] => 1.mE (magNA) QUO Ad Nominum Statum Massa
    [6] => ab SEM Autem Reddet Habitu Sit
    [7] => PRAEDAM ACCUMSAN PERSONARUM DENEGARE AC DUORUM
    [8] => Lius typi sit nec quo adversis cras ministri oppressa, versus class hic rem quos colubros ullo commune!economy!
    [9] => ad Quisque Modeste
    [10] => ac Rem Wisi
    [11] => ex Hac Congue mus Leo
    [12] => ab 7/92" Alias
    [13] => ad 2/73" Adverso & Erat
    [14] => me Personom Eget
    [15] => ad Viribus Fuga Fuga
    [16] => ab Louor-Sit Molles
    [17] => 3x Block-Off Plates
    [18] => ad Facunda
    [19] => ab Personas Diam
    [20] => NUNC
    [21] => ex Teniet te Palmam Eaque
    [22] => me Teniet in Versus Urna
    [23] => **CONDEMNENDUS REM CUM MAGNORUM**
)

A fully trimmed array.

This also works for all rows in my while loop, i.e.:

$results = $mysqli->query("SELECT ID, post_content
FROM wp_posts
LIMIT 50;");

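In that case, all 50 rows come back with their strings fully trimmed.

So in the end... this was quite a challenge to solve!

I just wish I understood it better. I really don't feel I should get credit for answering this question, since all I did was try a bunch of different things until this one finally worked.

If anyone wants to chime in and explain why $clean_htmlarray = preg_replace('/[\s]+/mu', ' ', $htmlarray);, or rather /[\s]+/mu, was what I needed in this case, I'd be happy to award the answer to them :)

For now I'm just glad it's working. Thanks, everyone, for all the help and input on this!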

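Here is the explanation you asked for of the regex pattern that solved your problem:

/[\s]+/ (more simply written as /\s+/) means "look for one or more whitespace characters" (this includes ' ', '\r', '\n', '\t', '\f', '\v'). The multi-line modifier/flag (m) is not necessary because you are not using anchors (^ $) in your pattern. The unicode modifier/flag (u) is absolutely critical in your case because your HTML text contains many little devils called "NO-BREAK SPACE", the decoded form of the &nbsp; entities in your markup. Each one is the code point U+00A0, written \x{00A0} in a pattern, and is stored in UTF-8 as the two bytes 194 and 160 (0xC2 0xA0). See them highlighted here.

Without the u flag, the NO-BREAK SPACE characters would remain, and extra filtering would be needed to remove them.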
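The behaviour described above can be sketched as follows. The sample string is hypothetical, and the "\u{00A0}" escape assumes PHP 7 or later; on older versions "\xC2\xA0" would be the equivalent spelling.

```php
<?php
// "\u{00A0}" is NO-BREAK SPACE; in UTF-8 it is the byte pair 0xC2 0xA0.
// Hypothetical sample mixing no-break spaces with one ordinary space.
$padded = "\u{00A0} \u{00A0}ad Quisque Modeste\u{00A0}";

// By default trim() strips only space, tab, newline, carriage return,
// NUL and vertical tab, so the no-break spaces at both ends block it.
$trimmed = trim($padded);
var_dump($trimmed === $padded); // bool(true)

// With the u flag, \s also matches U+00A0, so every whitespace run
// collapses into a single ASCII space...
$collapsed = preg_replace('/\s+/u', ' ', $padded);

// ...which a plain trim() can then remove.
$clean = trim($collapsed);
var_dump($clean === "ad Quisque Modeste"); // bool(true)
```

This mirrors the order of operations in the working script: collapse first, trim afterwards.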


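Although your code does eventually get the output where it needs to be, I'm happy to offer a leaner, single-step pattern that uses preg_split() alone and gets you there faster: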
while ($row = $results->fetch_array()) {
    $texts = preg_split('/\s*<[^>]+>\s*/u', $row['post_content'], 0, PREG_SPLIT_NO_EMPTY);
    var_export($texts);
}

Here is a working regex101 demo.

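This new split pattern still looks for your tags, but it is more efficient because, between the < and the >, I only ask to match characters that are "not >" by using [^>]+. For the engine, that is much simpler than matching against the long list of characters that . stands for.

In addition, I have included matching for your unicode-extended whitespace characters. \s* will match zero or more whitespace characters before and after each tag.

Finally, let me explain the additional parameters on preg_split(). 0 means "find unlimited matches". That is the default behavior, but I have to pass 0 or -1 as the value to hold its place so that the final parameter can be used. PREG_SPLIT_NO_EMPTY saves you the extra step of calling array_filter() later. It ignores any empty elements produced by the split, so you only get the good stuff.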
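As a rough illustration of how those parameters behave together, here is a small self-contained sketch. The fragment is hypothetical and much shorter than the real post_content. The html_entity_decode() call is an assumption borrowed from the asker's own working script, added because the var_export() output stores the padding as &nbsp; entities rather than raw characters; the explicit 'UTF-8' argument is likewise an assumption about the data's encoding.

```php
<?php
// Hypothetical fragment modelled on the post content.
$raw = '<strong>&nbsp; &nbsp;ad Quisque Modeste</strong><strong><br></strong>'
     . "<strong>&nbsp; ac Rem Wisi<br>\nNUNC<br></strong>";

// Turn &nbsp; into real U+00A0 characters so that \s (with the u flag)
// can match them. This step is an assumption, not part of the answer's loop.
$html = html_entity_decode($raw, ENT_QUOTES, 'UTF-8');

// Third argument: -1 (or 0) means "no limit"; it is only a placeholder
// so that the flag can be passed as the fourth argument.
$texts = preg_split('/\s*<[^>]+>\s*/u', $html, -1, PREG_SPLIT_NO_EMPTY);

print_r($texts);
// Array
// (
//     [0] => ad Quisque Modeste
//     [1] => ac Rem Wisi
//     [2] => NUNC
// )
```

Because the empty pieces between adjacent tags are dropped by PREG_SPLIT_NO_EMPTY, the result is already re-indexed from 0 and needs no array_values() pass.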