Remove unwanted HTML code from text (an article)
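I have a Joomla site that was just migrated from plain HTML. There are 1000 articles, and each one contains unwanted HTML code like the sample below.
How can I remove this HTML from the articles without opening every article to edit it?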
<div id="mainDIV">
<div id="topDIV">
<div id="topnav">
<div>
<div id="topnavdiv0"> </div>
<div id="topnavdiv"><a href="../store/">SHOP NOW</a> <img title="" src="images/shop-basket.gif" /> | 1-800-336-1630</div>
</div>
</div>
</div>
<div style="clear: both;"> </div>
<table id="mainBody" >
<tbody>
<tr>
<td id="left"> </td>
<td id="mid"><!-- top -->
<div id="top1">
<div id="bbb-logo"><a href="http://app.southeasttexas.bbb.org/report/10014674/"><img src="images/logo-bbb.gif" alt="metal-market-report-02-27-12" /></a></div>
</div>
<!--div id="top2"></div-->
<div id="flashnav"> </div>
<div id="topsep"> </div>
<!-- top --> <!-- content -->
<table id="contentBody">
<tbody>
<tr>
<td id="contentSep"> </td>
<td id="contentLeft">
<div id="titleBGlong">Metals Market Reports</div>
<br />
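I really wish I did not have to come back and ask the same question again, but even after removing everything that looked problematic I still get an error.
Please see the error below: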
There seems to be an error in your SQL query. The MySQL server error output below, if there is any, may also help you in diagnosing the problem
ERROR: Unknown Punctuation String @ 1
STR: <?
SQL: <?php
$query = mysqli_query($con, 'SELECT * FROM th18k_content WHERE id BETWEEN 0 AND 50');
SQL query: Documentation
MySQL said: Documentation
#1064 - You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '<?php
$query = mysqli_query($con, 'SELECT * FROM th18k_content WHERE id BETWEEN' at line 1
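Do you want to strip the HTML tags from the articles? First find the table in your database that stores those articles, then fetch them and loop through them with: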
<?php
// $con is assumed to be an open mysqli connection; connecting is not shown here.
// get a batch of articles from the database
$query = mysqli_query($con, 'SELECT * FROM th18k_content WHERE id BETWEEN 0 AND 50');
// prepare the update once; placeholders keep quotes in the article text from breaking the SQL
$update = mysqli_prepare($con, 'UPDATE th18k_content SET article = ? WHERE id = ?');
while ($row = mysqli_fetch_array($query, MYSQLI_ASSOC)) { // for each article
    $lines = explode("\n", $row['article']); // split it into lines (double quotes, so \n is a real newline)
    $start = null;
    foreach ($lines as $i => $line) { // look for the first line we want to keep
        if (strpos($line, 'titleBGlong') !== false) { // 'titleBGlong' found
            $start = $i;
            break; // exit the loop
        }
    }
    if ($start === null) {
        continue; // marker not found: leave this article untouched
    }
    $newarticle = implode("\n", array_slice($lines, $start)); // $newarticle has the beginning removed
    $strippedarticle = strip_tags($newarticle); // remove HTML tags
    $id = (int) $row['id'];
    mysqli_stmt_bind_param($update, 'si', $strippedarticle, $id);
    mysqli_stmt_execute($update); // replace the article in the db
}
?>
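I don't know exactly what your database table and columns are called, so you will need to change those names. I also limited the query to ids between 0 and 50, because otherwise you might flood the database with queries: every article triggers its own UPDATE on top of the SELECT. Just run the code, change the range to the next 50, run it again, and so on.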
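If changing the range by hand for every run gets tedious, the id ranges can be generated instead. This is only a sketch: `idBatches` is a made-up helper name, and it assumes the article ids run more or less contiguously from 1 up to some known maximum, which may not hold for your table.

```php
<?php
// Hypothetical helper (not part of Joomla or mysqli): returns inclusive
// [start, end] id pairs covering 1..$maxId in chunks of $size ids.
function idBatches(int $maxId, int $size = 50): array
{
    $batches = [];
    for ($start = 1; $start <= $maxId; $start += $size) {
        // The last batch is clipped so it never runs past $maxId.
        $batches[] = [$start, min($start + $size - 1, $maxId)];
    }
    return $batches;
}

foreach (idBatches(120) as [$from, $to]) {
    // Each pair would replace the hard-coded "BETWEEN 0 AND 50" in the SELECT.
    echo "BETWEEN $from AND $to\n";
}
```

Each pair can then be used for one run of the script, with a pause between runs if the server is under load.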
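@Edit
The script is PHP, not SQL, so it cannot be pasted into an SQL query box; that is why MySQL complains about '<?php'. Run it by saving it in a .php file on the server and opening it like a normal page of the website (in this example I did not connect to the database).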
This removes every line up to the one containing "titleBGlong"; after that you can use strip_tags to remove the tags.
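To try the trimming logic without touching the database, the same idea can be run on a plain string first. This is a minimal sketch: `stripLeadingTemplate` is a hypothetical name, and the sample text (including the closing paragraph) is invented for illustration.

```php
<?php
// Hypothetical helper: drop every line before the one containing $marker,
// then strip the HTML tags from what is left.
function stripLeadingTemplate(string $html, string $marker = 'titleBGlong'): string
{
    $lines = preg_split('/\R/', $html); // split on any kind of line ending
    foreach ($lines as $i => $line) {
        if (strpos($line, $marker) !== false) {
            // Keep the marker line and everything after it.
            $kept = array_slice($lines, $i);
            return trim(strip_tags(implode("\n", $kept)));
        }
    }
    // Marker not found: return the article unchanged rather than guess.
    return $html;
}

// Invented sample resembling the start of one migrated article.
$sample = "<div id=\"topnav\">SHOP NOW</div>\n"
        . "<div id=\"titleBGlong\">Metals Market Reports</div>\n"
        . "<br />\n"
        . "<p>Gold closed higher today.</p>";

echo stripLeadingTemplate($sample), "\n";
```

Returning the input unchanged when the marker is missing is a deliberate choice here, so an article with a different layout is not silently emptied.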