Extracting specific text from HTML text
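I'm not very familiar with regular expressions. I'm trying to get the result described at the bottom. This is what I have so far (note that $page contains tab characters):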
$page = "<div class=\"title-container\">
<h1>Text here<span> /Sub-text/</span> </h1>
</div>";
// TITLE
preg_match_all ('/<h1>(.*)<\/h1>/U', $page, $out);
$hutitle = preg_replace("#<span>(.*)<\/span>\s#", "", $out[1][0]);
$entitle = preg_replace("'(.*)<span> /'", "", $out[1][0]);
I would like to get this:
$hutitle = "Text here";
$entitle = "Sub-text"; // without HTML and "/"
Try this:
<h1>(.*?)<span> /(.*?)/</span>
$1 and $2 are the results you expect.
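For what it's worth, here is a minimal sketch of how that pattern might be plugged into preg_match. The `#` delimiter, the `s` modifier and the trim() calls are my own assumptions rather than part of the answer above, and the sample markup is copied from the question:

```php
<?php
// Sample markup from the question (newlines and a tab included).
$page = "<div class=\"title-container\">\n\t<h1>Text here<span> /Sub-text/</span> </h1>\n</div>";

// '#' is used as the delimiter so the slashes in the pattern need no escaping.
// The 's' modifier lets '.' match newlines in case the title spans lines.
if (preg_match('#<h1>(.*?)<span> /(.*?)/</span>#s', $page, $m)) {
    $hutitle = trim($m[1]); // first capture group: main title
    $entitle = trim($m[2]); // second capture group: text between the slashes
    echo $hutitle, "\n", $entitle, "\n";
}
```

In PHP the captures come back as $m[1] and $m[2] rather than $1 and $2. This only holds while the markup keeps exactly this shape, with a single space after `<span>` and no attributes on the tags.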
I suggest using DOM together with trim; no regular expression is needed. Here is working code for your specific case:
$page = "<div class=\"title-container\">\n <h1>Text here<span> /Sub-text/</span> </h1>\n </div>";
$dom = new DOMDocument;
$dom->loadHTML($page);
$hs = $dom->getElementsByTagName('h1');
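foreach ($hs as $h) {
    $enttitlenodes = $h->getElementsByTagName('span');
    if ($enttitlenodes->length > 0 && $enttitlenodes->item(0)->tagName == 'span')
    {
        // Strip the surrounding spaces and slashes from the span text.
        $entitle = trim($enttitlenodes->item(0)->nodeValue, " /");
        echo $entitle . "\n";
        // Remove the span so only the main title text remains in the h1.
        $h->removeChild($enttitlenodes->item(0));
    }
    // trim() drops the trailing space left behind after the removed span.
    $hutitle = trim($h->nodeValue);
    echo $hutitle;
}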