Find div with class and its plain text using PHP Simple HTML DOM Parser
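I want to find the elements with class ft00 that sit between "Work Experience" and "EDUCATION AND TRAINING", and extract the text of those elements (the ones containing dates) from the given HTML: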
<p class = "ft00">Introduction</p>
<p class = "ft00">John Smith</p>
<p class = "ft02">Email:</p>
<p class = "ft00">John@gmail.com</p>
<p class = "ft00">Work Experience</p>
<p class = "ft00">27 July 2017</p>
<p class = "ft02">ABC Company</p>
<p class = "ft00">19 May 2018</p>
<p class ="ft02">XYZ Company</p>
<p class = "ft00">EDUCATION AND TRAINING</p>
What I have so far extracts all the data between "Work Experience" and "EDUCATION AND TRAINING", and it works fine. The code is as follows:
$fexp = $html->find('p[plaintext^=Work Experience]');
$items = array();
foreach ($fexp as $keye) {
    while ( $keye->nextSibling() ) {
        if ( $keye->nextSibling() == TRUE ) {
            $keye = $keye->nextSibling();
            $varce = $keye->plaintext;
        }
        if ( trim($varce) == "EDUCATION AND TRAINING" ){
            break;
        }
        //$test[] = $collection;
        $items[] = $varce;
        // echo $varce;
    }
}
var_dump($items);
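I'm close but can't seem to find the solution. Any help would be appreciated :-)
Using DOMDocument and DOMXPath you can do it like below. I have never used Simple HTML DOM Parser, but I assume it has XPath.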
<?php
$dom = new DOMDocument();
$dom->loadHtml('
<p class = "ft00">Introduction</p>
<p class = "ft00">John Smith</p>
<p class = "ft02">Email:</p>
<p class = "ft00">John@gmail.com</p>
<p class = "ft00">Work Experience</p>
<p class = "ft00">27 July 2017</p>
<p class = "ft02">ABC Company</p>
<p class = "ft00">19 May 2018</p>
<p class ="ft02">XYZ Company</p>
<p class = "ft00">EDUCATION AND TRAINING</p>
', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);
$result = [];
$matching = false;
foreach ($xpath->query("//p[contains(@class, 'ft00') or contains(@class, 'ft02')]/text()") as $p) {
    if ($p->nodeValue === 'Work Experience' || $matching) {
        $result[] = $p->nodeValue;
        $matching = true;
    }
    if ($p->nodeValue === 'EDUCATION AND TRAINING') {
        break;
    }
}
print_r($result);
Result:
Array
(
[0] => Work Experience
[1] => 27 July 2017
[2] => ABC Company
[3] => 19 May 2018
[4] => XYZ Company
[5] => EDUCATION AND TRAINING
)
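If only the ft00 paragraphs strictly between the two headings are wanted, the filtering can probably be pushed into the XPath expression itself. The following is a minimal sketch, not a tested drop-in: it assumes the <p> elements are siblings (so the HTML is loaded without LIBXML_HTML_NOIMPLIED, letting libxml wrap them in <html><body>), and the variable names $htmlSource, $ft00, $query and $dates are made up for illustration.

```php
<?php
// Sample markup copied from the question.
$htmlSource = <<<'HTML'
<p class = "ft00">Introduction</p>
<p class = "ft00">John Smith</p>
<p class = "ft02">Email:</p>
<p class = "ft00">John@gmail.com</p>
<p class = "ft00">Work Experience</p>
<p class = "ft00">27 July 2017</p>
<p class = "ft02">ABC Company</p>
<p class = "ft00">19 May 2018</p>
<p class ="ft02">XYZ Company</p>
<p class = "ft00">EDUCATION AND TRAINING</p>
HTML;

$dom = new DOMDocument();
// Default options: libxml adds <html><body>, so every <p> is a sibling under <body>.
$dom->loadHTML($htmlSource);
$xpath = new DOMXPath($dom);

// Exact class-token match for "ft00" (avoids matching e.g. "ft001").
$ft00 = "contains(concat(' ', normalize-space(@class), ' '), ' ft00 ')";

// ft00 paragraphs that follow "Work Experience" and still have
// "EDUCATION AND TRAINING" somewhere after them.
$query = "//p[normalize-space() = 'Work Experience']"
       . "/following-sibling::p[$ft00]"
       . "[following-sibling::p[normalize-space() = 'EDUCATION AND TRAINING']]";

$dates = [];
foreach ($xpath->query($query) as $p) {
    $dates[] = trim($p->textContent);
}
print_r($dates);
```

With the sample markup this should leave just the two date strings in $dates, without the heading lines or the ft02 company names.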
Here is the correct working code:
$test = array();
$matching = false;
$collection = $html->find('p.ft00');
foreach ($collection as $tkey) {
    $text = trim($tkey->plaintext);
    // Compare case-insensitively: the sample HTML has "Work Experience", not "WORK EXPERIENCE".
    if (strcasecmp($text, "Work Experience") === 0 || $matching) {
        $test[] = $text;
        $matching = true;
    }
    if (strcasecmp($text, "EDUCATION AND TRAINING") === 0) {
        break;
    }
}
print_r($test);
Output:
Array
(
[0] => Work Experience
[1] => 27 July 2017
[2] => 19 May 2018
[3] => EDUCATION AND TRAINING
)
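The output above still includes the two heading lines. If only the dates are needed, one option is to keep just the entries that parse as a date. This is a rough sketch under the assumption that every date looks like "27 July 2017" (day, full month name, four-digit year); the names $lines and $dates are illustrative and not part of the original code.

```php
<?php
// Collected ft00 texts, as in the output above.
$lines = ['Work Experience', '27 July 2017', '19 May 2018', 'EDUCATION AND TRAINING'];

$dates = [];
foreach ($lines as $line) {
    // 'j F Y' = day without leading zero, full month name, 4-digit year.
    // The leading '!' resets unspecified fields (time) to zero.
    $parsed = DateTime::createFromFormat('!j F Y', trim($line));
    if ($parsed !== false) {
        // Store in ISO format so the values sort and compare easily.
        $dates[] = $parsed->format('Y-m-d');
    }
}
print_r($dates);
```

Entries such as "Work Experience" fail to parse and are skipped, so no hard-coded list of headings is required. A different date layout in the real documents would need a different format string.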