PHP with DOM Xpath - Remove childNode and arrange string
I have this HTML structure:
<html>
<body>
<section>
<div>
<div>
<section>
<div>
<table>
<tbody>
<tr></tr>
<tr>
<td></td>
<td></td>
<td>
<i></i>
<div class="first-div class-one">
<div class="second-div"> soft </div>
130 cm / 15cm
</div>
</td>
</tr>
<tr></tr>
</tbody>
</table>
</div>
</section>
</div>
</div>
</section>
</body>
</html>
Now, I have this XPath code:
$doc = new DOMDocument();
@$doc->loadHtmlFile('http://www.whatever.com');
$doc->preserveWhiteSpace = false;
$xpath = new DOMXPath( $doc );
$nodelist = $xpath->query( '/html/body/section/div[2]/section/div/table/tbody/tr[2]/td[3]/div' );
foreach ( $nodelist as $node ) {
    $result = $node->nodeValue."\n";
}
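This gives me 'soft 130 cm / 15cm' as the result.
But I would like to know how to get only '15', so I need:
1. To know how to get rid of the child node's nodeValue.
2. Once I have "130 cm / 15cm", to know how to get only "15" as the node value in a PHP variable.
Can you guys help?
Thanks in advance.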
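The text inside a tag is also a node (a child node), more specifically a DOMText. By looking at the children of the div, you can find the DOMText and get its nodeValue. Here is an example: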
$doc = new DOMDocument();
$doc->loadHTML("<html><body><p>bah</p>Test</body></html>");
echo $doc->saveHTML(); // print the parsed document for inspection
$xpath = new DOMXPath( $doc );
$nodelist = $xpath->query( '/html/body' );
foreach ( $nodelist as $node ) {
    // childNodes holds element nodes and text nodes alike
    foreach ($node->childNodes as $child) {
        // keep only text that sits directly inside <body>
        if ($child instanceof DOMText) {
            echo $child->nodeValue."\n"; // should output "Test".
        }
    }
}
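As an alternative to looping over childNodes, the same filtering can be pushed into the XPath expression itself. The sketch below is only an illustration: it wraps the toy document in a div (my change, to mirror the question) and uses the standard XPath 1.0 text() node test together with normalize-space() to skip whitespace-only nodes.

```php
<?php
// Toy document from the example above, wrapped in a <div> for illustration.
$doc = new DOMDocument();
$doc->loadHTML('<html><body><div><p>bah</p>Test</div></body></html>');

$xpath = new DOMXPath($doc);

// text() selects only text nodes that are direct children of the div, so the
// text inside <p> is not included; [normalize-space()] drops blank nodes.
$texts = [];
foreach ($xpath->query('//div/text()[normalize-space()]') as $textNode) {
    $texts[] = trim($textNode->nodeValue);
}

echo implode("\n", $texts), "\n"; // prints "Test"
```

The result is the same as with the instanceof DOMText check; which form reads better is mostly a matter of taste.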
Your second point can easily be done with a regular expression:
$string = "130 cm / 15cm";
$matches = array();
preg_match('|/ ([0-9]+) ?cm$|', $string, $matches);
echo $matches[1];
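If the input may not always have this exact shape, it is probably safer to check the return value of preg_match before reading $matches[1]. Here is a minimal sketch, assuming PHP 7.1 or later for the nullable return type; the function name extractSecondMeasure is made up for this example, and the pattern is a slightly more whitespace-tolerant variant of the one above.

```php
<?php
/**
 * Hypothetical helper: returns the number after the slash as an int,
 * or null when the string does not end in "/ <digits> cm".
 */
function extractSecondMeasure(string $text): ?int
{
    // "~" is used as the delimiter so the slash needs no escaping;
    // \s* tolerates missing or extra spaces around the number and the unit.
    if (preg_match('~/\s*(\d+)\s*cm\s*$~', $text, $matches) === 1) {
        return (int) $matches[1];
    }
    return null;
}

var_dump(extractSecondMeasure('130 cm / 15cm'));   // int(15)
var_dump(extractSecondMeasure('no measure here')); // NULL
```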
Complete solution:
<?php
$strhtml = '
<html>
<body>
<section>
<div>
<div>
<section>
<div>
<table>
<tbody>
<tr></tr>
<tr>
<td></td>
<td></td>
<td>
<i></i>
<div class="first-div class-one">
<div class="second-div"> soft </div>
130 cm / 15cm
</div>
</td>
</tr>
<tr></tr>
</tbody>
</table>
</div>
</section>
</div>
</div>
</section>
</body>
</html>';
$doc = new DOMDocument();
@$doc->loadHTML($strhtml);
echo $doc->saveHTML();
$xpath = new DOMXPath( $doc );
$nodelist = $xpath->query( '/html/body/section/div/div/section/div/table/tbody/tr[2]/td[3]/div' );
foreach ( $nodelist as $node ) {
    foreach ($node->childNodes as $child) {
        // skip element children and whitespace-only text nodes
        if ($child instanceof DOMText && trim($child->nodeValue) != "") {
            $text = trim($child->nodeValue);
            echo 'Raw: '.$text."\n";
            $matches = array();
            // print the value only when the pattern matches, so $matches[1] is always defined
            if (preg_match('|/ ([0-9]+) ?cm$|', $text, $matches)) {
                echo 'Value: '.$matches[1]."\n";
            }
        }
    }
}
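The absolute path in the query breaks as soon as the page layout changes. A possibly more robust variant, sketched here under the assumption that the first-div class reliably identifies the target element, selects the div by its class and replaces the @ operator with libxml_use_internal_errors(). The trimmed-down $html fragment is only for illustration.

```php
<?php
// Reduced fragment of the markup from the question (illustration only).
$html = '<table><tbody><tr><td><i></i>'
      . '<div class="first-div class-one">'
      . '<div class="second-div"> soft </div> 130 cm / 15cm </div>'
      . '</td></tr></tbody></table>';

$doc = new DOMDocument();
// Collect parser warnings internally instead of silencing them with @.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Match the div by one of its class tokens, then take only its own
// non-blank text nodes, which leaves out the nested "second-div".
$query = '//div[contains(concat(" ", normalize-space(@class), " "), " first-div ")]'
       . '/text()[normalize-space()]';

$value = null;
foreach ($xpath->query($query) as $textNode) {
    if (preg_match('|/ ([0-9]+) ?cm$|', trim($textNode->nodeValue), $matches)) {
        $value = (int) $matches[1];
        break; // the first matching text node is enough
    }
}

var_dump($value); // int(15)
```

The concat/normalize-space construction is the usual XPath 1.0 idiom for matching a single class token, so "first-div" does not accidentally match a class such as "first-div-wide".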