PHP DOMXPath: extract the href of an anchor inside a td
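Using PHP DOMXPath, I need to get the "href" of an anchor contained in a td.
I have worked out the XPath that reaches the td, and I can read the text inside it, but I cannot figure out how to extract the anchor.
For my other needs I have to extract all the tr elements as a first step, so my current code looks like this: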
$xpath = new DOMXPath($dom);
$trList = $xpath->query('//div[@id="main_content"]/table/tr/td/table[3]/tr[2]/td/table/tr');
$rowToSkip = 1;
foreach ($trList as $rowNum => $row) {
    if ($rowNum <= $rowToSkip) {
        continue;
    }
    $cols = $row->childNodes;
    $dataList[($rowNum-$rowToSkip)]['number'] = preg_replace("/[^0-9]/", "", strip_tags($cols->item(2)->nodeValue));
}
How can I retrieve the href?
I also tried
$cols->item(2)->attributes->getNamedItem("href")->nodeValue
but with no luck.
Below is a sample HTML with exactly the same structure as the original:
<div id="main_content">
<table class="wrapper" border="0" cellspacing="0" cellpadding="0">
<tr>
<td>
<table border="0" cellspacing="0" cellpadding="0" id="breadcrumb">
<tr>
<td class="breadcrumb">
<a href="" class="breadcrumb">head link</a>
<a href="" class="breadcrumb">head link</a>
</td>
</tr>
</table>
<div><img src="space.gif" width="1" height="7" alt="" border="0"></div>
<table border="0" cellspacing="0" cellpadding="0" width="100%">
<tr>
<td colspan="5" >test</td>
</tr>
<tr>
<td colspan="5"></td>
</tr>
</table>
<div><img width="1" height="32" border="0" alt="" src="space.gif"></div>
<table border="0" cellpadding="0" cellspacing="0" width="100%">
<tr>
<td width="100%" >test 02</td>
</tr>
<tr>
<td>
<table width="100%" border="0" cellspacing="0" cellpadding="0">
<tr>
<td nowrap="nowrap" colspan="8">header col 1</td>
<td nowrap="nowrap" colspan="5">header col 2</td>
</tr>
<tr>
<td nowrap="nowrap">
<a href="" >test col 0</a>
</td>
<td nowrap="nowrap">
<a href="" >test col 1</a>
</td>
<td nowrap="nowrap">test col 2</td>
<td nowrap="nowrap">
<a href="" >test col 3</a>
</td>
<td nowrap="nowrap">
<a href="" >test col 4</a>
</td>
<td nowrap="nowrap">
<a href="" >test col 5</a>
</td>
<td nowrap="nowrap">test col 6</td>
<td nowrap="nowrap">test col 7</td>
<td nowrap="nowrap">test col 8</td>
<td nowrap="nowrap">test col 9</td>
<td nowrap="nowrap">test col 10</td>
<td nowrap="nowrap">test col 11</td>
<td nowrap="nowrap">test col 12</td>
</tr>
<tr>
<td nowrap="nowrap" rowspan="1">
<a href="" >detail info col 0</a>
</td>
<td nowrap="nowrap" rowspan="1" style="background-color:red">
<a href="" >detail info col 1 this is needed column</a>
</td>
<td nowrap="nowrap" rowspan="1">
<a href="" >detail info col 2</a>
</td>
<td nowrap="nowrap" rowspan="1">
<a href="" >detail info col 3</a>
</td>
<td nowrap="nowrap" rowspan="1">
<a href="" >detail info col 4</a>
</td>
<td nowrap="nowrap" rowspan="1">
<a href="" >detail info col 5</a>
</td>
<td nowrap="nowrap" rowspan="1">
<a href="" >detail info col 6</a>
</td>
<td nowrap="nowrap" rowspan="1">
<a href="" >detail info col 7</a>
</td>
<td nowrap="nowrap" rowspan="1">
<a href="" >detail info col 8</a>
</td>
<td nowrap="nowrap" rowspan="1">
<a href="" >detail info col 9</a>
</td>
<td nowrap="nowrap" rowspan="1">
<a href="" >detail info col 10</a>
</td>
<td nowrap="nowrap" rowspan="1">
<a href="" >detail info col 11</a>
</td>
<td nowrap="nowrap" rowspan="1">
<a href="" >detail info col 12</a>
</td>
</tr>
</table>
</td>
</tr>
</table>
</td>
</tr>
</table>
Based on the structure you posted, the following outputs the href values:
<?php
$dom = new DOMDocument('1.0');
$dom->loadHTMLFile('input.html');
$xpath = new DOMXPath($dom);
$query = '//*[@id="main_content"]/table/tr/td/table[3]/tr[2]/td/table/tr[position() >= 3]/td[2]/a';
$nodes = $xpath->query($query);
foreach ($nodes as $node) {
    /** @var DOMElement $node */
    var_dump(
        $node->getAttribute('href'), // the href-attribute value
        $node->nodeValue             // the inner text
    );
}
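If you need to keep the row-by-row loop from the question, a relative XPath evaluated against each row is one option. The attempt with `attributes->getNamedItem("href")` cannot work on a td, because the href belongs to the `a` child rather than to the cell; in addition, `childNodes` can contain whitespace text nodes, so `item(2)` is not guaranteed to be the third td. The following is a minimal sketch, not a drop-in replacement: the inline markup is a trimmed-down, hypothetical stand-in for the posted HTML, and the `/t0`, `/d1` style href values and the `href` and `text` array keys are made up for illustration.

```php
<?php
// Hypothetical, trimmed-down stand-in for the posted markup (href values are invented).
$html = <<<HTML
<div id="main_content">
  <table>
    <tr><td>header col 1</td><td>header col 2</td></tr>
    <tr><td><a href="/t0">test col 0</a></td><td><a href="/t1">test col 1</a></td></tr>
    <tr><td><a href="/d0">detail info col 0</a></td><td><a href="/d1">detail info col 1</a></td></tr>
  </table>
</div>
HTML;

libxml_use_internal_errors(true); // keep parser warnings out of the output
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Shortened path for the stand-in markup; the real page needs the full path from the question.
$trList = $xpath->query('//div[@id="main_content"]/table/tr');

$rowToSkip = 1;
$dataList = [];
foreach ($trList as $rowNum => $row) {
    if ($rowNum <= $rowToSkip) {
        continue; // skip the two header rows, as in the question
    }
    // Relative path: the second td element child of this row, then its anchor.
    // Passing $row as the context node restricts the search to that row.
    $anchor = $xpath->query('td[2]/a', $row)->item(0);
    if ($anchor instanceof DOMElement) {
        $dataList[$rowNum - $rowToSkip] = [
            'href' => $anchor->getAttribute('href'),
            'text' => trim($anchor->textContent),
        ];
    }
}

print_r($dataList);
```

Because `td[2]` counts only td elements, whitespace between cells does not shift the index, and the `instanceof` check skips rows whose second cell has no anchor. With the stand-in markup above, only the detail row is collected and its href is `/d1`.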