How to get only the first few td tags of each row with PHP Simple HTML DOM Parser
I'm trying to get the text of the first 3 td tags in each row using PHP Simple HTML DOM Parser and collect them into an array.
The table looks like this:
<table>
<tbody>
<tr>
<td>Floyd</td>
<td>Machine</td>
<td>Banking</td>
<td>HelpScout</td>
</tr>
<tr>
<td>Nirvana</td>
<td>Paper</td>
<td>Business</td>
<td>GuitarTuna</td>
</tr>
<tr>
<td>The edge</td>
<td>Tree</td>
<td>Hospital</td>
<td>Sician</td>
</tr>
.....
.....
</tbody>
</table>
What I want to achieve is to collect these into an array, excluding the 4th td of each tr:
array(
array(
'art' => 'Floyd',
'thing' => 'Machine',
'passion' => 'Banking',
),
array(
'art' => 'Nirvana',
'thing' => 'Paper',
'passion' => 'Business',
),
array(
'art' => 'The edge',
'thing' => 'Tree',
'passion' => 'Hospital',
),
);
Here is what I tried:
require_once dirname( __FILE__ ) . '/library/simple_html_dom.php';
$html = file_get_html( 'https://www.example.com/list.html' );
$collect = array();
$list = $html->find( 'table tbody tr td' );
foreach( $list as $l ) {
$collect[] = $l->plaintext;
}
$html->clear();
unset($html);
print_r($collect);
It gives me every td in one flat array, which makes it hard to tell which array keys belong to which row. Is there a solution?
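For reference, the flat list produced by the attempt above can also be regrouped after the fact. The following is only a minimal sketch: it assumes every row has exactly four td cells, hard-codes the sample values from the table instead of fetching the page, and the names `$keys`, `$cellsPerRow` and `$rows` are made up for illustration.

```php
<?php
// Flat list as produced by collecting every td's plaintext
// (sample values copied from the table above).
$collect = [
    'Floyd', 'Machine', 'Banking', 'HelpScout',
    'Nirvana', 'Paper', 'Business', 'GuitarTuna',
    'The edge', 'Tree', 'Hospital', 'Sician',
];

$keys = ['art', 'thing', 'passion'];
$cellsPerRow = 4; // assumption: every row has exactly four td cells

$rows = [];
foreach (array_chunk($collect, $cellsPerRow) as $cells) {
    // Keep the first three cells of each chunk and label them.
    $rows[] = array_combine($keys, array_slice($cells, 0, count($keys)));
}

print_r($rows);
```

This only works while the cell count per row is constant; if rows can vary in length, grouping by tr is the safer route.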
Instead of iterating over all td elements at once, you can loop over each tr and, for each tr, fetch its inner td elements and skip the 4th td:
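// load the library (path taken from the question's setup)
require_once dirname( __FILE__ ) . '/library/simple_html_dom.php';
$htmlString = <<<html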
<table>
<tbody>
<tr>
<td>Floyd</td>
<td>Machine</td>
<td>Banking</td>
<td>HelpScout</td>
</tr>
<tr>
<td>Nirvana</td>
<td>Paper</td>
<td>Business</td>
<td>GuitarTuna</td>
</tr>
<tr>
<td>The edge</td>
<td>Tree</td>
<td>Hospital</td>
<td>Sician</td>
</tr>
</tbody>
</table>
html;
$html = str_get_html($htmlString);
// find all tr tags
$trs = $html->find('table tr');
$collect = [];
// foreach tr tag, find its td children
foreach ($trs as $tr) {
$tds = $tr->find('td');
// collect first 3 children and skip the 4th
$collect[] = [
'art' => $tds[0]->plaintext,
'thing' => $tds[1]->plaintext,
'passion' => $tds[2]->plaintext,
];
}
print_r($collect);
The output is:
Array
(
[0] => Array
(
[art] => Floyd
[thing] => Machine
[passion] => Banking
)
[1] => Array
(
[art] => Nirvana
[thing] => Paper
[passion] => Business
)
[2] => Array
(
[art] => The edge
[thing] => Tree
[passion] => Hospital
)
)
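The loop above reads `$tds[0]` through `$tds[2]` directly, so a row with fewer than three td cells (for example a header row made of th cells) would hit a missing index. Below is a hedged sketch of the same per-row idea with a guard. So that it runs without any third-party file, it uses PHP's bundled DOM extension (`DOMDocument` and `DOMXPath`) rather than Simple HTML DOM; the helper name `firstThreeCells` and the sample markup with a header row and a short row are invented for illustration.

```php
<?php
// Hypothetical helper: returns the first three cell texts of every row, keyed by name.
function firstThreeCells(string $htmlString): array
{
    $keys = ['art', 'thing', 'passion'];

    $doc = new DOMDocument();
    // Collect parser warnings internally instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($htmlString);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $rows = [];
    foreach ($xpath->query('//table//tr') as $tr) {
        $cells = [];
        // Only direct td children, so cells of a nested table stay out of this row.
        foreach ($xpath->query('./td', $tr) as $td) {
            $cells[] = trim($td->textContent);
        }
        // Skip header rows or short rows instead of reading a missing index.
        if (count($cells) < count($keys)) {
            continue;
        }
        $rows[] = array_combine($keys, array_slice($cells, 0, count($keys)));
    }
    return $rows;
}

// Made-up sample: one header row, one full row, one row that is too short.
$htmlString = '<table><tbody>'
    . '<tr><th>A</th><th>B</th><th>C</th><th>D</th></tr>'
    . '<tr><td>Floyd</td><td>Machine</td><td>Banking</td><td>HelpScout</td></tr>'
    . '<tr><td>Nirvana</td><td>Paper</td></tr>'
    . '</tbody></table>';

print_r(firstThreeCells($htmlString));
```

With Simple HTML DOM the same guard would amount to checking `count($tds)` before building the row array.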