How to parse html table to array with symfony dom crawler
I have an HTML table and I want to create an array from it:
$html = '<table>
<tr>
<td>satu</td>
<td>dua</td>
</tr>
<tr>
<td>tiga</td>
<td>empat</td>
</tr>
</table>';
My array must look like this:
array(
array(
"satu",
"dua",
),
array(
"tiga",
"empat",
)
)
I tried the code below, but I can't get the array I need:
$crawler = new Crawler();
$crawler->addHTMLContent($html);
$row = array();
$tr_elements = $crawler->filterXPath('//table/tr');
foreach ($tr_elements as $tr) {
// ???????
}
use Symfony\Component\DomCrawler\Crawler;

$html = '<table>
<tr>
<td>satu</td>
<td>dua</td>
</tr>
<tr>
<td>tiga</td>
<td>empat</td>
</tr>
</table>';
$crawler = new Crawler();
$crawler->addHtmlContent($html);
$rows = array();
$tr_elements = $crawler->filterXPath('//table/tr');
// iterate over the <tr> nodes matched by the XPath query
foreach ($tr_elements as $tr) {
    $tds = array();
    // wrap the current row in its own Crawler (kept separate from the outer $crawler)
    $rowCrawler = new Crawler($tr);
    // iterate over the <td> cells of this row (CSS filter() needs symfony/css-selector)
    foreach ($rowCrawler->filter('td') as $node) {
        // extract the text value of the cell
        $tds[] = $node->nodeValue;
    }
    $rows[] = $tds;
}
var_dump($rows);
exit;
This outputs:
array (size=2)
  0 =>
    array (size=2)
      0 => string 'satu'
      1 => string 'dua'
  1 =>
    array (size=2)
      0 => string 'tiga'
      1 => string 'empat'
$table = $crawler->filter('table')->filter('tr')->each(function ($tr, $i) {
    return $tr->filter('td')->each(function ($td, $i) {
        return trim($td->text());
    });
});
print_r($table);
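The example above gives you a multidimensional array: the first level holds the table rows ("tr"), and the second level holds the trimmed text of each row's cells ("td").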
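For comparison, here is a minimal sketch of the same rows-then-cells mapping using only PHP's bundled DOM extension (DOMDocument and DOMXPath, which DomCrawler builds on), so it runs without Composer packages. The helper name htmlTableToArray() is hypothetical, and the sketch assumes a flat table: the //table//tr query would also pick up the rows of any nested table.

```php
<?php
// Sketch only: plain ext-dom instead of Symfony DomCrawler.
// htmlTableToArray() is a hypothetical helper name, not a library function.
function htmlTableToArray(string $html): array
{
    $doc = new DOMDocument();
    // Collect libxml parser warnings instead of printing them, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $rows = [];
    // First level: every <tr> below a <table> (assumes no nested tables).
    foreach ($xpath->query('//table//tr') as $tr) {
        $cells = [];
        // Second level: only the <td> children of this row (query relative to $tr).
        foreach ($xpath->query('./td', $tr) as $td) {
            $cells[] = trim($td->textContent);
        }
        $rows[] = $cells;
    }
    return $rows;
}
```

Called on the $html string from the question, this returns the two-row array the question asks for.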
EDIT
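If you have nested tables, this code flattens them into a one-dimensional array of cell HTML, keyed by each cell's XPath node path. Cells that themselves contain a table are skipped, since their inner cells are collected on their own.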
use Symfony\Component\DomCrawler\Crawler;

$html = 'MY HTML HERE';
$crawler = new Crawler($html);
$flat = function (string $selector) use ($crawler) {
    $result = [];
    $crawler->filter($selector)->each(function (Crawler $table) use (&$result) {
        $table->filter('tr')->each(function (Crawler $tr) use (&$result) {
            $tr->filter('td')->each(function (Crawler $td) use (&$result) {
                $cellHtml = trim($td->html());
                // skip cells that wrap a nested table; their own cells are visited separately
                if (strpos($cellHtml, '<table') !== false) {
                    return;
                }
                // key each cell by the XPath node path of its underlying DOMNode
                $address = $td->getNode(0)->getNodePath();
                if (!empty($cellHtml)) {
                    $result[$address] = $cellHtml;
                }
            });
        });
    });
    return $result;
};
// The selector has to point to the outermost table.
print_r($flat('#Prod fieldset div table'));
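A dependency-free sketch of the same flattening idea with DOMDocument and DOMXPath follows. Two things are swapped in and are assumptions of this sketch rather than part of the answer: the outer table is selected with an XPath expression instead of a CSS selector, and cells that wrap a nested table are excluded with the XPath predicate not(.//table) instead of a strpos() check on the cell HTML. flattenTables() is a hypothetical name.

```php
<?php
// Sketch only: ext-dom version of the flattening closure above.
// flattenTables() is hypothetical; $tableXPath replaces the CSS selector.
function flattenTables(string $html, string $tableXPath): array
{
    $doc = new DOMDocument();
    // Collect libxml parser warnings instead of printing them, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $result = [];
    foreach ($xpath->query($tableXPath) as $table) {
        // Every descendant <td> (nested tables included) that has no <table> inside it.
        foreach ($xpath->query('.//td[not(.//table)]', $table) as $td) {
            // Rebuild the cell's inner HTML from its child nodes.
            $inner = '';
            foreach ($td->childNodes as $child) {
                $inner .= $doc->saveHTML($child);
            }
            $inner = trim($inner);
            if ($inner !== '') {
                // Same keying as the answer: the cell's XPath node path.
                $result[$td->getNodePath()] = $inner;
            }
        }
    }
    return $result;
}
```

For example, flattenTables($html, '(//table)[1]') starts from the first table in the document. Results come back in document order, and if the expression matches both an outer and a nested table, duplicate cells simply overwrite the same node-path key, as in the DomCrawler version.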