How to parse an HTML table into an array with Symfony DomCrawler

I have an HTML table and I want to create an array from it:

$html = '<table>
<tr>
    <td>satu</td>
    <td>dua</td>
</tr>
<tr>
    <td>tiga</td>
    <td>empat</td>
</tr>
</table>';

My array must look like this:

array(
   array(
      "satu",
      "dua",
   ),
   array(
     "tiga",
     "empat",
   )
)

I tried the code below, but I can't get the array I need:

$crawler = new Crawler();
$crawler->addHTMLContent($html);
$row = array();
$tr_elements = $crawler->filterXPath('//table/tr');
foreach ($tr_elements as $tr) {
 // ???????
}
use Symfony\Component\DomCrawler\Crawler;

$html = '<table>
    <tr>
        <td>satu</td>
        <td>dua</td>
    </tr>
    <tr>
        <td>tiga</td>
        <td>empat</td>
    </tr>
</table>';

$crawler = new Crawler();
$crawler->addHtmlContent($html);
$rows = array();
$tr_elements = $crawler->filterXPath('//table/tr');
// iterate over the matched rows; each $tr is a DOM node
foreach ($tr_elements as $tr) {
    $tds = array();
    // wrap the row in its own crawler so it can be filtered
    $row_crawler = new Crawler($tr);
    // iterate over the cells of this row
    foreach ($row_crawler->filter('td') as $td) {
        // extract the cell text
        $tds[] = $td->nodeValue;
    }
    $rows[] = $tds;
}
var_dump($rows);

This will output:

array (size=2)
  0 =>
    array (size=2)
      0 => string 'satu'
      1 => string 'dua'
  1 =>
    array (size=2)
      0 => string 'tiga'
      1 => string 'empat'
$table = $crawler->filter('table')->filter('tr')->each(function ($tr, $i) {
    return $tr->filter('td')->each(function ($td, $i) {
        return trim($td->text());
    });
});

print_r($table);

The example above gives you a multidimensional array where the first level is the table rows ("tr") and the second level is the table columns ("td").
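For anyone who wants to run this end to end, here is a minimal sketch using the HTML from the question. It assumes symfony/dom-crawler and symfony/css-selector are installed via Composer (filter() needs the CSS selector component), and the function name tableToArray is only a hypothetical helper for illustration.

```php
<?php
// Assumption: dependencies were installed with Composer in this directory.
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// Hypothetical helper: returns one array per <tr>, holding the text of its <td> cells.
function tableToArray(string $html): array
{
    $crawler = new Crawler($html);

    // Outer each() walks the rows, inner each() walks the cells of one row.
    return $crawler->filter('table tr')->each(function (Crawler $tr) {
        return $tr->filter('td')->each(function (Crawler $td) {
            return trim($td->text());
        });
    });
}

$html = '<table>
<tr>
    <td>satu</td>
    <td>dua</td>
</tr>
<tr>
    <td>tiga</td>
    <td>empat</td>
</tr>
</table>';

$table = tableToArray($html);
print_r($table);
```

each() collects the return value of the callback for every matched node, so nesting two each() calls is what produces the rows-of-cells shape without any manual array building.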

EDIT

If you have nested tables, this code flattens them nicely into a one-dimensional array, keyed by each cell's node path.

$html = 'MY HTML HERE';
$crawler = new Crawler($html);

$flat = function(string $selector) use ($crawler) {
    $result = [];
    $crawler->filter($selector)->each(function ($table, $i) use (&$result) {
        $table->filter('tr')->each(function ($tr, $i) use (&$result) {
            $tr->filter('td')->each(function ($td, $i) use (&$result) {
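                $html = trim($td->html());
                // skip cells that contain a nested table; the nested cells are visited on their own
                if (strpos($html, '<table') !== false) return;

                // key each cell by its DOM node path so repeated visits don't duplicate it
                $node = $td->getNode(0);
                $address = $node->getNodePath();

                // compare against '' so that a cell containing "0" is kept
                if ($html !== '') $result[$address] = $html;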
            });
        });
    });
    return $result;
};

// The selector must point to the outermost table.
print_r($flat('#Prod fieldset div table'));
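To see the flattening on something concrete, here is a small sketch with a made-up nested table. The markup, the #outer id, and the function name flattenTableCells are all hypothetical; the traversal mirrors the closure above, and it again assumes symfony/dom-crawler and symfony/css-selector are installed via Composer.

```php
<?php
// Assumption: dependencies were installed with Composer in this directory.
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// Hypothetical helper: collects the inner HTML of every leaf cell, keyed by node path.
function flattenTableCells(Crawler $crawler, string $selector): array
{
    $result = [];
    $crawler->filter($selector)->each(function (Crawler $table) use (&$result) {
        $table->filter('tr')->each(function (Crawler $tr) use (&$result) {
            $tr->filter('td')->each(function (Crawler $td) use (&$result) {
                $html = trim($td->html());
                // A cell that wraps another table is skipped; its inner cells are matched separately.
                if (strpos($html, '<table') !== false) {
                    return;
                }
                // The node path is unique per cell, so a cell reached twice is stored once.
                $address = $td->getNode(0)->getNodePath();
                if ($html !== '') {
                    $result[$address] = $html;
                }
            });
        });
    });
    return $result;
}

// Made-up sample: the second cell of the outer row holds a nested table.
$html = '<div id="outer"><table>
    <tr>
        <td>a</td>
        <td><table><tr><td>b</td></tr></table></td>
    </tr>
</table></div>';

$cells = array_values(flattenTableCells(new Crawler($html), '#outer table'));
print_r($cells);
```

Because filter() matches descendants, the nested cell is reached more than once (through the outer table and through the nested one). Keying by node path is what keeps it from showing up twice in the result.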