How to scrape HTML tags from a div using PHP Simple HTML DOM or cURL

Here is an example of what I want to do:

<div class='room'>
<h1>This is a h1</h1>
<p>This is a Paragraph</p>
<h2>This is h2</h2>
</div>

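From the markup above, I want to collect both the tags and their text into arrays. The result should be one array holding the tag names: arr = [h1, p, h2]; and another holding the text: arr2 = [This is a h1, This is a Paragraph, This is h2]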

Try this:

$str = "<div class='room'>
<h1>This is a h1</h1>
<p>This is a Paragraph</p>
<h2>This is h2</h2>
</div>";

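// Split into lines; assumes the string uses the platform line ending (PHP_EOL).
$arr = explode(PHP_EOL, $str);

$res = array();
foreach ($arr as $row) {
    // Skip the opening and closing div lines; keep only the child elements.
    if (strpos($row, "div") === false) {
        // Key: tag name between "<" and the first ">"; value: the text inside.
        $res[substr($row, 1, strpos($row, ">") - 1)] = strip_tags($row);
    }
}

var_dump($res);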

https://3v4l.org/8TkIT

It loops over the string one line at a time and builds an array keyed by tag name.
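
The question asks for two separate arrays rather than one keyed array. A minimal sketch of that last step, assuming $res has the tag name => text shape produced by the loop (the values are hard-coded here only to keep the example self-contained):

```php
<?php
// Assumed shape of the loop's result: tag name => text.
$res = array(
    'h1' => 'This is a h1',
    'p'  => 'This is a Paragraph',
    'h2' => 'This is h2',
);

// Split the keyed array into the two arrays from the question.
$arr  = array_keys($res);   // tag names
$arr2 = array_values($res); // text of each tag

print_r($arr);
print_r($arr2);
```

Because the tag name is used as the key, a div containing the same tag twice would keep only the last one, so this only suits markup where each tag appears once.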

Edit: if there are multiple rooms, you can make the array multidimensional like this:
https://3v4l.org/DdXVd

$str = "<div class='room'>
<h1>This is a h1</h1>
<p>This is a Paragraph</p>
<h2>This is h2</h2>
</div>
<div class='room2'>
<h1>This is a h1</h1>
<p>This is a Paragraph</p>
<h2>This is h2</h2>
</div>";

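// Split into lines; assumes the string uses the platform line ending (PHP_EOL).
$arr = explode(PHP_EOL, $str);

$res = array();
$room = null;
foreach ($arr as $row) {
    if (strpos($row, "</div") !== false) {
        continue; // closing tag: nothing to record
    }
    if (strpos($row, "<div") !== false) {
        // Room name is the text between the first pair of single quotes.
        $pos1 = strpos($row, "'") + 1;
        $room = substr($row, $pos1, strpos($row, "'", $pos1) - $pos1);
    } else {
        // Tag name is the text between "<" and the first ">".
        $pos1 = strpos($row, "<") + 1;
        $res[$room][substr($row, $pos1, strpos($row, ">") - $pos1)] = trim(strip_tags($row));
    }
}

var_dump($res);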

Assuming the elements are known in advance, you can use DOMDocument's getElementsByTagName like this:

$html = "<div class='room'>
<h1>This is a h1</h1>
<p>This is a Paragraph</p>
<h2>This is h2</h2>
</div>";
$doc = new DOMDocument();
$doc->loadHTML($html);
$elements = array();
$content = array();
// Collects, for each tag name in $array, the text of every matching element.
function iterate_elements($array, $doc){
     global $elements, $content;
     foreach($array as $element){
          $the_element = $doc->getElementsByTagName($element);
          foreach($the_element as $target){
               $content[] = $target->textContent;
               // $target->tagName holds the tag name if you need it here
          }
          // Record the tag only if at least one such element exists.
          if($the_element->length > 0) {
               $elements[] = $element;
          }
     }
}
iterate_elements(array('h1','p', 'h2'), $doc);
print_r($elements);
print_r($content);

Demo: https://eval.in/825860
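
If the child elements are not known in advance, one possible variation is to let DOMXPath select every element directly inside the div and read the tag names from the nodes themselves. This is only a sketch: the XPath expression assumes the class attribute is exactly 'room', and $tags and $texts are names chosen for this example.

```php
<?php
$html = "<div class='room'>
<h1>This is a h1</h1>
<p>This is a Paragraph</p>
<h2>This is h2</h2>
</div>";

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$tags  = array(); // tag names, in document order
$texts = array(); // text of each tag, in the same order

// Select every element that is a direct child of a div with class 'room'.
foreach ($xpath->query("//div[@class='room']/*") as $node) {
    $tags[]  = $node->nodeName;
    $texts[] = trim($node->textContent);
}

print_r($tags);
print_r($texts);
```

Unlike looping over a fixed list of tag names, this keeps the results in document order and keeps repeated tags as separate entries.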

Try the code below.

$html = "<div class='room'>
<h1>This is a h1</h1>
<p>This is a Paragraph</p>
<h2>This is h2</h2>
</div>";

$dom = new SimpleXMLElement( $html );

$values = array_filter( array_values( (array) $dom ), function ( $i ) { return ! is_array( $i ); } );
$keys = array_filter( array_keys( (array) $dom ), function ( $i ) { return $i != '@attributes'; } );

print_r( $values ); // This is a h1, This is a Paragraph, This is h2
print_r( $keys ); // h1, p, h2

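I use array_filter to drop the div's own @attributes entry from the results. Note that SimpleXMLElement only accepts well-formed XML, so this works for the snippet above but not for arbitrary HTML.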

$str = <<<EOF
<div class='room'>
<h1>This is a h1</h1>
<p>This is a Paragraph</p>
<h2>This is h2</h2>
</div>
EOF;

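// Requires the PHP Simple HTML DOM Parser to be loaded first,
// e.g. include 'simple_html_dom.php'; (adjust the path to your copy).
$html = str_get_html($str);

$arr = array();   // tag names
$arr2 = array();  // text of each tag
foreach($html->find('.room *') as $el){
  $arr[] = $el->tag;
  $arr2[] = $el->text();
}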