How to web-scrape in divs with DOMParser

Question

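I'm trying to grab a div from this page and from other pages and process the results in a foreach loop, but I'm running into some trouble. Here is the markup: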

<div class="article_info">
    <ul class="c-result_box">
        <li>
            <div class="inner cf">
                <div class="c-header">
                    <div class="c-logo">
                        <img src="/e/designs/31sumai/common/img/logo_08.png" alt="#">
                    </div>
                    <p class="c-supplier">三井のマンション</p>
                    <p class="c-name">
                        <a href="https://www.31sumai.com/mfr/K1503/" class="link" target="_blank">パークリュクス大阪天満</a>
                    </p>

I'm trying to get the text inside the <a> element. Here is my code. What am I missing?

$start_id = 1501;
while(true){

    $url = 'https://www.31sumai.com/mfr/K'.$start_id.'/outline.html';
    $html = file_get_contents($url);
    libxml_use_internal_errors(true);
    $DOMParser = new \DOMDocument();
    $DOMParser->loadHTML($html);
    $xpath = new \DOMXPath($DOMParser);

    $classname="c-name";
    $nodes = $finder->query("//*[contains(@class, '$classname')]");
    $MyTable = false; 
    $insertData = [];  
    foreach($nodes as $node){
        $allNames = [];
        foreach($node->getElementsByTagName('a') as $a){
            $name = $a->getElementsByTagName('a');
            $allProperties[] = [
                'names' => $name];
        }

    }

Thanks for your help!
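The question's code has a few issues apart from the XPath query itself: `$finder` is never defined (the `DOMXPath` object is `$xpath`), the inner loop calls `getElementsByTagName('a')` on an element that is already the `<a>`, and results are pushed into `$allProperties` while `$allNames` is the array that gets initialised. A minimal sketch of the same nested-loop approach with those points fixed, run against the sample markup from the question instead of a live page, might look like this. The `<meta http-equiv>` prefix is an added assumption so that libxml reads the inline snippet as UTF-8; without a charset declaration `loadHTML()` may treat the bytes as ISO-8859-1 and garble the Japanese text. A real page that declares its own charset should not need it.

```php
<?php
// Sample markup copied from the question (an excerpt, so some tags are left unclosed).
$html = <<<'HTML'
<div class="article_info">
    <ul class="c-result_box">
        <li>
            <div class="inner cf">
                <div class="c-header">
                    <div class="c-logo">
                        <img src="/e/designs/31sumai/common/img/logo_08.png" alt="#">
                    </div>
                    <p class="c-supplier">三井のマンション</p>
                    <p class="c-name">
                        <a href="https://www.31sumai.com/mfr/K1503/" class="link" target="_blank">パークリュクス大阪天満</a>
                    </p>
HTML;

libxml_use_internal_errors(true); // tolerate the unclosed tags instead of emitting warnings
$dom = new DOMDocument();
// Assumption: declare UTF-8 up front so libxml does not decode the snippet as ISO-8859-1.
$dom->loadHTML('<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . $html);
$xpath = new DOMXPath($dom);

$allNames = [];
// Same selector as the question: every element whose class attribute contains "c-name".
foreach ($xpath->query("//*[contains(@class, 'c-name')]") as $node) {
    // $a is already the <a> element, so read its text directly
    // rather than searching it for nested <a> elements.
    foreach ($node->getElementsByTagName('a') as $a) {
        $allNames[] = trim($a->textContent);
    }
}

print_r($allNames);
```

Compared with the answer's `/a/text()` query, this version keeps the two-level loop, which can be handy if you later want other attributes of the same `<a>`, such as `$a->getAttribute('href')`.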

Answer 1

You can let your XPath query do the work: select all the text nodes you want directly, then read each node's nodeValue property in the loop:

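$start_id = "1501";
$url = "https://www.31sumai.com/mfr/K$start_id/outline.html";
$html = file_get_contents($url);

libxml_use_internal_errors(true); // suppress warnings caused by imperfect real-world HTML
$DOMParser = new \DOMDocument();
$DOMParser->loadHTML($html);
$xpath = new \DOMXPath($DOMParser);

$classname = "c-name";

// Select the text nodes of every <a> that is a direct child of an element
// whose class attribute contains "c-name".
$nodes = $xpath->query("//*[contains(@class, '$classname')]/a/text()");
foreach ($nodes as $node) {
    echo $node->nodeValue, PHP_EOL; // the link text, one per line
}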
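As posted, the question's `while(true)` loop neither increments `$start_id` nor breaks out, so it would keep fetching the same page. A minimal sketch of a bounded multi-page loop that reuses the answer's XPath query is below. The helper names `extractNames()` and `scrapeRange()` are hypothetical, and the assumptions that the IDs are consecutive and that every page marks names with a `c-name` class come from the question, not from anything the site documents. The page fetcher is passed in as a callable so the loop can be exercised without network access.

```php
<?php
// Hypothetical helper: returns the text of every <a> that is a direct child of an
// element whose class contains "c-name" (the same XPath as the answer).
function extractNames(string $html): array
{
    libxml_use_internal_errors(true); // ignore warnings from imperfect HTML
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();            // don't let errors pile up across pages
    $xpath = new DOMXPath($dom);

    $names = [];
    foreach ($xpath->query("//*[contains(@class, 'c-name')]/a/text()") as $textNode) {
        $names[] = trim($textNode->nodeValue);
    }
    return $names;
}

// Hypothetical helper: visits IDs $startId, $startId + 1, ... and stops at the first
// page that cannot be fetched, or after $maxPages pages, so the loop always ends.
function scrapeRange(int $startId, int $maxPages, callable $fetch): array
{
    $results = [];
    for ($id = $startId; $id < $startId + $maxPages; $id++) {
        $url = "https://www.31sumai.com/mfr/K{$id}/outline.html";
        $html = $fetch($url);
        if ($html === false || $html === '') {
            break; // missing page: assume there are no more IDs after it
        }
        $results[$id] = extractNames($html); // names keyed by page ID
    }
    return $results;
}
```

With a real fetcher, `scrapeRange(1501, 20, fn(string $url) => @file_get_contents($url))` would try K1501 through K1520 (arrow functions need PHP 7.4 or later). `file_get_contents()` returns `false` when the request fails, for example on a 404, so the loop stops at the first missing page; the `@` only silences the accompanying warning.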

Tags: html, php, xpath, domdocument, web-scraping