How to get the sibling DOMNode of a DOMElement when scraping with PHP?

Question

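I'm working with PHP and trying to scrape a small piece of markup, but so far I can't figure out how. Below is the simple structure I can't scrape: the text inside the double quotes.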

<strong>Palabras: </strong>
<br>            
"   
     Biometría,          
     Análisis de textura,                    
     Wavelets,        
     Codificación predictiva,  
     Reconocimiento de patrones,                      
     Filtros Bidimensionales de Gabor, 
"                   
<br>

The original text is here:

Producción bibliográfica - Artículo - Publicado en revista especializada
some name,another name, "E-Learning y Espacios Colaborativos" . En: CountryName
ISSN: ed:
v. fasc. p. - ,2006 
Palabras: 
E-learning, Espacios Colaborativos, 
Sectores: 
Educación,

This is my attempt to scrape the text inside the double quotes:

 //getting Palabras text content
  $list = $doc->getElementsByTagName('strong');
  foreach($list as $node)
  {
      if( $node->nodeValue == "Palabras: " )
      {
         //what can I do here to get the double quotations content
      }
  }

If the comparison $node->nodeValue == "Palabras: " is true, I try to get the content of the "brother" node like this:

if( $node->nodeValue == "Palabras: " )
{
    $nodeValue = $node->nextSibling->nodeValue;
}

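But when I try that I get an error. The problem is that $node->nextSibling is a DOMElement, so $node->nextSibling doesn't have the nodeValue attribute.

So how can I get the "brother" DOMNode?

Note:

I call $doc->getElementsByTagName('strong') rather than $doc->getElementsByTagName('br') because the page contains many br tags, and I only need the text after <strong>Palabras: </strong> (the only tag that identifies the text inside the double quotes). I don't intend to look up the br tags between them.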

Answer 1

You can use an XPath expression to find <strong>Palabras: </strong> and then the first following sibling text node that does not consist entirely of whitespace.

Example:

$xpath = new DOMXPath($doc);
$query = '//strong[.="Palabras: "]/following-sibling::text()[normalize-space()][1]';

foreach ($xpath->query($query) as $node) {
    echo $node->nodeValue;
}
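
For completeness, here is a minimal self-contained sketch of the same query, followed by a step that splits the matched text into individual keywords. The helper name extractPalabras and the sample markup are assumptions made for illustration (the sample is modelled on the structure shown in the question, not copied from the real page), and the comma splitting assumes the keywords never contain commas themselves.

```php
<?php
/**
 * Returns the keywords that follow <strong>Palabras: </strong>,
 * or an empty array when the label is not found.
 * extractPalabras is a hypothetical helper name, not part of the question.
 */
function extractPalabras(string $html): array
{
    // Real pages often contain invalid markup; collect parser warnings quietly.
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // First following sibling text node that is not whitespace-only.
    $query = '//strong[.="Palabras: "]/following-sibling::text()[normalize-space()][1]';
    $nodes = $xpath->query($query);
    if ($nodes === false || $nodes->length === 0) {
        return [];
    }

    // Trim the surrounding whitespace, then split on commas and drop empty parts
    // (the text ends with a trailing comma).
    $text = trim($nodes->item(0)->nodeValue);
    return preg_split('/\s*,\s*/', $text, -1, PREG_SPLIT_NO_EMPTY);
}

// Assumed sample input, shaped like the snippet in the question.
$sample = <<<'HTML'
<div>
<strong>Palabras: </strong>
<br>
    E-learning, Espacios Colaborativos,
<br>
</div>
HTML;

print_r(extractPalabras($sample));
```

The [normalize-space()] predicate is what skips the whitespace-only text node between </strong> and <br>, and it is applied before [1], so the query lands on the first text node that actually carries content.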
