DOMXPath

Question

I want to use PHP to fetch a specific JS object from different URLs.

Or

I want to use PHP to fetch the JS script text from different URLs.

This is the approach I am using.

$html = file_get_contents($url);
$ddoc = new DOMDocument();
libxml_use_internal_errors(TRUE);
if(!empty($html)){ //if any html is actually returned
    $ddoc->loadHTML($html);
    libxml_clear_errors(); //remove errors for yucky html
    $xxpath = new DOMXPath($ddoc);
    $rrrow = $xxpath->query("//script[contains(@src, 'pcode')]");
}

Answer 1

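You neglected to say what, if anything, happens when you run your code. I tried essentially the same approach and it worked fine (see below). Without knowing which URL you are targeting, I suggest adding a context to file_get_contents, because servers are often configured to reject requests that do not carry a User-Agent string.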

$url='http://beautifulbathrooms.tumblr.com/';
$query='//script[contains(@src,"jquery")]';



libxml_use_internal_errors( true ); // collect parser warnings instead of emitting them
$dom=new DOMDocument;
$dom->validateOnParse=false;
$dom->standalone=true;
$dom->preserveWhiteSpace=true;
$dom->strictErrorChecking=false;
$dom->substituteEntities=false;
$dom->recover=true;
$dom->formatOutput=false;
$dom->loadHTML( file_get_contents( $url ) );
libxml_clear_errors();


$xp=new DOMXPath( $dom );
$col=$xp->query( $query );
if( $col !== false && $col->length > 0 ){
    // print the src attribute of each matching script element
    foreach( $col as $script ) echo $script->getAttribute('src') . PHP_EOL;
}
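
Because the question does not say what goes wrong, it can help to check first whether the fetch itself fails before looking at the XPath query. The following is a minimal sketch, assuming a hypothetical helper named fetchHtml (it is not part of PHP or of the original code). It returns null and logs the last error when file_get_contents fails or returns an empty body.

```php
<?php
// Hypothetical helper for illustration: fetch a URL (or local path) and
// report why the fetch failed instead of passing bad input to loadHTML().
function fetchHtml(string $url, $context = null): ?string
{
    // Suppress the PHP warning; the failure is signalled by the return value.
    $html = @file_get_contents($url, false, $context);

    if ($html === false || trim($html) === '') {
        $err = error_get_last();
        error_log('Fetch failed for ' . $url . ': ' . ($err['message'] ?? 'empty response'));
        return null;
    }

    return $html;
}
```

If fetchHtml returns null for your URL, the problem is in the request (for example a rejected request), not in DOMDocument or DOMXPath.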

Using the context parameter of file_get_contents

$url='http://beautifulbathrooms.tumblr.com/';
$query='//script[contains(@src,"jquery")]';

$args=array(
    'http'=>array(
        'method' => 'GET',
        'header' => implode( "\r\n", array(
                    'User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:44.0) Gecko/20100101 Firefox/44.0',
                    'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
                    'Host: beautifulbathrooms.tumblr.com'
                )
            )
        )
    );
/* create the context */
$context=stream_context_create( $args );


libxml_use_internal_errors( true ); // collect parser warnings instead of emitting them
$dom=new DOMDocument;
$dom->validateOnParse=false;
$dom->standalone=true;
$dom->preserveWhiteSpace=true;
$dom->strictErrorChecking=false;
$dom->substituteEntities=false;
$dom->recover=true;
$dom->formatOutput=false;
$dom->loadHTML( file_get_contents( $url, false, $context ) );
libxml_clear_errors();


$xp=new DOMXPath( $dom );
$col=$xp->query( $query );
if( $col !== false && $col->length > 0 ){
    // print the src attribute of each matching script element
    foreach( $col as $script ) echo $script->getAttribute('src') . PHP_EOL;
}
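
The queries above select script elements by their src attribute, which gives you the script URL but not its text. If what you want is the JS text or an object defined inline in the page, a sketch along the following lines may help. It is only an illustration under stated assumptions: the sample markup, the variable name pcode, and the regular expression are hypothetical, and the approach only works when the object literal happens to be valid JSON with no nested braces.

```php
<?php
// Sample markup standing in for a fetched page (an assumption for illustration).
$html = <<<'HTML'
<html><head>
<script src="/js/jquery.min.js"></script>
<script>var pcode = {"id": 42, "name": "demo"};</script>
</head><body></body></html>
HTML;

libxml_use_internal_errors(true);
$dom = new DOMDocument;
$dom->loadHTML($html);
libxml_clear_errors();

$xp = new DOMXPath($dom);

// Inline scripts have no src attribute; their code is the node's text content.
$config = null;
foreach ($xp->query('//script[not(@src)]') as $script) {
    $js = $script->textContent;

    // Hypothetical pattern: "var pcode = { ... };" holding a flat, JSON-compatible literal.
    if (preg_match('/var\s+pcode\s*=\s*(\{.*?\})\s*;/s', $js, $m)) {
        $config = json_decode($m[1], true); // associative array, or null if not valid JSON
        break;
    }
}

var_dump($config);
```

For a script that is loaded through src, you would need a second request to that URL to get its text, since the element itself is empty in the page source.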

DOMXPath - How can we search a js object using php

php

web-scraping