php-html-parser: how to follow redirects

https://github.com/paquettg/php-html-parser Does anyone know how to follow redirects with this library? For example:

require "vendor/autoload.php";
use PHPHtmlParser\Dom;
$dom = new Dom;
$dom->loadFromUrl($html);

Versions:

  • guzzlehttp/guzzle: "7.2.0"
  • paquettg/php-html-parser: "3.1.1"

Why doesn't the library itself follow redirects?

The loadFromUrl method has the following signature and body (as of 3.1.1, the current version at the time of writing):

    public function loadFromUrl(string $url, ?Options $options = null, ?ClientInterface $client = null, ?RequestInterface $request = null): Dom
    {
        if ($client === null) {
            $client = new Client();
        }
        if ($request === null) {
            $request = new Request('GET', $url);
        }

        $response = $client->sendRequest($request);
        $content = $response->getBody()->getContents();

        return $this->loadStr($content, $options);
    }

Look at the line $response = $client->sendRequest($request); it calls into Guzzle's client: https://github.com/guzzle/guzzle/blob/master/src/Client.php#L131

/**
* The HttpClient PSR (PSR-18) specify this method.
*
* @inheritDoc
*/
public function sendRequest(RequestInterface $request): ResponseInterface
{
   $options[RequestOptions::SYNCHRONOUS] = true;
   $options[RequestOptions::ALLOW_REDIRECTS] = false;
   $options[RequestOptions::HTTP_ERRORS] = false;

   return $this->sendAsync($request, $options)->wait();
}

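The line $options[RequestOptions::ALLOW_REDIRECTS] = false; turns redirects off unconditionally. Whatever you configure on the client or the request, sendRequest overrides it and disables redirects.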
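The behaviour described above can be checked offline with Guzzle's MockHandler. The following is only an illustrative sketch, not code from either library: the makeMockClient helper and the example.com URLs are made up for the demo, and it assumes Guzzle 7 is installed through Composer.

```php
<?php
use GuzzleHttp\Client;
use GuzzleHttp\Handler\MockHandler;
use GuzzleHttp\HandlerStack;
use GuzzleHttp\Psr7\Request;
use GuzzleHttp\Psr7\Response;

require "vendor/autoload.php";

// Hypothetical helper: builds a client whose fake "server" answers 301, then 200.
function makeMockClient(): Client
{
    $mock = new MockHandler([
        new Response(301, ['Location' => 'https://example.com/']),
        new Response(200, [], '<a href="/">home</a>'),
    ]);

    // HandlerStack::create() adds Guzzle's default middleware, including redirects.
    return new Client(['handler' => HandlerStack::create($mock)]);
}

// PSR-18 entry point: redirects are forced off, so the 301 comes back as-is.
$psr18 = makeMockClient()->sendRequest(new Request('GET', 'http://example.com/'));
var_dump($psr18->getStatusCode()); // int(301)

// Guzzle's own request(): allow_redirects is on by default, so the 301 is followed.
$native = makeMockClient()->request('GET', 'http://example.com/');
var_dump($native->getStatusCode()); // int(200)
```

Each call gets a fresh mock client so that both start from the same queued 301 response.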

How to follow redirects with the library

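Notice that loadFromUrl simply makes a request, gets the response, and then calls loadStr. We will mimic the same steps, but make the request with Guzzle directly (it is already a dependency of the library).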

<?php
use GuzzleHttp\Client;
use GuzzleHttp\Exception\GuzzleException;
use PHPHtmlParser\Dom;

// Include the Composer autoloader
include_once("vendor/autoload.php");

$client = new Client();
try {
    // allow_redirects is shown for verbosity's sake; it is on by default for Guzzle clients.
    $response = $client->request('GET', 'http://theeasyapi.com', ['allow_redirects' => true]);

    // This would work exactly the same
    //$response = $client->request('GET', 'http://theeasyapi.com');
} catch(GuzzleException $e) {
    // Probably do something with $e
    var_dump($e->getMessage());
    exit;
}

$dom = new Dom();
// Parse the body of the final (post-redirect) response
$domExample = $dom->loadStr($response->getBody()->getContents());
foreach($domExample->find('a') as $link) {
    var_dump($link->text);
}

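The code above instantiates a new Guzzle client and makes a request to the URL with redirects allowed. The website used in this example issues a 301 redirect from the insecure (HTTP) address to the secure (HTTPS) one.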
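If you would rather keep calling loadFromUrl, another option is to hand it your own PSR-18 client. This is a minimal sketch, assuming the $client parameter accepts any Psr\Http\Client\ClientInterface, as the signature quoted earlier suggests. The RedirectFollowingClient class is hypothetical and not part of either library; it delegates to Guzzle's send(), which honours the allow_redirects option.

```php
<?php
use GuzzleHttp\Client as GuzzleClient;
use PHPHtmlParser\Dom;
use Psr\Http\Client\ClientInterface;
use Psr\Http\Message\RequestInterface;
use Psr\Http\Message\ResponseInterface;

require "vendor/autoload.php";

// Hypothetical adapter: a PSR-18 client that goes through Guzzle's send()
// instead of Guzzle's own sendRequest(), so redirects can stay enabled.
final class RedirectFollowingClient implements ClientInterface
{
    /** @var GuzzleClient */
    private $guzzle;

    public function __construct(?GuzzleClient $guzzle = null)
    {
        $this->guzzle = $guzzle ?? new GuzzleClient();
    }

    public function sendRequest(RequestInterface $request): ResponseInterface
    {
        // Follow at most 5 redirects; http_errors is disabled so that
        // 4xx/5xx responses are returned rather than thrown.
        return $this->guzzle->send($request, [
            'allow_redirects' => ['max' => 5],
            'http_errors'     => false,
        ]);
    }
}

$dom = new Dom();
// Second argument (Options) is left as null; the third is the custom client.
$dom->loadFromUrl('http://theeasyapi.com', null, new RedirectFollowingClient());
foreach ($dom->find('a') as $link) {
    var_dump($link->text);
}
```

The limit of 5 redirects is an arbitrary choice for the example; adjust it, or pass true to use Guzzle's defaults.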