php-html-parser: how to follow redirects
https://github.com/paquettg/php-html-parser
Does anyone know how to follow redirects with this library?
For example:
require "vendor/autoload.php";
use PHPHtmlParser\Dom;
$dom = new Dom;
$dom->loadFromUrl($html);
Versions:
- guzzlehttp/guzzle: "7.2.0"
- paquettg/php-html-parser: "3.1.1"
Why doesn't the library itself follow redirects?
The loadFromUrl method has the following signature and body (as of 3.1.1):
public function loadFromUrl(string $url, ?Options $options = null, ?ClientInterface $client = null, ?RequestInterface $request = null): Dom
{
    if ($client === null) {
        $client = new Client();
    }
    if ($request === null) {
        $request = new Request('GET', $url);
    }
    $response = $client->sendRequest($request);
    $content = $response->getBody()->getContents();
    return $this->loadStr($content, $options);
}
Look at the line $response = $client->sendRequest($request);
That call goes to Guzzle's client: https://github.com/guzzle/guzzle/blob/master/src/Client.php#L131
/**
 * The HttpClient PSR (PSR-18) specify this method.
 *
 * @inheritDoc
 */
public function sendRequest(RequestInterface $request): ResponseInterface
{
    $options[RequestOptions::SYNCHRONOUS] = true;
    $options[RequestOptions::ALLOW_REDIRECTS] = false;
    $options[RequestOptions::HTTP_ERRORS] = false;
    return $this->sendAsync($request, $options)->wait();
}
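The line $options[RequestOptions::ALLOW_REDIRECTS] = false; turns redirects off unconditionally. Whatever you configure on the client or pass with the request, sendRequest overrides it and redirects are not followed.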
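The difference between the two entry points can be seen without touching the network. The following is a minimal sketch that assumes Guzzle 7 is installed via Composer and uses Guzzle's MockHandler to replay a canned 301 followed by a 200. The helper name makeMockedClient and the host example.test are made up for illustration.

```php
<?php
// Offline sketch: MockHandler replays canned responses, so no network is used.
require "vendor/autoload.php";

use GuzzleHttp\Client;
use GuzzleHttp\Handler\MockHandler;
use GuzzleHttp\HandlerStack;
use GuzzleHttp\Psr7\Request;
use GuzzleHttp\Psr7\Response;

// Hypothetical helper: a client whose "server" answers 301 first, then 200.
// example.test is a placeholder host; nothing is actually contacted.
function makeMockedClient(): Client
{
    $mock = new MockHandler([
        new Response(301, ['Location' => 'https://example.test/']),
        new Response(200, [], '<a href="/">home</a>'),
    ]);

    return new Client(['handler' => HandlerStack::create($mock)]);
}

$request = new Request('GET', 'http://example.test/');

// PSR-18 entry point: redirects are forced off, so the 301 comes back as-is.
$statusViaSendRequest = makeMockedClient()->sendRequest($request)->getStatusCode();

// Regular Guzzle entry point: the client's default allow_redirects applies.
$statusViaSend = makeMockedClient()->send($request)->getStatusCode();

echo $statusViaSendRequest, PHP_EOL; // 301
echo $statusViaSend, PHP_EOL;        // 200
```

Each call gets a fresh mocked client so that both start from the same queue of responses.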
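How to follow redirects with the library
Notice that loadFromUrl simply makes a request, gets the response, and then calls loadStr on the body.
We will mimic the same steps, but make the request with Guzzle directly (it is already a dependency of the library).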
<?php
// Include the autoloader
include_once("vendor/autoload.php");

use GuzzleHttp\Client;
use GuzzleHttp\Exception\GuzzleException;
use PHPHtmlParser\Dom;

$client = new Client();
try {
    // allow_redirects is shown for verbosity's sake. It is on by default with Guzzle clients.
    $response = $client->request('GET', 'http://theeasyapi.com', ['allow_redirects' => true]);
    // This would work exactly the same
    //$response = $client->request('GET', 'http://theeasyapi.com');
} catch (GuzzleException $e) {
    // Probably do something with $e
    var_dump($e->getMessage());
    exit;
}

// Parse the body of the final (post-redirect) response
$dom = new Dom();
$domExample = $dom->loadStr($response->getBody()->getContents());
foreach ($domExample->find('a') as $link) {
    var_dump($link->text);
}
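The code above instantiates a new Guzzle client and makes a request to the URL with redirects allowed. The site used in this example issues a 301 redirect from the insecure (HTTP) URL to the secure (HTTPS) one.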
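If you would rather keep calling loadFromUrl, another option is to hand it your own client through its third parameter. The sketch below is only one way to do that: RedirectFollowingClient is a hypothetical class (it is not part of php-html-parser or Guzzle), and it assumes the ClientInterface in the loadFromUrl signature is the PSR-18 one, which the sendRequest call suggests. It delegates to Guzzle's send(), which honours the allow_redirects option.

```php
<?php
require "vendor/autoload.php";

use GuzzleHttp\Client;
use Psr\Http\Client\ClientInterface;
use Psr\Http\Message\RequestInterface;
use Psr\Http\Message\ResponseInterface;

// Hypothetical wrapper: a PSR-18 client that follows redirects.
final class RedirectFollowingClient implements ClientInterface
{
    /** @var Client */
    private $guzzle;

    public function __construct(?Client $guzzle = null)
    {
        // Fall back to a default Guzzle client when none is injected.
        $this->guzzle = $guzzle ?? new Client();
    }

    public function sendRequest(RequestInterface $request): ResponseInterface
    {
        // send() applies the options given here, unlike Guzzle's own
        // sendRequest(), which hard-codes allow_redirects to false.
        return $this->guzzle->send($request, [
            'allow_redirects' => true,
            'http_errors'     => false, // return 4xx/5xx responses instead of throwing
        ]);
    }
}
```

With that class in place, a call such as $dom->loadFromUrl('http://theeasyapi.com', null, new RedirectFollowingClient()); should parse the page the redirect lands on. Accepting an optional Guzzle client in the constructor keeps the wrapper easy to configure and to test with a mock handler.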