Guzzle: Parallel file download using Guzzle's Pool::batch() and `sink` option

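You can use Guzzle's Pool::batch() method to execute HTTP requests in parallel. It lets you set default options for the requests through the `options` key of its third argument.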

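But what if I need different options for different requests in the pool? I want to execute GET requests with a pool and stream each response to a different file on disk. There is a `sink` option for that, but how do I apply a different value of this option to each request?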

You can specify $options individually on each request. They only apply to all requests if you pass them to the client. Here is an excerpt from the Guzzle 6 docs:

Headers may be added as default options when creating a client. When headers are used as default options, they are only applied if the request being created does not already contain the specific header. This includes both requests passed to the client in the send() and sendAsync() methods and requests created by the client (e.g., request() and requestAsync()).

http://guzzle.readthedocs.org/en/latest/request-options.html?highlight=default#headers

Guzzle 6

$client = new \GuzzleHttp\Client();

$requests = function ($total) use ($client) {
    for ($i = 0; $i < $total; $i++) {
        $url = "http://domain.com/picture/{$i}.jpg";
        $filepath = "/tmp/{$i}.jpg";

        yield function() use ($client, $url, $filepath) {
            return $client->getAsync($url, [
                'sink' => $filepath
            ]);
        };
    }
};

$pool = new \GuzzleHttp\Pool($client, $requests(100));
$pool->promise()->wait(); // start the transfers and block until all have finished

The Guzzle 6 example above is almost correct, but its implementation is wrong if you want to pass "options" to the Pool() constructor.

It is missing the critical handling of the Pool options array that is described in a comment in Guzzle's Pool source.

The Guzzle docs say:

When a function is yielded by the iterator, the function is provided the "request_options" array that should be merged on top of any existing options, and the function MUST then return a wait-able promise.

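Also, if you look at the Pool() code below the comment I linked to, you can see that Guzzle's Pool calls the callable and passes it the Pool's "options" as the argument, precisely so that you can apply them to your request.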

The correct precedence is

Per-request options > Pool options > Client defaults.
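
The precedence above can be illustrated with plain arrays. This is only a minimal sketch that does not call Guzzle: the helper name `mergeRequestOptions` and the sample option values are hypothetical. It relies on PHP's `array_merge()` letting later arrays overwrite earlier ones for string keys. The client defaults appear here only to show the full ordering; Guzzle applies client defaults itself, so real code would merge just the pool and per-request options.

```php
<?php

/**
 * Hypothetical helper: combine option arrays so that later sources win.
 * Order of arguments is lowest priority first.
 */
function mergeRequestOptions(array $clientDefaults, array $poolOpts, array $reqOpts): array
{
    // For string keys, array_merge() keeps the value from the last array.
    return array_merge($clientDefaults, $poolOpts, $reqOpts);
}

// Sample values, chosen only for illustration.
$clientDefaults = ['timeout' => 5.0, 'verify' => true];
$poolOpts       = ['timeout' => 30.0, 'sink' => '/tmp/default.bin'];
$reqOpts        = ['sink' => '/tmp/0.jpg'];

$merged = mergeRequestOptions($clientDefaults, $poolOpts, $reqOpts);

// timeout comes from the pool, verify from the client, sink from the request.
var_export($merged);
```

Note that `array_merge()` is shallow: a nested option such as `headers` would be replaced as a whole rather than merged key by key.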

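If you don't apply the Pool() object's options array to your requests, you will end up with serious bugs, for example when you try new Pool($client, $requests(100), ['options'=>['timeout'=>30.0]]);. Without my corrected code, your Pool options won't be applied at all, because the closure does not merge the pool options and therefore simply discards them.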
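
Why this fails silently can be shown without Guzzle. In the sketch below, `runPool` is a hypothetical stand-in for the way Pool invokes each yielded callable with its options array, and the closures return the options they would send instead of a promise. PHP does not complain when a closure that declares no parameters is called with an argument, so the first closure drops the pool options without any warning.

```php
<?php

/**
 * Hypothetical stand-in for Pool: call each yielded callable with the
 * pool-level options and collect whatever it returns.
 */
function runPool(iterable $callables, array $poolOpts): array
{
    $results = [];
    foreach ($callables as $callable) {
        $results[] = $callable($poolOpts);
    }
    return $results;
}

// Declares no parameter, so the pool options passed in are ignored.
$ignoring = function () {
    return ['sink' => '/tmp/0.jpg'];
};

// Accepts the pool options and merges them; per-request values win.
$merging = function (array $poolOpts) {
    return array_merge($poolOpts, ['sink' => '/tmp/0.jpg']);
};

$sent = runPool([$ignoring, $merging], ['timeout' => 30.0]);

var_export(array_key_exists('timeout', $sent[0])); // false: timeout was lost
echo PHP_EOL;
var_export(array_key_exists('timeout', $sent[1])); // true: timeout was kept
echo PHP_EOL;
```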

So here is the correct code with support for Pool() options:

<?php

$client = new \GuzzleHttp\Client();

$requests = function ($total) use ($client) {
    for ($i = 0; $i < $total; $i++) {
        $url = "http://domain.com/picture/{$i}.jpg";
        $filepath = "/tmp/{$i}.jpg";

        yield function($poolOpts) use ($client, $url, $filepath) {
            /** Apply options as follows:
             * Client() defaults are given the lowest priority
             * (they're used for any values you don't specify on
             * the request or the pool). The Pool() "options"
             * override the Client defaults. And the per-request
             * options ($reqOpts) override everything (both the
             * Pool and the Client defaults).
             * In short: Per-Request > Pool Defaults > Client Defaults.
             */
            $reqOpts = [
                'sink' => $filepath
            ];
            if (is_array($poolOpts) && count($poolOpts) > 0) {
                $reqOpts = array_merge($poolOpts, $reqOpts); // req > pool
            }
            
            return $client->getAsync($url, $reqOpts);
        };
    }
};

$pool = new \GuzzleHttp\Pool($client, $requests(100));
$pool->promise()->wait(); // start the transfers and block until all have finished

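Note, however, that you don't have to support Pool() options if you know you will never pass any options to the new Pool() constructor. In that case, you can simply look at the official Guzzle docs for an example.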

The official example looks as follows:

use GuzzleHttp\Client;
use GuzzleHttp\Pool;

// Using a closure that will return a promise once the pool calls the closure.
$client = new Client();

$requests = function ($total) use ($client) {
    $uri = 'http://127.0.0.1:8126/guzzle-server/perf';
    for ($i = 0; $i < $total; $i++) {
        yield function() use ($client, $uri) {
            return $client->getAsync($uri);
        };
    }
};

$pool = new Pool($client, $requests(100));