What is the difference between Laravel's cursor and chunk methods?

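I would like to know the difference between Laravel's chunk and cursor methods. Which one is more suitable to use? What are the use cases for each? I know that you should use cursor to save memory, but how does it actually work on the back end?

A detailed explanation with examples would be useful, because I have searched Stack Overflow and other sites and did not find much information.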

Here are the code snippets from the Laravel documentation.

Chunking Results

Flight::chunk(200, function ($flights) {
    foreach ($flights as $flight) {
        //
    }
});

Using Cursors

foreach (Flight::where('foo', 'bar')->cursor() as $flight) {
    //
}

Indeed, this question might attract some opinionated answers, but the simple answer is here in the Laravel Docs.

Just for your reference:

This is chunk:

If you need to process thousands of Eloquent records, use the chunk command. The chunk method will retrieve a "chunk" of Eloquent models, feeding them to a given Closure for processing. Using the chunk method will conserve memory when working with large result sets:

This is cursor:

The cursor method allows you to iterate through your database records using a cursor, which will only execute a single query. When processing large amounts of data, the cursor method may be used to greatly reduce your memory usage:

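Chunk retrieves a batch of records from the database and loads it into memory, while keeping track of where the last retrieved record was so that batches do not overlap.

So the advantage here is that it is useful if you want to reformat a large set of records before it is sent out, or if you want to operate on n records at a time. An example is building a view or an Excel sheet: you can take the records in batches until they are done, so that they are not all loaded into memory at once and you do not hit the memory limit.

Cursor uses PHP generators. You can check the PHP generators page, but here is an interesting excerpt: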

A generator allows you to write code that uses foreach to iterate over a set of data without needing to build an array in memory, which may cause you to exceed a memory limit, or require a considerable amount of processing time to generate. Instead, you can write a generator function, which is the same as a normal function, except that instead of returning once, a generator can yield as many times as it needs to in order to provide the values to be iterated over.
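To make the quoted description concrete, here is a minimal sketch in plain PHP (no Laravel involved). The function names eagerNumbers and lazyNumbers are made up for illustration. The first builds a full array before returning, the second yields values one at a time, and both are consumed with the same foreach.

```php
<?php
declare(strict_types=1);

// Eager version: builds the whole array in memory before returning it.
function eagerNumbers(int $count): array
{
    $numbers = [];
    for ($i = 1; $i <= $count; $i++) {
        $numbers[] = $i;
    }
    return $numbers;
}

// Lazy version: yields one value at a time, so no array is ever built.
function lazyNumbers(int $count): Generator
{
    for ($i = 1; $i <= $count; $i++) {
        yield $i;
    }
}

// The generator is consumed exactly like an array would be.
$sum = 0;
foreach (lazyNumbers(100000) as $n) {
    $sum += $n;
}
echo $sum, PHP_EOL;
```

Swapping lazyNumbers for eagerNumbers in the loop gives the same sum, but the eager version has to hold all 100,000 integers at once.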

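I cannot guarantee that I fully understand the concept of cursor, but as for chunk: it runs the query once per batch of the given size, retrieves that batch, and passes it into the closure for further work on the records.

Hope this is useful.

chunk is based on pagination. It maintains a page number and does the looping for you.

For example, DB::table('users')->select('*')->chunk(100, function($e) {}) will run multiple queries until a result set is smaller than the chunk size (100):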

select * from `users` limit 100 offset 0;
select * from `users` limit 100 offset 100;
select * from `users` limit 100 offset 200;
select * from `users` limit 100 offset 300;
select * from `users` limit 100 offset 400;
...
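The stopping rule can be sketched without a database. The helper below is hypothetical (chunkQueries is not a Laravel function, and the row count is an assumed input); it only lists the limit/offset statements such a loop would send and stops after the first page that comes back short.

```php
<?php
declare(strict_types=1);

/**
 * List the queries a limit/offset chunk loop would send.
 * $totalRows and the `users` table name are illustrative assumptions.
 */
function chunkQueries(int $totalRows, int $chunkSize): array
{
    if ($chunkSize < 1) {
        throw new InvalidArgumentException('Chunk size must be at least 1.');
    }

    $queries = [];
    $page = 1;
    do {
        $offset = ($page - 1) * $chunkSize;
        $queries[] = "select * from `users` limit {$chunkSize} offset {$offset}";
        // Rows this page would return from a table holding $totalRows rows.
        $returned = max(0, min($chunkSize, $totalRows - $offset));
        $page++;
    } while ($returned === $chunkSize);

    return $queries;
}

foreach (chunkQueries(250, 100) as $sql) {
    echo $sql, PHP_EOL;
}
```

With 250 rows and a chunk size of 100 this lists three statements (offsets 0, 100 and 200). With exactly 200 rows it also lists three, because the loop only learns it is finished when a page returns fewer rows than the chunk size.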

cursor is based on PDOStatement::fetch and a generator.

$cursor = DB::table('users')->select('*')->cursor();
foreach ($cursor as $e) { }

This issues a single query:

select * from `users`

But the driver doesn't fetch the result set all at once.

I ran some benchmarks using cursor and where:
foreach (\App\Models\Category::where('type','child')->get() as $res){

}

foreach (\App\Models\Category::where('type', 'child')->cursor() as $flight) {
    //
}

return view('welcome');

Here is the result:

We have a comparison: chunk() vs cursor()

  • cursor(): high speed
  • chunk(): constant memory usage

10,000 records:

+-------------+-----------+------------+
|             | Time(sec) | Memory(MB) |
+-------------+-----------+------------+
| get()       |      0.17 |         22 |
| chunk(100)  |      0.38 |         10 |
| chunk(1000) |      0.17 |         12 |
| cursor()    |      0.16 |         14 |
+-------------+-----------+------------+

100,000 records:

+--------------+------------+------------+
|              | Time(sec)  | Memory(MB) |
+--------------+------------+------------+
| get()        |        0.8 |     132    |
| chunk(100)   |       19.9 |      10    |
| chunk(1000)  |        2.3 |      12    |
| chunk(10000) |        1.1 |      34    |
| cursor()     |        0.5 |      45    |
+--------------+------------+------------+
  • Test data: the users table from Laravel's default migration
  • Homestead 0.5.0
  • PHP 7.0.12
  • MySQL 5.7.16
  • Laravel 5.3.22
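The answer does not say how these numbers were collected. As one possible approach, a small harness like the following could record time and peak memory for a piece of work. The measure helper is hypothetical, and because PHP's peak memory counter is process-wide, each strategy would need to run in its own process for the memory figures to be comparable.

```php
<?php
declare(strict_types=1);

/**
 * Run $work and report elapsed seconds and the process peak memory in MB.
 * Note: the peak is for the whole process so far, not just for $work.
 */
function measure(callable $work): array
{
    $start = microtime(true);
    $work();

    return [
        'seconds' => microtime(true) - $start,
        'peak_mb' => memory_get_peak_usage() / 1048576,
    ];
}

// Stand-in workload; in a Laravel app this would be the get(), chunk()
// or cursor() loop being compared.
$result = measure(function (): void {
    foreach (range(1, 200000) as $n) {
        // no-op
    }
});

printf("%.3f s, %.1f MB peak\n", $result['seconds'], $result['peak_mb']);
```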

Cursor()

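  • issues only a single query
  • fetches results by calling PDOStatement::fetch()
  • by default a buffered query is used, so the whole result set is fetched at once
  • hydrates only the current row into an Eloquent model

Pros

  • minimizes Eloquent model memory overhead
  • easy to manipulate

Cons

  • a huge result set leads to running out of memory
  • buffered versus unbuffered is a trade-off

Chunk()

  • splits the query into several queries with limit and offset
  • fetches results by calling PDOStatement::fetchAll
  • hydrates results into Eloquent models in batches

Pros

  • controllable memory usage

Cons

  • hydrating results into Eloquent models in batches may cause some memory overhead
  • number of queries versus memory usage is a trade-off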

TL;DR

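I used to think that cursor() would query each time and keep only one row of the result in memory. So when I saw @mohammad-asghari's comparison table I was really confused. There had to be some buffer behind the scenes.

Tracing through the Laravel code, I found the following: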

/**
 * Run a select statement against the database and returns a generator.
 *
 * @param  string  $query
 * @param  array  $bindings
 * @param  bool  $useReadPdo
 * @return \Generator
 */
public function cursor($query, $bindings = [], $useReadPdo = true)
{
    $statement = $this->run($query, $bindings, function ($query, $bindings) use ($useReadPdo) {
        if ($this->pretending()) {
            return [];
        }

        // First we will create a statement for the query. Then, we will set the fetch
        // mode and prepare the bindings for the query. Once that's done we will be
        // ready to execute the query against the database and return the cursor.
        $statement = $this->prepared($this->getPdoForSelect($useReadPdo)
                          ->prepare($query));

        $this->bindValues(
            $statement, $this->prepareBindings($bindings)
        );

        // Next, we'll execute the query against the database and return the statement
        // so we can return the cursor. The cursor will use a PHP generator to give
        // back one row at a time without using a bunch of memory to render them.
        $statement->execute();

        return $statement;
    });

    while ($record = $statement->fetch()) {
        yield $record;
    }
}

So Laravel builds this feature by wrapping PDOStatement::fetch(). By searching for "buffer", "PDO fetch" and "MySQL", I found this document.

https://www.php.net/manual/en/mysqlinfo.concepts.buffering.php

Queries are using the buffered mode by default. This means that query results are immediately transferred from the MySQL Server to PHP and then are kept in the memory of the PHP process.

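So by calling PDOStatement::execute() we actually fetch all the result rows at once and store them in memory, not just one row. If the result is too large, this will lead to an out-of-memory error.

The document does show that we can use $pdo->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false); to turn off buffered queries, but the drawbacks deserve caution: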

Unbuffered MySQL queries execute the query and then return a resource while the data is still waiting on the MySQL server for being fetched. This uses less memory on the PHP-side, but can increase the load on the server. Unless the full result set was fetched from the server no further queries can be sent over the same connection. Unbuffered queries can also be referred to as "use result".
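As a rough sketch of what that looks like outside Laravel, the helper below (streamRows is a hypothetical name, not part of PDO or Laravel) wraps PDOStatement::fetch() in a generator, much like the cursor() source shown earlier, and optionally switches a MySQL connection to unbuffered mode first. It assumes a plain PDO connection that you create yourself.

```php
<?php
declare(strict_types=1);

/**
 * Yield rows one at a time from a PDO statement.
 * When $unbuffered is true and the driver is MySQL, buffering is turned off
 * so rows stay on the server until fetched.
 */
function streamRows(PDO $pdo, string $sql, array $bindings = [], bool $unbuffered = false): Generator
{
    if ($unbuffered && $pdo->getAttribute(PDO::ATTR_DRIVER_NAME) === 'mysql') {
        $pdo->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);
    }

    $statement = $pdo->prepare($sql);
    $statement->execute($bindings);

    // fetch() returns false once there are no more rows.
    while (($row = $statement->fetch(PDO::FETCH_ASSOC)) !== false) {
        yield $row;
    }
}

// Usage sketch (DSN and credentials are placeholders):
// $pdo = new PDO('mysql:host=127.0.0.1;dbname=app', 'user', 'secret');
// foreach (streamRows($pdo, 'select * from `users`', [], true) as $row) {
//     // While this loop runs, no other query can use the same connection.
// }
```

The last comment is the practical cost described in the quote: with an unbuffered result still open, running another query on that connection inside the loop will fail.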

The cursor method uses Lazy Collections, but runs the query just once.

https://laravel.com/docs/6.x/collections#lazy-collections

However, the query builder's cursor method returns a LazyCollection instance. This allows you to still only run a single query against the database but also only keep one Eloquent model loaded in memory at a time.

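Chunk runs the query multiple times and loads each chunk's results into Eloquent models all at once.

Assume you have a million records in the database. You can use something like the following, which will probably give the best result. With it, you are working with chunked LazyCollections.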

User::cursor()->chunk(10000);
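The idea of chunking a lazy stream can be illustrated in plain PHP. This is only a sketch of the concept, not LazyCollection's actual implementation; chunked and fakeCursor are hypothetical names, and fakeCursor stands in for a real database cursor.

```php
<?php
declare(strict_types=1);

/**
 * Lazily group any iterable into arrays of at most $size items.
 */
function chunked(iterable $rows, int $size): Generator
{
    if ($size < 1) {
        throw new InvalidArgumentException('Chunk size must be at least 1.');
    }

    $batch = [];
    foreach ($rows as $row) {
        $batch[] = $row;
        if (count($batch) === $size) {
            yield $batch;
            $batch = [];
        }
    }

    // Emit the final, possibly shorter, batch.
    if ($batch !== []) {
        yield $batch;
    }
}

// Stand-in for a cursor: produces fake rows one by one.
function fakeCursor(int $count): Generator
{
    for ($id = 1; $id <= $count; $id++) {
        yield ['id' => $id];
    }
}

foreach (chunked(fakeCursor(25), 10) as $batch) {
    echo count($batch), ' rows, first id ', $batch[0]['id'], PHP_EOL;
}
```

Only one batch is held at a time, while the underlying source is still read row by row from a single stream.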

The best approach is to look at the source code.

select() or get()

https://github.com/laravel/framework/blob/8.x/src/Illuminate/Database/Connection.php#L366

return $statement->fetchAll();

It uses fetchAll, which loads all records into memory. This is fast but consumes a lot of memory.

cursor()

https://github.com/laravel/framework/blob/8.x/src/Illuminate/Database/Connection.php#L403

while ($record = $statement->fetch()) {
   yield $record;
}

It uses fetch, which loads only one record at a time into memory from the buffer. Note that it executes only one query, though. Memory use is lower, but it is slower because it iterates one by one. (Depending on your PHP configuration, the buffer can be stored either on the PHP side or on the MySQL side.)

chunk()

https://github.com/laravel/framework/blob/8.x/src/Illuminate/Database/Concerns/BuildsQueries.php#L30

public function chunk($count, callable $callback)
{
    $this->enforceOrderBy();
    $page = 1;
    do {
        $results = $this->forPage($page, $count)->get();
        $countResults = $results->count();

        if ($countResults == 0) {
            break;
        }

        if ($callback($results, $page) === false) {
            return false;
        }

        unset($results);

        $page++;
    } while ($countResults == $count);

    return true;
}

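It makes many smaller fetchAll calls (via get()) and tries to keep memory low by breaking the big query into smaller queries, using limit with the chunk size you specify. In a way, it tries to combine the benefits of get() and cursor().

As a rule of thumb, I would say go with chunk, or even better chunkById if you can. (chunk performs poorly on big tables because it uses offset; chunkById instead filters on the id of the last row seen and uses limit.)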
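The difference can be sketched with an in-memory array standing in for the table. This is an illustration of the id-based (keyset) technique, not Laravel's code; chunkByIdSketch is a hypothetical name and the rows are assumed to have positive integer ids.

```php
<?php
declare(strict_types=1);

/**
 * Keyset pagination: each page asks for ids greater than the last id seen,
 * instead of skipping rows with an offset.
 */
function chunkByIdSketch(array $rows, int $size, callable $callback): void
{
    if ($size < 1) {
        throw new InvalidArgumentException('Chunk size must be at least 1.');
    }

    usort($rows, fn (array $a, array $b): int => $a['id'] <=> $b['id']);
    $lastId = 0; // assumes ids are positive integers

    do {
        // In-memory equivalent of: where id > $lastId order by id limit $size
        $matching = array_values(array_filter($rows, fn (array $r): bool => $r['id'] > $lastId));
        $page = array_slice($matching, 0, $size);
        if ($page === []) {
            break;
        }
        $callback($page);
        $lastId = end($page)['id'];
    } while (count($page) === $size);
}

$rows = array_map(fn (int $id): array => ['id' => $id], [1, 2, 5, 8, 9, 12, 20]);
$pages = [];
chunkByIdSketch($rows, 3, function (array $page) use (&$pages): void {
    $pages[] = array_column($page, 'id');
});
print_r($pages);
```

In a real database the where clause on an indexed id lets the server jump straight to the next page, whereas a large offset forces it to walk past all the skipped rows first.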

lazy()

In Laravel 8 there is also lazy(). It is similar to chunk, but the syntax is cleaner (it uses generators).

https://laravel.com/docs/8.x/eloquent#streaming-results-lazily

foreach (Flight::lazy() as $flight) {
    //
}

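lazy() does the same thing as chunk(), except that you don't need a callback, because it uses PHP generators. You can also use lazyById(), the counterpart of chunkById().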
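Conceptually, this amounts to running the paged queries inside a generator and yielding the rows one by one. The sketch below shows that idea in plain PHP; lazySketch and $fetchPage are hypothetical, and an in-memory array stands in for the table.

```php
<?php
declare(strict_types=1);

/**
 * Fetch pages through $fetchPage(int $page, int $size): array and yield the
 * rows one at a time, so the caller can use a plain foreach.
 */
function lazySketch(callable $fetchPage, int $size): Generator
{
    if ($size < 1) {
        throw new InvalidArgumentException('Chunk size must be at least 1.');
    }

    $page = 1;
    do {
        $rows = $fetchPage($page, $size);
        foreach ($rows as $row) {
            yield $row;
        }
        $page++;
    } while (count($rows) === $size);
}

// Hypothetical data source: an array sliced the way limit/offset would.
$table = range(1, 7);
$fetchPage = fn (int $page, int $size): array => array_slice($table, ($page - 1) * $size, $size);

foreach (lazySketch($fetchPage, 3) as $row) {
    echo $row, PHP_EOL;
}
```

The caller sees a flat stream of rows, while behind the scenes the data still arrives in pages of three.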