How to validate huge data using LazyCollection in Laravel

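I'm trying to validate a large amount of data using Laravel's LazyCollection. I tested the code with 15,000 rows, each containing 9 columns to validate.

The scenario: the user uploads an Excel file, it is converted to an array, and then the data is validated.

Controller: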

class ImportBudget extends Controller
{
    use SpreadsheetTrait;

    public function import(Request $request) 
    {
        // File format validation
        $validatedFile = SpreadsheetTrait::fileHandler($request);
        
        if (!is_array($validatedFile)) return $validatedFile;

        $messages = [
            'required' => 'This field is required, please input.',
            'numeric' => 'Please input number value only.',
            'integer' => 'Please input integer value only.',
            'min' => 'Minimal input value is :min.',
            'required_if' => 'This field is required, please input.',
        ];
        
        // Cars data which contain 30k++ of arrays
        $car = Cache::get('car')->pluck('slug')->toArray();

        // Start Data Validation
        $validatedData = LazyCollection::make(function () use($validatedFile) {
            $data = collect($validatedFile);
            yield $data;
        })->chunk(1000)->each(function ($rows) use ($car, $messages) {
            return Validator::make($rows->toArray(), [
                '*.0' => ['required', function ($attribute, $value, $fail) {
                    if (!in_array($value , config('constants.purposes'))) {
                        $fail('The purpose field is invalid.');
                    }
                }],
                '*.1' => 'required_if:*.0,PRODUCTION-PROJECT',
                '*.2' => 'required',
                '*.3' => 'required',
                '*.4' => 'required',
                '*.5' => 'required',
                '*.6' => 'required',
                '*.7' => ['required', function ($attribute, $value, $fail) use($car) {
                    if (!in_array($value, $car)) {
                        $fail('The car is invalid.');
                    }
                }],
                '*.8' => 'required|numeric|min:0',
                '*.9' => 'required|integer|min:1',
            ], $messages);
        });
    }
}

The code above fails with a maximum execution time error:

{
    "message": "Maximum execution time of 60 seconds exceeded",
    "exception": "Symfony\Component\ErrorHandler\Error\FatalError",
    "file": "C:\laragon\www\bulus-jai\vendor\laravel\framework\src\Illuminate\Collections\Arr.php",
    "line": 115,
    "trace": []
}

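Even when I increase the execution time limit to 120 seconds, the result is the same.

Note that the $car variable contains 30k+ entries. I think this also slows the validation down, but I don't know how to make it simpler.

What is the best solution?

Update 1

I tried switching to my own validation script by creating a service, and the result is very good (15k rows take about 5 to 10 seconds):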


class BatchValidationServices {

    public static function budgetValidation($validatedFile)
    {
        $requiredFields = [
            0 => true,
            1 => false,
            2 => true,
            3 => true,
            4 => true,
            5 => true,
            6 => true,
            7 => true,
            8 => true,
            9 => true,
            10 => false
        ];

        $curr = collect(Cache::get('curr'))
            ->where('term.status','BUDGETING')
            ->pluck('name')
            ->toArray();
        $item = Cache::get('item')->pluck('item_code')->toArray();
        $car = Cache::get('car')->pluck('slug')->toArray();
        $deliveryPlan = Cache::get('delivery');
        $orderPlan = Cache::get('order');

        $collectedData = LazyCollection::make(function () use($validatedFile) {
            $data = collect($validatedFile);
            yield $data;
        }); 

        $errors = [];

        $collectedData->chunk(1000)
            ->each(function ($collection)  use (
                $orderPlan,
                $deliveryPlan,
                $car,
                $curr,
                $item,
                $requiredFields, &$errors){
                foreach ($collection->toArray() as $array) {
                    foreach ($array as $key => $row) {
    
                        // Validate blank rows
                        for ($i=0; $i < count($requiredFields); $i++) {
                            if ($row[$i] === null &&
                            $requiredFields[$i] === true) {
                                array_push($errors, [$key.'.'.$i => 'This field is required.']);
                            }
                        }
    
                        // Validate purpose validity
                        if (!in_array($row[0], config('constants.purposes'))) array_push($errors, [$key.'.0' => 'Purpose is invalid.']);
    
                        // Validate required preparation item
                        $preparationItems = array_column(config('constants.preparations'), 'item');
    
                        if ($row[0] === 'PRODUCTION-PROJECT' && $row[1] === null) {
                            array_push($errors, [$key.'.1' => 'This field is required if the purpose is PRODUCTION-PROJECT.']);  
                        } elseif ($row[0] === 'PRODUCTION-PROJECT' && $row[1] !== null) {
                            if (!in_array($row[1], $preparationItems)) array_push($errors, [$key.'.1' => 'Production preparation item is invalid.']);
                        }
    
                        // Validate order plan & delivery plan 
                        if (!in_array($row[2], $orderPlan)) array_push($errors, [$key.'.2' => 'Order plan is invalid.']);
                        
                        if (!in_array($row[3], $deliveryPlan)) {
                            array_push($errors, [$key.'.3' => 'Delivery plan is invalid.']);
                        } else {
                            if ($row[3] < $row[2]) array_push($errors, [$key.'.3' => 'Delivery plan should be after or at least in the same period as order plan.']);
                        }
                        
                        // Validate destination-carline
                        if (!in_array($row[4], $car)) array_push($errors, [$key.'.4' => 'Destination-carline is invalid.']);

                        // Validate Origin
                        if (!in_array($row[5], ['DOMESTIC', 'IMPORT'])) array_push($errors, [$key.'.5' => 'Origin supplier is invalid, please choose between DOMESTIC or IMPORT only.']);

                        // Validate Item
                        if(!in_array($row[6], $item)) array_push($errors, [$key.'.6' => 'Item code is invalid.']);

                        // Validate Currency
                        if (!in_array($row[7], $curr)) {
                            array_push($errors, [$key.'.7' => 'Currency is invalid.']);
                        } else {
                            if ($row[5] === 'IMPORT' && $row[7] === 'IDR') array_push($errors, [$key.'.7' => 'IDR currency shouldn\'t be used for IMPORT.']);
                        }

                        // Validate Price
                        if (!is_numeric($row[8])) {
                            array_push($errors, [$key.'.8' => 'Please only input numerical value.']);
                        } else {
                            if ($row[8] <= 0) array_push($errors, [$key.'.8' => 'Value must be greater than 0.']);
                        }

                        // Validate Qty
                        if (!is_integer($row[9])) {
                            array_push($errors, [$key.'.9' => 'Please only input numerical value.']);
                        } else {
                            if ($row[9] <= 0) array_push($errors, [$key.'.9' => 'Value must be greater than 0.']);
                        }
                    }
                }

                return $errors;
            });

        if (count($errors) > 0) {
            return $errors;
        } else {
            return true;
        }
    }
}

But I'm still wondering why it takes so long when I use Laravel's built-in validation. I would prefer to use Laravel validation because the code is more readable.

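Since you've already loaded the entire contents of the spreadsheet into the $validatedFile variable, why create a LazyCollection object at all? Its only purpose is to save memory by not loading a large dataset into memory all at once. You can also clean up the validation rules that use closures. This is more than a cosmetic change: in_array() is notoriously slow.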

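use Illuminate\Http\Request;
use Illuminate\Support\Facades\Cache;
use Illuminate\Support\Facades\Validator;
use Illuminate\Validation\Rule;

class ImportBudget extends Controller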
{
    use SpreadsheetTrait;

    public function import(Request $request) 
    {
        // File format validation
        $validatedFile = SpreadsheetTrait::fileHandler($request);
        
        if (!is_array($validatedFile)) {
            // this should be throwing an exception of some kind
            return $validatedFile;
        }

        $purposes = config('constants.purposes');

        // Cars data which contain 30k++ of arrays
        $car = Cache::get('car')->pluck('slug');

        $rules = [
            '*.0' => ['required', Rule::in($purposes)],
            '*.1' => ['required_if:*.0,PRODUCTION-PROJECT'],
            '*.2' => ['required'],
            '*.3' => ['required'],
            '*.4' => ['required'],
            '*.5' => ['required'],
            '*.6' => ['required'],
            '*.7' => ['required', Rule::in($car)],
            '*.8' => ['required', 'numeric', 'min:0'],
            '*.9' => ['required', 'integer', 'min:1'],
        ];

        $messages = [
            'required' => 'This field is required, please input.',
            'numeric' => 'Please input number value only.',
            'integer' => 'Please input integer value only.',
            'min' => 'Minimal input value is :min.',
            'required_if' => 'This field is required, please input.',
        ];

        // Start Data Validation
        $validatedData = Validator::make($validatedFile, $rules, $messages);
    }
}
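
On the LazyCollection point: the generator in the question yields the whole collection as a single item, so nothing is lazy and chunk(1000) only ever sees one element. Here is a minimal plain-PHP sketch of the difference, using generators directly rather than LazyCollection. The function names and sample rows are made up for illustration.

```php
<?php

// Hypothetical sample data standing in for the parsed spreadsheet rows.
$rows = [
    ['PRODUCTION-PROJECT', 'jig'],
    ['MAINTENANCE', null],
    ['PRODUCTION-PROJECT', null],
];

// Mirrors the question's generator: a single yield of the whole data set.
function yieldAllAtOnce(array $rows): Generator
{
    yield $rows;
}

// Yields one row at a time, which is what chunking and lazy iteration need.
function yieldRowByRow(array $rows): Generator
{
    foreach ($rows as $index => $row) {
        yield $index => $row;
    }
}

// Count how many items each generator actually produces.
$itemsAtOnce = iterator_count(yieldAllAtOnce($rows));
$itemsByRow  = iterator_count(yieldRowByRow($rows));

echo "$itemsAtOnce vs $itemsByRow", PHP_EOL; // prints "1 vs 3"
```

Even the row-by-row version saves no memory here, because $rows is already fully loaded. The gain only appears when the generator reads from the file itself.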

If slug is guaranteed to be unique, making it the index of the array speeds things up:

$car = Cache::get('car')->pluck('id', 'slug');

Your validation rule then becomes a very fast closure that only needs to check whether the key exists:

'*.7' => ['required', fn ($k, $v, $f) => $car[$v] ?? $f("The car in $k is invalid")],
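
To see why the keyed lookup helps, here is a small plain-PHP sketch, independent of Laravel, that compares in_array() with an isset() check on a flipped array. The slug values and the list size are invented for illustration. Timings vary by machine, so none are quoted.

```php
<?php

// Hypothetical stand-in for the ~30k car slugs pulled from the cache.
$slugs = [];
for ($i = 0; $i < 30000; $i++) {
    $slugs[] = "car-$i";
}

// Flip once so each slug becomes an array key; lookups are then hash-based.
$slugIndex = array_flip($slugs);

// Hypothetical values to validate: one valid, one invalid.
$values = ['car-29999', 'not-a-car'];

$linear = [];
$keyed  = [];
foreach ($values as $value) {
    // in_array() scans the list element by element until it finds a match.
    $linear[$value] = in_array($value, $slugs, true);
    // isset() on the flipped array is a single hash lookup.
    $keyed[$value] = isset($slugIndex[$value]);
}

// Both approaches agree on which values are valid.
var_dump($linear === $keyed); // bool(true)
```

With 15,000 rows checked against 30,000 slugs, the linear scan can need up to 450 million comparisons in the worst case, while the keyed version performs 15,000 lookups.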