How to validate huge data using LazyCollection in Laravel
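I am trying to validate a large amount of data using Laravel's LazyCollection.
I tested the code with 15,000 rows, each containing 9 columns to validate.
The scenario: a user uploads an Excel file, it is converted to an array, and then the data is validated.
Controller: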
class ImportBudget extends Controller
{
    use SpreadsheetTrait;
    public function import(Request $request)
    {
        // File format validation
        $validatedFile = SpreadsheetTrait::fileHandler($request);
        if (!is_array($validatedFile)) return $validatedFile;
        $messages = [
            'required' => 'This field is required, please input.',
            'numeric' => 'Please input number value only.',
            'integer' => 'Please input integer value only.',
            'min' => 'Minimal input value is :min.',
            'required_if' => 'This field is required, please input.',
        ];
        // Cars data which contain 30k++ of arrays
        $car = Cache::get('car')->pluck('slug')->toArray();
        // Start Data Validation
        $validatedData = LazyCollection::make(function () use ($validatedFile) {
            $data = collect($validatedFile);
            yield $data;
        })->chunk(1000)->each(function ($rows) use ($car, $messages) {
            return Validator::make($rows->toArray(), [
                '*.0' => ['required', function ($attribute, $value, $fail) {
                    if (!in_array($value, config('constants.purposes'))) {
                        $fail('The purpose field is invalid.');
                    }
                }],
                '*.1' => 'required_if:*.0,PRODUCTION-PROJECT',
                '*.2' => 'required',
                '*.3' => 'required',
                '*.4' => 'required',
                '*.5' => 'required',
                '*.6' => 'required',
                '*.7' => ['required', function ($attribute, $value, $fail) use ($car) {
                    // Check against the captured $car list
                    if (!in_array($value, $car)) {
                        $fail('The car is invalid.');
                    }
                }],
                '*.8' => 'required|numeric|min:0',
                '*.9' => 'required|integer|min:1',
            ], $messages);
        });
    }
}
The code above results in a maximum execution time error:
{
"message": "Maximum execution time of 60 seconds exceeded",
"exception": "Symfony\Component\ErrorHandler\Error\FatalError",
"file": "C:\laragon\www\bulus-jai\vendor\laravel\framework\src\Illuminate\Collections\Arr.php",
"line": 115,
"trace": []
}
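Even when I increase the execution time limit to 120 seconds, the result is the same.
Note that the $car variable contains 30k+ entries. I think this also slows the validation down, but I don't know how to make it simpler.
What is the best solution?
Update 1
I tried switching to my own validation script by creating a service, and the result is very good (15k rows take roughly 5 to 10 seconds):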
class BatchValidationServices {
    public static function budgetValidation($validatedFile)
    {
        $requiredFields = [
            0 => true,
            1 => false,
            2 => true,
            3 => true,
            4 => true,
            5 => true,
            6 => true,
            7 => true,
            8 => true,
            9 => true,
            10 => false
        ];
        $curr = collect(Cache::get('curr'))
            ->where('term.status', 'BUDGETING')
            ->pluck('name')
            ->toArray();
        $item = Cache::get('item')->pluck('item_code')->toArray();
        $car = Cache::get('car')->pluck('slug')->toArray();
        $deliveryPlan = Cache::get('delivery');
        $orderPlan = Cache::get('order');
        $collectedData = LazyCollection::make(function () use ($validatedFile) {
            $data = collect($validatedFile);
            yield $data;
        });
        $errors = [];
        $collectedData->chunk(1000)
            ->each(function ($collection) use (
                $orderPlan,
                $deliveryPlan,
                $car,
                $curr,
                $item,
                $requiredFields,
                &$errors
            ) {
                foreach ($collection->toArray() as $array) {
                    foreach ($array as $key => $row) {
                        // Validate blank rows
                        for ($i = 0; $i < count($requiredFields); $i++) {
                            if ($row[$i] === null && $requiredFields[$i] === true) {
                                array_push($errors, [$key.'.'.$i => 'This field is required.']);
                            }
                        }
                        // Validate purpose validity
                        if (!in_array($row[0], config('constants.purposes'))) array_push($errors, [$key.'.0' => 'Purpose is invalid.']);
                        // Validate required preparation item
                        $preparationItems = array_column(config('constants.preparations'), 'item');
                        if ($row[0] === 'PRODUCTION-PROJECT' && $row[1] === null) {
                            array_push($errors, [$key.'.1' => 'This field is required if the purpose is PRODUCTION-PROJECT.']);
                        } elseif ($row[0] === 'PRODUCTION-PROJECT' && $row[1] !== null) {
                            if (!in_array($row[1], $preparationItems)) array_push($errors, [$key.'.1' => 'Production preparation item is invalid.']);
                        }
                        // Validate order plan & delivery plan
                        if (!in_array($row[2], $orderPlan)) array_push($errors, [$key.'.2' => 'Order plan is invalid.']);
                        if (!in_array($row[3], $deliveryPlan)) {
                            array_push($errors, [$key.'.3' => 'Delivery plan is invalid.']);
                        } else {
                            if ($row[3] < $row[2]) array_push($errors, [$key.'.3' => 'Delivery plan should be after or at least in the same period as order plan.']);
                        }
                        // Validate destination-carline
                        if (!in_array($row[4], $car)) array_push($errors, [$key.'.4' => 'Destination-carline is invalid.']);
                        // Validate Origin
                        if (!in_array($row[5], ['DOMESTIC', 'IMPORT'])) array_push($errors, [$key.'.5' => 'Origin supplier is invalid, please choose between DOMESTIC or IMPORT only.']);
                        // Validate Item
                        if (!in_array($row[6], $item)) array_push($errors, [$key.'.6' => 'Item code is invalid.']);
                        // Validate Currency
                        if (!in_array($row[7], $curr)) {
                            array_push($errors, [$key.'.7' => 'Currency is invalid.']);
                        } else {
                            if ($row[5] === 'IMPORT' && $row[7] === 'IDR') array_push($errors, [$key.'.7' => 'IDR currency shouldn\'t be used for IMPORT.']);
                        }
                        // Validate Price
                        if (!is_numeric($row[8])) {
                            array_push($errors, [$key.'.8' => 'Please only input numerical value.']);
                        } else {
                            if ($row[8] <= 0) array_push($errors, [$key.'.8' => 'Value must be greater than 0.']);
                        }
                        // Validate Qty
                        if (!is_integer($row[9])) {
                            array_push($errors, [$key.'.9' => 'Please only input numerical value.']);
                        } else {
                            if ($row[9] <= 0) array_push($errors, [$key.'.9' => 'Value must be greater than 0.']);
                        }
                    }
                }
                return $errors;
            });
        if (count($errors) > 0) {
            return $errors;
        } else {
            return true;
        }
    }
}
However, I'm still wondering why it takes so long when I use the built-in Laravel validation. I'd prefer to use Laravel validation because the code is more readable.
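Since you've already loaded the entire contents of the spreadsheet into the $validatedFile variable, why create a LazyCollection object at all? Its only purpose is to save memory by not loading a large dataset into memory all at once. You can also clean up the validation rules that use closures. This isn't just a cosmetic change: in_array() does a linear scan and is notoriously slow on large arrays.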
class ImportBudget extends Controller
{
    use SpreadsheetTrait;
    public function import(Request $request)
    {
        // File format validation
        $validatedFile = SpreadsheetTrait::fileHandler($request);
        if (!is_array($validatedFile)) {
            // this should be throwing an exception of some kind
            return $validatedFile;
        }
        $purposes = config('constants.purposes');
        // Cars data which contain 30k++ of arrays
        $car = Cache::get('car')->pluck('slug');
        $rules = [
            '*.0' => ['required', Rule::in($purposes)],
            '*.1' => ['required_if:*.0,PRODUCTION-PROJECT'],
            '*.2' => ['required'],
            '*.3' => ['required'],
            '*.4' => ['required'],
            '*.5' => ['required'],
            '*.6' => ['required'],
            '*.7' => ['required', Rule::in($car)],
            '*.8' => ['required', 'numeric', 'min:0'],
            '*.9' => ['required', 'integer', 'min:1'],
        ];
        $messages = [
            'required' => 'This field is required, please input.',
            'numeric' => 'Please input number value only.',
            'integer' => 'Please input integer value only.',
            'min' => 'Minimal input value is :min.',
            'required_if' => 'This field is required, please input.',
        ];
        // Start Data Validation
        $validatedData = Validator::make($validatedFile, $rules, $messages);
    }
}
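On the LazyCollection point, it may help to see what the generator in the question actually produces. The following is a minimal plain-PHP sketch (no Laravel required); the row data and the function names wholeArray and rowByRow are made up for illustration. It suggests that yielding the whole array hands the consumer a single item, so chunking by 1000 has nothing to split, and in either case the rows are already in memory.

```php
<?php
// Hypothetical stand-in for the already-loaded spreadsheet rows.
$rows = array_fill(0, 15000, ['PRODUCTION-PROJECT', 'item', 1]);

// Mirrors the shape of the question's generator: the whole array is ONE item.
function wholeArray(array $rows): Generator
{
    yield $rows;
}

// Yields one row at a time, which is what chunking needs to be meaningful.
function rowByRow(array $rows): Generator
{
    foreach ($rows as $index => $row) {
        yield $index => $row;
    }
}

// Count how many items each generator hands to its consumer.
$wholeCount = iterator_count(wholeArray($rows));
$rowCount   = iterator_count(rowByRow($rows));

echo "yield \$rows produces $wholeCount item(s)\n";
echo "yield per row produces $rowCount item(s)\n";
```

A lazy source only pays off when the rows are produced on demand, for example while reading the file, rather than wrapped after the fact.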
If the slug is guaranteed to be unique, using it as the array key makes the lookup faster:
$car = Cache::get('car')->pluck('id', 'slug');
Your validation rule then becomes a very fast closure that only needs to check whether the key exists:
'*.7' => ['required', fn ($k, $v, $f) => $car[$v] ?? $f("The car in $k is invalid")],
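The difference between scanning a list and probing a key can be sketched in plain PHP. This is only an illustration under assumptions: the slugs below are generated placeholders, not the real cached car data, and array_flip stands in for keying the collection by slug. Both approaches give the same answers, but in_array() may compare against every element, while isset() on a keyed array is a single hash lookup.

```php
<?php
// Hypothetical stand-in for the cached car slugs (about 30k unique strings).
$slugs = [];
for ($i = 0; $i < 30000; $i++) {
    $slugs[] = "car-$i";
}

// Build the keyed lookup once: slug => original position.
$lookup = array_flip($slugs);

// Hypothetical column values to check, including one that does not exist.
$values = ['car-0', 'car-29999', 'car-unknown'];

$viaScan = [];
$viaKey  = [];
foreach ($values as $value) {
    // Linear scan: compares against up to every element of $slugs.
    $viaScan[$value] = in_array($value, $slugs, true);
    // Hash lookup: one key probe regardless of how long the list is.
    $viaKey[$value] = isset($lookup[$value]);
}

// Both strategies agree on which values are valid.
var_dump($viaScan === $viaKey);
```

With 15k rows checked against 30k slugs, the scan can reach hundreds of millions of comparisons in the worst case, whereas the keyed version does one probe per row.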