PHPExcel 1.8.0 memory exhausted on load, chunk read, and iterator

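For testing, I have set my PHP configuration to allow 12800M uploads, unlimited file size, and unlimited upload time, but I keep getting stuck on this common PHPExcel fatal memory exhaustion error. I get the following error message: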

Fatal error: Allowed memory size of 536870912 bytes exhausted (tried to allocate 32 bytes) in .../Worksheet.php on line 1220.

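When I use the chunk read filter or the iterator, memory usage grows as the .xlsx file is read further down the rows instead of staying constant. I believe I have read some reports about this from the PHPExcel developers.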

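I am using PHPExcel 1.8.0. I may try an older version, since I have read that it performs better on large files. I started with a regular load of the file into an array, then tried the chunk read filter from the examples, and then the iterator from this question: PHPExcel - memory leak when I go through all rows. I don't think it matters whether it is 1.8.0 or an earlier version.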

<?php

/** Error reporting */
error_reporting(E_ALL);
ini_set('display_errors', TRUE);
ini_set('display_startup_errors', TRUE);
date_default_timezone_set('America/Los_Angeles');

define('EOL',(PHP_SAPI == 'cli') ? PHP_EOL : '<br />');

/** Include PHPExcel_IOFactory */
require_once dirname(__FILE__) . '/../Classes/PHPExcel/IOFactory.php';


if (!file_exists("Test.xlsx")) {
    exit("Please check if Test.xlsx exists first.\n");
}

echo date('H:i:s') , " Load workbook from Excel5 file" , EOL;
$callStartTime = microtime(true);

$objPHPExcel = PHPExcel_IOFactory::load("Order_Short.xls");

$callEndTime = microtime(true);
$callTime = $callEndTime - $callStartTime;

echo 'Call time to load Workbook was ' , sprintf('%.4f',$callTime) , " seconds" , EOL;
// Echo memory usage
echo date('H:i:s') , ' Current memory usage: ' , (memory_get_usage(true) / 1024 / 1024) , " MB" , EOL;


//http://runnable.com/Uot2A2l8VxsUAAAR/read-a-simple-2007-xlsx-excel-file-for-php

//  Read your Excel workbook
$inputFileName="Drybar Client Data for Tableau Test.xlsx";
$table = "mt_company2_project2_table144_raw";

try {
    echo date('H:i:s') , " Load workbook from Excel5 file" , EOL;
    $callStartTime = microtime(true);

    $inputFileType = PHPExcel_IOFactory::identify($inputFileName);
    $objReader = PHPExcel_IOFactory::createReader($inputFileType);
    $objPHPExcel = $objReader->load($inputFileName);

    $callEndTime = microtime(true);
    $callTime = $callEndTime - $callStartTime;

    echo 'Call time to load Workbook was ' , sprintf('%.4f',$callTime) , " seconds\r\n" , EOL;
    // Echo memory usage
    echo date('H:i:s') , ' Current memory usage: ' , (memory_get_usage(true) / 1024 / 1024) , " MB" , EOL;
} catch (Exception $e) {
    die('Error loading file "' . pathinfo($inputFileName, PATHINFO_BASENAME)
        . '": ' . $e->getMessage());
}

// Get worksheet dimensions
$sheet = $objPHPExcel->getSheet(0);

// http://asantillan.com/php-excel-import-to-mysql-using-phpexcel/
$worksheetTitle = $sheet->getTitle();

$highestRow = $sheet->getHighestRow();
$highestColumn = $sheet->getHighestColumn();
$highestColumnIndex = PHPExcel_Cell::columnIndexFromString($highestColumn);

// Calculating columns: reuse the index computed above, since ord() only handles single-letter columns
$nrColumns = $highestColumnIndex;

echo "File ".$worksheetTitle." has ";
echo $nrColumns . ' columns';
echo ' x ' . $highestRow . ' rows.<br />';

//  Loop through each row of the worksheet in turn
for ($row = 1; $row <= $highestRow; $row++) {
    //  Read a row of data into an array
    $rowData = $sheet->rangeToArray('A' . $row . ':' . $highestColumn . $row,
                                    NULL, FALSE, FALSE);

    var_dump($rowData);

    foreach ($rowData[0] as $k => $v) {
        echo "Row: " . $row . "- Col: " . ($k + 1) . " = " . $v . "<br />";
    }

}

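Below is the chunk read filter, modified from PHPExcel Reader example #12. It still gives me the same fatal memory exhaustion error, because memory usage keeps growing as it reads further down the rows.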

<?php

error_reporting(E_ALL);
set_time_limit(0);

define('EOL',(PHP_SAPI == 'cli') ? PHP_EOL : '<br />');

date_default_timezone_set('America/Los_Angeles');

/**  Set Include path to point at the PHPExcel Classes folder  **/
set_include_path(get_include_path() . PATH_SEPARATOR . '../../../Classes/');

/**  Include PHPExcel_IOFactory  **/
include 'PHPExcel/IOFactory.php';

$inputFileName = 'Test.xlsx';


/**  Define a Read Filter class implementing PHPExcel_Reader_IReadFilter  */
class chunkReadFilter implements PHPExcel_Reader_IReadFilter
{
    private $_startRow = 0;

    private $_endRow = 0;

    /**  Set the list of rows that we want to read  */
    public function setRows($startRow, $chunkSize) {
        $this->_startRow    = $startRow;
        $this->_endRow      = $startRow + $chunkSize;
    }

    public function readCell($column, $row, $worksheetName = '') {
        //  Only read the heading row, and the rows that are configured in $this->_startRow and $this->_endRow
        if (($row == 1) || ($row >= $this->_startRow && $row < $this->_endRow)) {
            return true;
        }
        return false;
    }
}

$inputFileType = PHPExcel_IOFactory::identify($inputFileName);

$callStartTime = microtime(true);

echo 'Loading file ',pathinfo($inputFileName,PATHINFO_BASENAME),' using IOFactory with a defined reader type of ',$inputFileType,'<br />';

// Call time

$callEndTime = microtime(true);
$callTime = $callEndTime - $callStartTime;
echo 'Call time to read Workbook was ' , sprintf('%.4f',$callTime) , " seconds" , EOL;

// Echo memory usage
echo date('H:i:s') , ' Current memory usage: ' , (memory_get_usage(true) / 1024 / 1024) , " MB" , EOL;

/**  Create a new Reader of the type defined in $inputFileType  **/
$objReader = PHPExcel_IOFactory::createReader($inputFileType);

echo '<hr />';


/**  Define how many rows we want to read for each "chunk"  **/
$chunkSize = 100;
/**  Create a new Instance of our Read Filter  **/
$chunkFilter = new chunkReadFilter();

/**  Tell the Reader that we want to use the Read Filter that we've Instantiated  **/
$objReader->setReadFilter($chunkFilter);
/**  Loop to read our worksheet in "chunk size" blocks  **/
for ($startRow = 2; $startRow <= 26000; $startRow += $chunkSize) {
    echo 'Loading WorkSheet using configurable filter for headings row 1 and for rows ',$startRow,' to ',($startRow+$chunkSize-1),'<br />';
    /**  Tell the Read Filter, the limits on which rows we want to read this iteration  **/
    $chunkFilter->setRows($startRow,$chunkSize);
    /**  Load only the rows that match our filter from $inputFileName to a PHPExcel Object  **/
    $objPHPExcel = $objReader->load($inputFileName);

    //  Do some processing here

    $sheetData = $objPHPExcel->getActiveSheet()->toArray(null,true,true,true);

    echo '<br /><br />';

    // Call time

    $callEndTime = microtime(true);
    $callTime = $callEndTime - $callStartTime;
    echo 'Call time to read Workbook was ' , sprintf('%.4f',$callTime) , " seconds" , EOL;

    // Echo memory usage
    echo date('H:i:s') , ' Current memory usage: ' , (memory_get_usage(true) / 1024 / 1024) , " MB" , EOL;

    // Echo memory peak usage
    echo date('H:i:s') , " Peak memory usage: " , (memory_get_peak_usage(true) / 1024 / 1024) , " MB" , EOL;
    echo '<hr />';
}

?>

You do have one major problem here:

$sheetData = $objPHPExcel->getActiveSheet()->toArray(null,true,true,true);

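This will try to build an array for the entire size of the worksheet, regardless of whether you loaded only a chunk. toArray() uses the expected size of the spreadsheet, based on the file you are loading, not the filtered set of cells that you actually loaded.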
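To get a feel for the difference in scale, here is a rough sketch that compares the number of cell slots in a whole sheet with the number in one chunk. The helper names `columnLabelToIndex()` and `cellCount()` are hypothetical (they are not PHPExcel functions), and the column range A..Z is an assumption for illustration only; the 26000 rows and the chunk size of 100 come from the loop in the question.

```php
<?php
// Hypothetical helper: convert a column label such as "A" or "AB"
// to a 1-based index (A = 1, Z = 26, AA = 27).
function columnLabelToIndex($label)
{
    $index = 0;
    foreach (str_split(strtoupper($label)) as $char) {
        $index = $index * 26 + (ord($char) - ord('A') + 1);
    }
    return $index;
}

// Hypothetical helper: number of cell slots in a block of $rowCount rows
// spanning columns A..$highestColumn.
function cellCount($highestColumn, $rowCount)
{
    return columnLabelToIndex($highestColumn) * $rowCount;
}

// Assumed sheet shape: 26000 rows (the loop bound in the question), columns A..Z.
$fullSheet = cellCount('Z', 26000); // slots covered when the array spans the whole sheet
$oneChunk  = cellCount('Z', 100);   // slots covered by a single 100-row chunk

echo "Full sheet: $fullSheet cells, one chunk: $oneChunk cells", PHP_EOL;
```

Under these assumptions the full-sheet array covers 260 times as many slots as one chunk, and that array is rebuilt on every pass through the loop.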

Try using rangeToArray() to get only the range of cells that you loaded in the chunk:

$sheetData = $objPHPExcel->getActiveSheet()
    ->rangeToArray(
        'A'.$startRow.':'.$objPHPExcel->getActiveSheet()->getHighestColumn().($startRow+$chunkSize-1),
        null,
        true,
        true,
        true
    );
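The range arithmetic in that call can be checked on its own. The sketch below uses a hypothetical helper, `chunkRange()`, which is not part of PHPExcel, and an assumed highest column of F. It builds the same range string that the call above passes to rangeToArray().

```php
<?php
// Hypothetical helper: build the cell range covered by one chunk.
// The last row is $startRow + $chunkSize - 1, which matches the read
// filter's "row < startRow + chunkSize" condition in the question.
function chunkRange($startRow, $chunkSize, $highestColumn)
{
    $endRow = $startRow + $chunkSize - 1;
    return 'A' . $startRow . ':' . $highestColumn . $endRow;
}

// Assumed values for illustration: chunks of 100 rows, columns A..F.
foreach (array(2, 102, 202) as $startRow) {
    echo chunkRange($startRow, 100, 'F'), PHP_EOL;
}
```

Note that the read filter in the question also loads heading row 1 on every pass, but a range built this way starts at the chunk's first data row, so the headings would need to be read separately if they are required.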

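Even so, building PHP arrays in memory uses a lot of memory. If your code can process the worksheet data one row at a time rather than populating a large array, it will need far less memory.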
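A minimal sketch of that row-at-a-time approach follows. The function `processRowsOneAtATime()` and its callback are hypothetical names introduced for illustration. The `$sheet` argument is assumed to be a PHPExcel_Worksheet, but it is left untyped so the function depends only on the rangeToArray() method already used above.

```php
<?php
/**
 * Hypothetical helper (not part of PHPExcel): hand each row to a callback
 * so that only one row array is held at any moment.
 *
 * $sheet is assumed to be a PHPExcel_Worksheet, or any object that offers
 * a compatible rangeToArray() method.
 */
function processRowsOneAtATime($sheet, $startRow, $endRow, $highestColumn, callable $handleRow)
{
    $processed = 0;
    for ($row = $startRow; $row <= $endRow; $row++) {
        // Request a single row; with the last argument false the result
        // is a zero-indexed array holding exactly one row.
        $rows = $sheet->rangeToArray(
            'A' . $row . ':' . $highestColumn . $row,
            null,
            true,
            true,
            false
        );
        // The callback might insert the row into a database or write it to a file.
        $handleRow($row, $rows[0]);
        $processed++;
        // $rows is overwritten on the next pass, so nothing accumulates here.
    }
    return $processed;
}
```

Inside the chunk loop this could be called as `processRowsOneAtATime($objPHPExcel->getActiveSheet(), $startRow, $startRow + $chunkSize - 1, $highestColumn, $callback)`, where `$highestColumn` and `$callback` are values you would supply yourself. It may also help to call `$objPHPExcel->disconnectWorksheets()` and then `unset($objPHPExcel)` at the end of each pass, so the previous chunk's workbook can be freed before the next one is loaded. Whether that is enough to keep memory flat for a particular file is something to confirm with the memory_get_usage() output the script already prints.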