Is it possible to import and export a 70MB Excel file using the PHPExcel library?
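I have an Excel file with 3 columns, where the 2nd column contains email hyperlinks. I have to import this file and export it with only 2 columns: the first should contain the name and the second the email, which means I have to split each hyperlink into a name and an email.
For a 31MB file, I changed the memory limit to 2048MB and the execution time to 1200 in php.ini. I can import and export the 31MB Excel file successfully, but exporting the 70MB file takes a very long time and fails with the following error message.
Fatal error: Allowed memory size of 2147483648 bytes exhausted (tried to allocate 15667514 bytes) in /var/www/html/PHPExcel/Reader/Excel2007.php on line 327
Is it possible to import and export a 70MB Excel file using the PHPExcel library? What do I have to change in php.ini, such as the memory limit and maximum execution time?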
require "PHPExcel.php";
require "PHPExcel/IOFactory.php";

$inputFileName = 'xxx.xlsx';
$inputFileType = PHPExcel_IOFactory::identify($inputFileName);
$objReader = PHPExcel_IOFactory::createReader($inputFileType);
$objReader->setReadDataOnly(true);
$objPHPExcel = $objReader->load($inputFileName);

$outputObj = new PHPExcel();

// Get worksheet dimensions
$sheet = $objPHPExcel->getSheet(0);
$highestRow = $sheet->getHighestRow();

$outputObj->setActiveSheetIndex(0);
$outSheet = $outputObj->getActiveSheet();

// Loop through each row of the worksheet in turn
for ($row = 2; $row <= $highestRow; $row++) { // As row 1 seems to be header
    // Read cell B2, B3, etc.
    $line = $sheet->getCell('B' . $row)->getValue();
    preg_match("|([^\.]+)\ <([^>]+)>|", $line, $data);
    if (!empty($data)) {
        // $data[1] will be name & $data[2] will be email
        $outSheet->setCellValue('A' . $row, $data[1]);
        $outSheet->setCellValue('B' . $row, $data[2]);
    }
}

$objWriter = new PHPExcel_Writer_CSV($outputObj);
$objWriter->save("xxx.csv");
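Note: can I export the Excel file without making any changes to the php.ini file?
I don't see the point of loading one spreadsheet file, copying everything from it into a second one, and then saving the second one. That costs a lot of memory and performance.
Why not just load the first one, remove header row 1, and then save it as CSV output?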
// Read the original spreadsheet
$inputFileName = 'TraiDBDump.xlsx';
$inputFileType = PHPExcel_IOFactory::identify($inputFileName);
$objReader = PHPExcel_IOFactory::createReader($inputFileType);
$objReader->setReadDataOnly(true);
$objPHPExcel = $objReader->load($inputFileName);
// Remove header row
$objPHPExcel->getSheet(0)->removeRow(1, 1);
// Save as a csv file
$objWriter = new PHPExcel_Writer_CSV($objPHPExcel);
$objWriter->save("TraiDBDump.csv");
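If your original has a lot of columns and you only need A and B, then you can use a read filter to read only those two columns.
@Priyanka, you could also try Spout: https://github.com/box/spout. It works well with large files! You won't have to change your php.ini file, as it needs no more than 10MB of memory and should finish before the default time limit.
You can do something like this: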
use Box\Spout\Reader\ReaderFactory;
use Box\Spout\Writer\WriterFactory;
use Box\Spout\Common\Type;

$filePath = 'xxx.xlsx';
$reader = ReaderFactory::create(Type::XLSX);
$reader->open($filePath);

$writer = WriterFactory::create(Type::CSV);
$writer->openToFile('xxx.csv');

$rowCount = 0;
while ($reader->hasNextSheet()) {
    $reader->nextSheet();
    while ($reader->hasNextRow()) {
        $row = $reader->nextRow();
        $rowCount++;
        if ($rowCount === 1) {
            continue; // that's for the header row
        }
        // get the values you need in the current row
        // for example:
        $name = $row[1];
        $email = $row[2];
        // write the data to the CSV file
        $writer->addRow([$name, $email]);
    }
}

$reader->close();
$writer->close();
Give it a try! Hope it solves your problem :)
I found a solution. I completed this task successfully in Python. Hope it helps someone. :)
# Time taken 2min 4sec for 69.9MB file.
import csv
import re
from openpyxl import load_workbook

location = 'big.xlsx'
wb = load_workbook(filename=location, read_only=True)
users_data = []

# pattern = '^(.+?) <([^>].+)>$' # matches "your name <email@email.com>"
# pattern_new = '^(.+?)<([^>].+)>$' # matches "your name<email@email.com>"
# pattern_email = '([\w.-]+@[\w.-]+)' # extracts email from sentence

# Define patterns to check on string.
patterns = [r'^(.+?) <([^>].+)>$', r'^(.+?)<([^>].+)>$']

# Loop through all sheets in XLSX.
for wsheet in wb.sheetnames:
    # Load data from sheet.
    ws = wb[wsheet]
    # Loop through each row in current sheet.
    for row in ws.rows:
        # We need column B data, so get that directly.
        # Check that it is not empty.
        if row[1].value:
            val = []
            # Get column B data and undo the "(at)" / "(dot)" obfuscation.
            data = row[1].value.replace("(at)", "@").replace("(dot)", ".")
            # Loop through all patterns to match in current data.
            for pattern in patterns:
                # Apply regex on data.
                name_data = re.search(pattern, data)
                # If match found, keep name and email and stop searching this row.
                if name_data:
                    val = [name_data.group(1), name_data.group(2)]
                    break
            # If no match found, look for a bare email; otherwise keep data as it is.
            if not val:
                name_data = re.search(r'([\w.-]+@[\w.-]+)', data)
                if name_data:
                    val = [name_data.group(1)]
                else:
                    val = [data]
            # Append new data to users_data list.
            users_data.append(val)

# Open CSV file for writing (text mode, as the csv module expects on Python 3).
with open('big.csv', 'w', newline='', encoding='utf-8') as myfile:
    wr = csv.writer(myfile, dialect='excel', delimiter=',', quotechar='"',
                    quoting=csv.QUOTE_MINIMAL, lineterminator='\n')
    # Write each entry as one CSV row (one column per list item).
    for word in users_data:
        wr.writerow(word)
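The Python answer above collects every parsed row in a list before writing anything. Below is a minimal sketch of the same name/email splitting that writes each row as soon as it is parsed. The helper names `split_name_email` and `write_contacts` are hypothetical, the two patterns from the answer are folded into one slightly simplified regex, and the input is assumed to be any iterable of column B strings, so the block needs only the standard library.

```python
import csv
import io
import re

# Covers both "name <email>" and "name<email>" (optional whitespace before "<").
NAME_EMAIL = re.compile(r'^(.+?)\s*<([^>]+)>$')
# Fallback: pull a bare address out of free text.
BARE_EMAIL = re.compile(r'([\w.-]+@[\w.-]+)')


def split_name_email(value):
    """Return (name, email); name is '' when only an address is found."""
    text = value.replace("(at)", "@").replace("(dot)", ".").strip()
    match = NAME_EMAIL.match(text)
    if match:
        return match.group(1).strip(), match.group(2).strip()
    match = BARE_EMAIL.search(text)
    if match:
        return "", match.group(1)
    # Nothing recognisable: keep the raw text in the first column.
    return text, ""


def write_contacts(values, out):
    """Write one CSV row per non-empty value, without buffering all rows."""
    writer = csv.writer(out, lineterminator="\n")
    written = 0
    for value in values:
        if not value:
            continue
        writer.writerow(split_name_email(value))
        written += 1
    return written


if __name__ == "__main__":
    sample = ["Jane Doe <jane@example.com>", "bob(at)example(dot)com", None]
    buffer = io.StringIO()
    write_contacts(sample, buffer)
    print(buffer.getvalue(), end="")
```

With openpyxl in read-only mode, `values` could be a generator such as `(row[1].value for row in ws.rows)` and `out` an open CSV file, so rows are written as they are read instead of being held in memory first.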