PHP: get multiple specific lines using fseek
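I am trying to get specific lines using only fopen() and fseek() (not just one line: I also need the lines above and below the line I seek to).
To improve performance, I know how to seek to a specific line and then exit. If I need line 5, for example, I should also be able to get line 4 and line 6.
This code records the byte position of each line, storing the line numbers as array keys and the byte positions as values, until EOF.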
$fh = fopen($source, 'r');
$meta = stream_get_meta_data($fh);
if (!$meta['seekable']) {
    throw new Exception(sprintf("A source is not seekable: %s", print_r($source, true)));
}
$line = fgets($fh, 4096);
$pos = -1;
$i = 0;
$result = null;
$linenum = 10;
var_dump('Line num:'.$linenum);
$total_lines = null;
// Get seek byte end of each line
while (!feof($fh)) {
    $char = fgetc($fh);
    if ($char != "\n" && $char != "\r") {
        $total_lines[$i] = $pos;
        $pos++;
    } else {
        $i++;
    }
    //var_dump(fgets($fh).' _ '.$pos);
}
// Now get specific lines (line 5, line 6 and line 7)
$seekssearch = array($total_lines[5], $total_lines[6], $total_lines[7]);
$result = null;
$posr = 0;
foreach ($seekssearch as $sk) {
    while (!feof($fh)) {
        if ($char != "\n" && $char != "\r") {
            fseek($fh, $sk, SEEK_SET);
            $posr++;
        } else {
            $ir++;
        }
    }
    // Merge result of line 5,6 and 7
    $result .= fgets($fh);
}
echo $result;
exit;
while (!feof($fh) && $i<($linenum)) {
    $char = fgetc($fh);
    if ($char != "\n" && $char != "\r") {
        fseek($fh, $pos, SEEK_SET);
        $pos++;
    }
    else {
        $i++;
    }
}
$line = trim(fgets($fh));
var_dump($line);
exit;
exit;
while (!feof($fh) && $i<($linenum-1)) {
    $char = fgetc($fh);
    if ($char != "\n" && $char != "\r") {
        //fseek($fh, $pos);
        fseek($fh, $pos);
        $pos++;
    }
    else {
        if ($pos == 3) {
            $line = fgets($fh);
        }
        $i++;
    }
}
//$line = fgets($fh);
var_dump($line); exit;
How can I merge these lines?
Note: I don't want to use SplFileInfo or any tricks like arrays. I just want to seek and then exit.
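One way to merge the target line with its neighbours, without building an offset array, is to read forward with fgets() and stop as soon as the last wanted line has been collected. The sketch below is only an illustration: readLineWindow() and its parameters are hypothetical names, not part of the original post. It reads sequentially rather than calling fseek(), because without a previously built index the byte offset of a given line is not known in advance.

```php
<?php
// Hypothetical helper (not from the original post): returns line $lineNum
// together with $range lines above and below it, as one merged string.
function readLineWindow(string $path, int $lineNum, int $range = 0): string
{
    if ($lineNum < 1 || $range < 0) {
        throw new InvalidArgumentException('lineNum must be >= 1 and range >= 0');
    }

    $fh = fopen($path, 'r');
    if ($fh === false) {
        throw new RuntimeException("Cannot open file: $path");
    }

    $first   = max(1, $lineNum - $range); // first wanted line (1-based)
    $last    = $lineNum + $range;         // last wanted line (1-based)
    $buffer  = '';
    $current = 0;

    // fgets() returns false at EOF, so no separate feof() check is needed.
    while (($line = fgets($fh)) !== false) {
        $current++;
        if ($current >= $first) {
            $buffer .= $line; // merge the wanted lines
        }
        if ($current >= $last) {
            break; // stop here instead of reading the rest of the file
        }
    }

    fclose($fh);
    return $buffer;
}
```

Under these assumptions, readLineWindow($source, 5, 1) would return lines 4 to 6 as a single string, and a line number past the end of the file yields an empty string.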
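I wrote a function that reads the file, counts the lines, and stores the byte position of each line in an array so it can be sought to later. If linenum is set, the while loop breaks once the maximum line it needs has been reached, to preserve performance; a second loop then seeks to the stored byte positions to read the file contents.
I believe this function could be improved.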
function readFileSeek($source, $linenum = 0, $range = 0)
{
    $fh = fopen($source, 'r');
    $meta = stream_get_meta_data($fh);
    if (!$meta['seekable']) {
        throw new Exception(sprintf("A source is not seekable: %s", print_r($source, true)));
    }
    // $pos runs two bytes ahead of the character just read, so the last value
    // stored for a line is the offset at which the following line starts.
    // This relies on single-byte "\n" line endings and on no line being empty.
    $pos = 2;
    $result = array();
    // $minline and $maxline are only defined when $linenum is non-zero.
    if ($linenum) {
        $minline = $linenum - $range - 1;
        $maxline = $minline+$range+$range;
    }
    $totalLines = 0;
    // First pass: record one byte offset per line.
    while (!feof($fh)) {
        $char = fgetc($fh);
        if ($char == "\n" || $char == "\r") {
            ++$totalLines;
        } else {
            $result[$totalLines] = $pos;
        }
        $pos++;
        if ($maxline+1 == $totalLines) {
            // break from while to not read entire file
            break;
        }
    }
    $buffer = '';
    // Second pass: seek to each stored offset and copy one line into the buffer.
    for ($nr=$minline; $nr<=$maxline; $nr++) {
        if (isset($result[$nr])) {
            fseek($fh, $result[$nr], SEEK_SET);
            while (!feof($fh)) {
                $char = fgetc($fh);
                if ($char == "\n" || $char == "\r") {
                    $buffer .= $char;
                    break;
                } else {
                    $buffer .= $char;
                }
            }
        }
    }
    fclose($fh);
    return $buffer;
}
Test result (1.3 GB file with 100,000,000 lines, seeking to line 300,000):
string(55) "299998_abc
299999_abc
300000_abc
300001_abc
300002_abc
"
Time: 612 ms, Memory: 20.00Mb
$ ll -h /tmp/testReadSourceLines_27151460344/41340913936
-rw-rw-r-- 1 1,3G /tmp/testReadSourceLines_27151460344/41340913936
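One possible direction for improving this, if the same file is queried more than once, is to keep a sparse index: record the byte offset of every Nth line in a single pass, then fseek() to the nearest checkpoint and read forward with fgets(). This is a different technique from the function above, offered only as a sketch. buildSparseIndex(), readLines(), and the $step parameter are hypothetical names introduced for illustration.

```php
<?php
// Hypothetical helper: one pass over the file, storing the byte offset at
// which lines 1, 1 + $step, 1 + 2 * $step, ... begin.
function buildSparseIndex(string $path, int $step = 1000): array
{
    $fh = fopen($path, 'r');
    if ($fh === false) {
        throw new RuntimeException("Cannot open file: $path");
    }
    $index  = [1 => 0]; // line 1 starts at byte 0
    $lineNo = 1;
    while (fgets($fh) !== false) {
        $lineNo++; // the stream is now positioned at the start of line $lineNo
        if (($lineNo - 1) % $step === 0) {
            $index[$lineNo] = ftell($fh);
        }
    }
    fclose($fh);
    return $index;
}

// Hypothetical helper: returns up to $count lines starting at $firstLine
// (1-based), seeking to the closest checkpoint at or before it.
function readLines(string $path, array $index, int $step, int $firstLine, int $count): string
{
    if ($firstLine < 1 || $count < 1) {
        throw new InvalidArgumentException('firstLine and count must be >= 1');
    }
    $checkpoint = intdiv($firstLine - 1, $step) * $step + 1;
    if (!isset($index[$checkpoint])) {
        return ''; // the requested line lies beyond the indexed file
    }
    $fh = fopen($path, 'r');
    if ($fh === false) {
        throw new RuntimeException("Cannot open file: $path");
    }
    fseek($fh, $index[$checkpoint], SEEK_SET);
    // Skip the lines between the checkpoint and the first wanted line.
    for ($n = $checkpoint; $n < $firstLine; $n++) {
        if (fgets($fh) === false) {
            fclose($fh);
            return '';
        }
    }
    $buffer = '';
    for ($i = 0; $i < $count && ($line = fgets($fh)) !== false; $i++) {
        $buffer .= $line;
    }
    fclose($fh);
    return $buffer;
}
```

With a step of 1000, the index for a 100,000,000-line file would hold about 100,000 offsets, and each lookup would skip at most 999 lines after the seek. The trade-off is that the index has to be rebuilt whenever the file changes.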