preg_split on regex line start
I am trying to format the following file:
[30-05-2013 15:45:54] A A
[26-06-2013 14:44:44] B A
[26-06-2013 14:44:44] C A
[26-06-2013 14:43:16] Some lines are so large, they take multiple lines, so explode('\n') won't work because
I need the complete message
[26-06-2013 14:44:44] E A
[26-06-2013 14:44:44] F A
[26-06-2013 14:44:44] G A
Expected output:
Array
(
[0] => [30-05-2013 15:45:54] A A
[1] => [26-06-2013 14:44:44] B A
[2] => [26-06-2013 14:44:44] C A
[3] => [26-06-2013 14:43:16] Some lines are so large, they take multiple lines, so
explode('\n') won't work because
I need the complete message
[4] => [26-06-2013 14:44:44] E A
...
)
Based on "How do I include the split delimiter in results for preg_split()?", I tried to use a positive lookbehind to keep the timestamps and came up with this regex (tested on Regex101):
(?<=\[)(.+)(?<=\])(.+)
It is used in the following PHP code:
#!/usr/bin/env php
<?php
class Chat {
    function __construct() {
        // Read chat file
        $this->f = file_get_contents(__DIR__ . '/testchat.txt');
        // Split on '[\d]'
        $r = "/(?<=\[)(.+)(?<=\])(.+)/";
        $l = preg_split($r, $this->f, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
        var_dump(count($l));
        var_dump($l);
    }
}
$c = new Chat();
This gives me the following output:
array(22) {
[0]=>
string(1) "["
[1]=>
string(20) "30-05-2013 15:45:54]"
[2]=>
string(4) " A A"
[3]=>
string(2) "
["
[4]=>
string(20) "26-06-2013 14:44:44]"
[5]=>
string(4) " B A"
[6]=>
string(2) "
["
[7]=>
string(20) "26-06-2013 14:44:44]"
[8]=>
string(4) " C A"
[9]=>
string(2) "
["
[10]=>
string(20) "26-06-2013 14:43:16]"
[11]=>
string(87) " Some lines are so large, they take multiple lines, so explode('\n') won't work because"
[12]=>
string(30) "
I need the complete message
["
Questions
- Why is the first [ ignored?
- How should I change the regex to get the desired output?
- Why are there still empty strings with PREG_SPLIT_NO_EMPTY?
With preg_split, you can use
'~\R+(?=\[\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2}])~'
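Details
\R+ - one or more line break sequences
(?=\[\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2}]) - a positive lookahead that requires, immediately to the right of the current location:
- \[ - a [ char
- \d{2}-\d{2}-\d{4} - a date-like pattern: 2 digits, hyphen, 2 digits, hyphen and 4 digits
- (space) - a space
- \d{2}:\d{2}:\d{2}] - a time-like pattern: 2 digits, :, 2 digits, :, 2 digits, followed by a ] char.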
PHP demo:
$text = "[30-05-2013 15:45:54] A A
[26-06-2013 14:44:44] B A
[26-06-2013 14:44:44] C A
[26-06-2013 14:43:16] Some lines are so large, they take multiple lines, so explode('\n') won't work because
I need the complete message
[26-06-2013 14:44:44] E A
[26-06-2013 14:44:44] F A
[26-06-2013 14:44:44] G A";
print_r(preg_split('~\R+(?=\[\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2}])~', $text));
Output:
Array
(
[0] => [30-05-2013 15:45:54] A A
[1] => [26-06-2013 14:44:44] B A
[2] => [26-06-2013 14:44:44] C A
[3] => [26-06-2013 14:43:16] Some lines are so large, they take multiple lines, so explode('
') won't work because
I need the complete message
[4] => [26-06-2013 14:44:44] E A
[5] => [26-06-2013 14:44:44] F A
[6] => [26-06-2013 14:44:44] G A
)
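As a minimal sketch of how the split above could be wrapped for reuse, the snippet below puts the same pattern into a helper. The function name splitChatEntries and the shortened sample log are assumptions made for illustration; they are not part of the original answer.

```php
<?php
// Hypothetical helper: returns one string per timestamped entry.
function splitChatEntries(string $log): array
{
    // Split on line breaks only when a "[dd-mm-yyyy hh:mm:ss]" stamp follows.
    $pattern = '~\R+(?=\[\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2}])~';
    // trim() drops a trailing newline so the last entry stays clean.
    $parts = preg_split($pattern, trim($log));
    return $parts === false ? [] : $parts;
}

// Shortened sample data (assumed), with one multi-line message.
$log = "[30-05-2013 15:45:54] A A\n"
     . "[26-06-2013 14:43:16] first line\n"
     . "second line\n"
     . "[26-06-2013 14:44:44] E A\n";

$entries = splitChatEntries($log);
print_r($entries);
```

Because the line break before "second line" is not followed by a timestamp, it is not treated as a delimiter and the message stays in a single array element.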
In case you need more details than just a split, you can use a matching approach:
'~^\[(\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2})]\s*+(.*?)(?=\s*^\[(?1)]|\z)~ms'
See the regex demo. It is used as
preg_match_all('~^\[(\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2})]\s*+(.*?)(?=\s*^\[(?1)]|\z)~ms', $text, $matches)
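It will match
^ - start of a line
\[(\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2})] - the date-time details (captured into Group 1)
\s*+ - 0+ whitespaces (possessive)
(.*?) - any 0+ chars, as few as possible, up to the first position where the lookahead that follows succeeds
(?=\s*^\[(?1)]|\z) - a positive lookahead that matches a location immediately followed by:
- \s* - 0+ whitespaces
- ^ - start of a line
- \[(?1)] - [, the Group 1 pattern, ]
- | - or
- \z - the very end of the string.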
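To illustrate how the captured groups might be consumed, here is a small sketch that runs the same pattern with PREG_SET_ORDER and collects each entry into an array. The sample text and the 'time' and 'message' keys are assumptions chosen for this example.

```php
<?php
// Shortened sample data (assumed), with one multi-line message.
$text = "[30-05-2013 15:45:54] A A\n"
      . "[26-06-2013 14:43:16] first line\n"
      . "second line\n"
      . "[26-06-2013 14:44:44] E A";

$pattern = '~^\[(\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2})]\s*+(.*?)(?=\s*^\[(?1)]|\z)~ms';

$entries = [];
// PREG_SET_ORDER groups the result per match: $m[1] is the timestamp, $m[2] the message.
if (preg_match_all($pattern, $text, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $m) {
        $entries[] = ['time' => $m[1], 'message' => $m[2]];
    }
}
print_r($entries);
```

With this approach the timestamp and the message arrive as separate values, so no second parsing step is needed after the split.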
Late answer, but you can also use:
$text = file_get_contents("testchat.txt");
preg_match_all('/(\[.*?\])([^\[]+)/im', $text, $matches, PREG_PATTERN_ORDER);
for ($i = 0; $i < count($matches[0]); $i++) {
    $date = $matches[1][$i];
    $line = $matches[2][$i];
    print("$date $line");
}
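A possible variation on the loop above, sketched with assumed sample data: it collects the pairs into an array and trims the message, since the second group also captures the surrounding whitespace and line breaks. Note that [^\[]+ stops at the next [ character, so this sketch assumes message bodies never contain a [.

```php
<?php
// Shortened sample data (assumed), with one multi-line message.
$text = "[30-05-2013 15:45:54] A A\n"
      . "[26-06-2013 14:43:16] first line\n"
      . "second line\n"
      . "[26-06-2013 14:44:44] E A\n";

preg_match_all('/(\[.*?\])([^\[]+)/im', $text, $matches, PREG_SET_ORDER);

$entries = [];
foreach ($matches as $m) {
    // $m[1] is the bracketed timestamp, $m[2] the text up to the next "[".
    $entries[] = [$m[1], trim($m[2])];
}
print_r($entries);
```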