PHP :: Parse strings, while iterating through an array of substrings?
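I'm a Java developer struggling to write my first PHP script. FYI, I'm coding in PHP 8.1.2 on an Ubuntu machine.
My code has to open a log file, read it line by line, and extract key substrings based on each line's preamble. For example, if the log file is: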
April 01 2020 Key Information Read :: Interesting Character #1: Kermit the Frog
April 01 2020 Key Information Read :: Interesting Character #2: Miss Piggy
April 01 2020 Key Information Read :: Their Best Movie: The Muppet Movie (1979)
...many more lines...
Then I need a script that reads each line and extracts:
Kermit the Frog
Miss Piggy
The Muppet Movie (1979)
...many more items...
I won't know those values until I read the file.
The problem itself is easy to solve. Here is my PHP code, where $str is one line of the input file:
function parseThisStr($str){
if( str_contains($str, "Interesting Character #1: ") ){
$mySubstr = "Interesting Character #1: ";
$tmpIndex = strpos( $str, $mySubstr );
$tmpIndex += strlen($mySubstr);
$str2 = substr( $str, $tmpIndex );
$str2 = preg_replace('~[\r\n]+~', '', $str2); // remove newline
return $str2;
}
else if( str_contains($str, "Interesting Character #2: ") ){
$mySubstr = "Interesting Character #2: ";
// ...copy code from above...
return $str2;
}
else if( str_contains($str, "Their Best Movie: ") ){
$mySubstr = "Their Best Movie: ";
// ...copy code from above...
return $str2;
}
return $str;
}
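This works... but it's needlessly repetitive, right? For every substring I check, I have to copy the same five lines of code. I need to search for about 30 substrings, which would make my code roughly 150 lines longer than it needs to be.
Surely there's a smarter way to do this, right? Couldn't I store each substring to search for in an array, maybe something like this: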
$array = array(
1 => "Interesting Character #1: ",
2 => "Interesting Character #2: ",
3 => "Their Best Movie: ",
// ...etc...
);
...and then loop over the array, maybe something like this:
function parseThisStr($str){
$array = array(
1 => "Kermit the Frog",
...etc...
};
foreach( $array as &$value ){
if( str_contains($str, $value) ){
$tmpIndex = strpos( $str, $value );
$tmpIndex += strlen($value);
$str2 = substr( $str, $tmpIndex );
$str2 = preg_replace('~[\r\n]+~', '', $str2); // remove newline
return $str2;
}
return null;
}
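Conceptually this should work... but I can't figure out the right syntax. Sadly, PHP syntax confuses me. Does anyone see where I'm going wrong? Thanks.
EDIT: I messed up the values of $array in my original post. $array should contain the substrings I will use to search the larger string.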
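For comparison, here is a minimal sketch of the loop described in the question with the syntax problems fixed. It assumes, per the EDIT, that the array holds the prefixes rather than the values, and it lists only the three prefixes from the sample log. Apart from closing the array literal with ")" instead of "}", the main fix is brace placement: in the attempt above, "return null;" sits inside the foreach body, so the function gives up after the first prefix that doesn't match. The sketch also calls strpos() once instead of str_contains() plus strpos(), and uses rtrim() in place of the preg_replace() newline cleanup.

```php
<?php
// Minimal sketch of the question's loop with the syntax fixed.
// $prefixes holds the text that precedes each value (per the EDIT);
// only the three prefixes from the sample log are listed here.
function parseThisStr(string $str): ?string
{
    $prefixes = [
        "Interesting Character #1: ",
        "Interesting Character #2: ",
        "Their Best Movie: ",
    ];

    // Plain by-value iteration: the loop only reads each prefix, so no '&'.
    foreach ($prefixes as $prefix) {
        $pos = strpos($str, $prefix);
        // strpos() returns false (not -1) when the prefix is absent.
        if ($pos !== false) {
            $value = substr($str, $pos + strlen($prefix));
            // Strip the trailing line break (rtrim instead of preg_replace).
            return rtrim($value, "\r\n");
        }
    }

    // Only reached after every prefix has been tried.
    return null;
}
```

Adding another of the ~30 prefixes is then a one-line change to $prefixes.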
Using a regular expression makes the code cleaner, for example with preg_match:
$line = 'April 01 2020 Key Information Read :: Interesting Character #1: Kermit the Frog';
$searchTerms = ["Kermit the Frog","Miss Piggy","The Muppet Movie (1979)"];
// prepare regex with named group from terms
$delimiter = '~';
$regex = $delimiter . '(?<phrase>(' . join('|', array_map(fn($term) => preg_quote($term, $delimiter), $searchTerms)) . '))' . $delimiter;
// search by regex
preg_match($regex, $line, $matches);
$foundPhrase = $matches['phrase'] ?? null;
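The snippet above searches for the values from the original version of the question. The same preg_quote() plus alternation technique can be pointed at the prefixes from the EDIT instead, capturing whatever follows the matched prefix. This is a sketch: the prefix list is just the sample's three, extractValue() is a hypothetical helper name, and [^\r\n]+ is used instead of .+ so that a Windows-style "\r" is not captured.

```php
<?php
// Sketch: build one regex from a list of literal prefixes (example list)
// and capture the rest of the line after whichever prefix matches.
$prefixes = ["Interesting Character #1: ", "Interesting Character #2: ", "Their Best Movie: "];
$delimiter = '~';

// Escape each prefix so characters such as '#' are taken literally.
$alternation = implode('|', array_map(fn($p) => preg_quote($p, $delimiter), $prefixes));

// Non-capturing group for the prefix, named group for everything after it
// up to (but not including) the line break.
$regex = $delimiter . '(?:' . $alternation . ')(?<value>[^\r\n]+)' . $delimiter;

// Hypothetical helper: returns the value after the first matching prefix, or null.
function extractValue(string $line, string $regex): ?string
{
    return preg_match($regex, $line, $matches) === 1 ? $matches['value'] : null;
}

echo extractValue('April 01 2020 Key Information Read :: Interesting Character #1: Kermit the Frog', $regex), PHP_EOL;
```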
You can use a regex with a more specific pattern:
\b(?:Interesting Character #\d+:|Their Best Movie:)\h+\K.+
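The pattern matches:
\b
A word boundary, to prevent a partial word match
(?:Interesting Character #\d+:|Their Best Movie:)
A non-capturing group matching either "Interesting Character #" followed by 1+ digits and a colon, or "Their Best Movie:"
\h+
Match 1+ horizontal whitespace characters
\K
Forget what has been matched so far
.+
Match 1 or more characters
See a regex demo and a PHP demo.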
$re = '/\b(?:Interesting Character #\d+:|Their Best Movie:)\h+\K.+/';
$str = 'April 01 2020 Key Information Read :: Interesting Character #1: Kermit the Frog
April 01 2020 Key Information Read :: Interesting Character #2: Miss Piggy
April 01 2020 Key Information Read :: Their Best Movie: The Muppet Movie (1979)
';
preg_match_all($re, $str, $matches);
print_r($matches[0]);
Output
Array
(
[0] => Kermit the Frog
[1] => Miss Piggy
[2] => The Muppet Movie (1979)
)
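The demo above runs the pattern over one multi-line string. Since the question reads a log file line by line, here is a minimal sketch that applies the same pattern to each line with fgets(). The function name extractFromLog() is hypothetical, and the sketch swaps .+ for [^\r\n]+ so the line break (including a "\r" from Windows-style files) is not part of the result. Because of \K, $m[0] holds only the extracted value.

```php
<?php
// Sketch: apply the answer's pattern to every line of a log file.
// extractFromLog() is a hypothetical name; pass in the real log path.
function extractFromLog(string $path): array
{
    // Same pattern as above, but [^\r\n]+ instead of .+ so no "\r" is captured.
    $re = '/\b(?:Interesting Character #\d+:|Their Best Movie:)\h+\K[^\r\n]+/';

    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    $results = [];
    // fgets() returns one line at a time (newline included) until EOF, then false.
    while (($line = fgets($handle)) !== false) {
        if (preg_match($re, $line, $m) === 1) {
            $results[] = $m[0]; // \K makes $m[0] just the value
        }
    }
    fclose($handle);

    return $results;
}
```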
Another pattern with a broader match, which takes the leading :: into account and matches up to the first occurrence of : after it:
::\h+[^:\r\n]+:\h+\K.+
See another regex demo.
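Because this broader pattern does not hard-code the labels, a small variant can also report which preamble each value came from: capture the label and the value as two groups instead of using \K, then pair them with array_combine(). This is a sketch under the assumption that each label appears at most once in the input, since array_combine() keeps only the last value for a repeated label.

```php
<?php
// Sketch: capture label and value as two groups instead of using \K.
// Group 1: text between ":: " and the next ':'; group 2: rest of the line.
$re = '/::\h+([^:\r\n]+):\h+([^\r\n]+)/';

$log = "April 01 2020 Key Information Read :: Interesting Character #1: Kermit the Frog\n"
     . "April 01 2020 Key Information Read :: Interesting Character #2: Miss Piggy\n"
     . "April 01 2020 Key Information Read :: Their Best Movie: The Muppet Movie (1979)\n";

// Default PREG_PATTERN_ORDER: $matches[1] holds all labels, $matches[2] all values.
preg_match_all($re, $log, $matches);

// Map each label to its value (a repeated label would overwrite the earlier one).
$byLabel = array_combine($matches[1], $matches[2]);
print_r($byLabel);
```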