PHP :: Parse strings, while iterating through an array of substrings?

I'm a Java developer struggling to write my first PHP script. FYI, I'm coding with PHP 8.1.2 on an Ubuntu machine.

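My code must open a log file, read it line by line, and extract a key substring from each line based on the label that precedes it. For example, if the log file is: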

April 01 2020 Key Information Read :: Interesting Character #1:  Kermit the Frog
April 01 2020 Key Information Read :: Interesting Character #2:  Miss Piggy
April 01 2020 Key Information Read :: Their Best Movie:  The Muppet Movie (1979)
...many more lines...

Then I need a script that reads each line and extracts:

Kermit the Frog
Miss Piggy
The Muppet Movie (1979)
...many more items...

I won't know those values until I read the file.

This problem is easy enough to solve. Here is my PHP code, where $str is one line of the input file:

    function parseThisStr($str){
            if( str_contains($str, "Interesting Character #1:  ") ){
                    $mySubstr = "Interesting Character #1:  ";
                    $tmpIndex = strpos( $str, $mySubstr );
                    $tmpIndex += strlen($mySubstr);
                    $str2 = substr( $str, $tmpIndex );
                    $str2 = preg_replace('~[\r\n]+~', '', $str2);   // remove newline
                    return $str2;
            }
            else if( str_contains($str, "Interesting Character #2:  ") ){
                    $mySubstr = "Interesting Character #2:  ";
                    // ...copy code from above...
                    return $str2;
            }
            else if( str_contains($str, "Their Best Movie:  ") ){
                    $mySubstr = "Their Best Movie:  ";
                    // ...copy code from above...
                    return $str2;
            }
            return $str;
    }

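This works... but it's needlessly repetitive, right? For every substring I check, I have to copy the same five lines of code. I need to search for about 30 substrings, so my code will end up about 150 lines longer than it needs to be.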

There must be a smarter way to do this, right? Couldn't I store every substring I want to search for in an array, perhaps like this:

$array = array(
    1    => "Interesting Character #1:  ",
    2    => "Interesting Character #2:  ",
    3    => "Their Best Movie:  ",
    ...etc...
);

...and then iterate through the array, perhaps like this:

    function parseThisStr($str){
            $array = array(
                  1    => "Kermit the Frog",
                  ...etc...
            };
            foreach( $array as &$value ){
                if( str_contains($str, $value) ){
                        $tmpIndex = strpos( $str, $value );
                        $tmpIndex += strlen($value);
                        $str2 = substr( $str, $tmpIndex );
                        $str2 = preg_replace('~[\r\n]+~', '', $str2);   // remove newline
                        return $str2;
                }
            return null;
            }

Conceptually, this should work... but I can't figure out the correct syntax. Sadly, PHP syntax confuses me. Does anyone see where I'm going wrong? Thank you.

Edit: I messed up the values of $array in my first post. $array should hold the substrings I'll use to search the larger string.
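
For reference, here is a minimal sketch of the loop described above with the syntax corrected. It assumes the array holds the search prefixes, as the edit says. The name `$prefixes` is illustrative, only three of the roughly 30 prefixes are listed, and `rtrim` is swapped in for the `preg_replace` call, so it strips line endings only at the end of the string.

```php
<?php
// Sketch: return the text that follows the first matching prefix, or null.
function parseThisStr(string $str): ?string {
    // Prefixes to look for (illustrative; extend with the remaining ones).
    $prefixes = [
        "Interesting Character #1:  ",
        "Interesting Character #2:  ",
        "Their Best Movie:  ",
    ];
    foreach ($prefixes as $prefix) {
        $pos = strpos($str, $prefix);
        if ($pos !== false) {
            // Skip past the prefix and strip any trailing line-ending characters.
            return rtrim(substr($str, $pos + strlen($prefix)), "\r\n");
        }
    }
    return null; // no prefix found on this line
}

echo parseThisStr("April 01 2020 Key Information Read :: Their Best Movie:  The Muppet Movie (1979)\n");
```

Three things differ from the attempt in the question: the array literal closes with `]` (or `)` for `array(`) rather than `}`, the `foreach` body is closed before `return null`, and the loop variable is taken by value because nothing is modified through the `&` reference.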

Using a regular expression would produce cleaner code. For example, with preg_match:

$line = 'April 01 2020 Key Information Read :: Interesting Character #1:  Kermit the Frog';
$searchTerms = ["Kermit the Frog","Miss Piggy","The Muppet Movie (1979)"];

// prepare regex with named group from terms
$delimiter = '~';
$regex = $delimiter . '(?<phrase>(' . join('|', array_map(fn($term) => preg_quote($term, $delimiter), $searchTerms)) . '))' . $delimiter;

// search by regex
preg_match($regex, $line, $matches);
$foundPhrase = $matches['phrase'] ?? null;
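
Given the edit to the question (the array should hold the prefixes, not the values), the same idea might be adapted so that the regex is built from the prefixes and a named group captures whatever follows. This is a sketch under that assumption; `$prefixes` and the group name `value` are illustrative and not part of the original answer.

```php
<?php
$line = 'April 01 2020 Key Information Read :: Interesting Character #1:  Kermit the Frog';
$prefixes = ["Interesting Character #1:", "Interesting Character #2:", "Their Best Movie:"];

// Escape each prefix so characters such as "#" and ":" are matched literally.
$delimiter = '~';
$alternatives = join('|', array_map(fn($p) => preg_quote($p, $delimiter), $prefixes));

// Match any prefix, skip the horizontal whitespace after it, capture the rest of the line.
$regex = $delimiter . '(?:' . $alternatives . ')\h+(?<value>.+)' . $delimiter;

// rtrim guards against a trailing "\r" when the line comes from a Windows-style file.
$value = preg_match($regex, $line, $matches) === 1 ? rtrim($matches['value']) : null;
var_dump($value);
```

With roughly 30 prefixes this keeps the list in one place; the pattern only needs to be built once, outside the loop that reads the file.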

You can use a regular expression with a more specific pattern:

\b(?:Interesting Character #\d+:|Their Best Movie:)\h+\K.+

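The pattern matches:

  • \b A word boundary to prevent a partial word match
  • (?:Interesting Character #\d+:|Their Best Movie:) A non-capturing group that matches either of the two labels
  • \h+ Match 1 or more horizontal whitespace characters
  • \K Forget what has been matched so far
  • .+ Match 1 or more characters (the rest of the line)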

See a regex demo and a PHP demo.

$re = '/\b(?:Interesting Character #\d+:|Their Best Movie:)\h+\K.+/';
$str = 'April 01 2020 Key Information Read :: Interesting Character #1:  Kermit the Frog
April 01 2020 Key Information Read :: Interesting Character #2:  Miss Piggy
April 01 2020 Key Information Read :: Their Best Movie:  The Muppet Movie (1979)
';

preg_match_all($re, $str, $matches);
print_r($matches[0]);

Output

Array
(
    [0] => Kermit the Frog
    [1] => Miss Piggy
    [2] => The Muppet Movie (1979)
)
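
Since the question reads the log line by line, the same pattern can also be applied per line with preg_match. Below is a minimal sketch of that; the function name `extractValues` and the temporary file standing in for the real log are assumptions for illustration.

```php
<?php
// Sketch: collect the value after a known label from each line of a log file.
function extractValues(string $path): array {
    $re = '/\b(?:Interesting Character #\d+:|Their Best Movie:)\h+\K.+/';
    $values = [];
    $handle = fopen($path, 'r');
    if ($handle === false) {
        return $values; // file could not be opened
    }
    while (($line = fgets($handle)) !== false) {
        // Drop the line ending first so a "\r" is not captured by ".+".
        if (preg_match($re, rtrim($line, "\r\n"), $m) === 1) {
            $values[] = $m[0]; // thanks to \K, the full match is just the value
        }
    }
    fclose($handle);
    return $values;
}

// Usage with a temporary file standing in for the real log.
$tmp = tempnam(sys_get_temp_dir(), 'log');
file_put_contents($tmp,
    "April 01 2020 Key Information Read :: Interesting Character #1:  Kermit the Frog\n"
    . "April 01 2020 Some other line\n"
    . "April 01 2020 Key Information Read :: Their Best Movie:  The Muppet Movie (1979)\n");
print_r(extractValues($tmp));
unlink($tmp);
```

Reading with fgets keeps memory use flat for large logs, whereas preg_match_all needs the whole file in one string.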

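Another pattern, with a broader match, that takes the leading :: into account and matches the label up to the first occurrence of a colon: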

::\h+[^:\r\n]+:\h+\K.+

See another regex demo.
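
A short sketch of this broader pattern applied to sample lines, so that no label has to be listed in advance. It assumes every line of interest has the `:: label: value` shape shown in the question; the variable names are illustrative.

```php
<?php
$re = '/::\h+[^:\r\n]+:\h+\K.+/';
$lines = [
    'April 01 2020 Key Information Read :: Interesting Character #1:  Kermit the Frog',
    'April 01 2020 Key Information Read :: Their Best Movie:  The Muppet Movie (1979)',
    'April 01 2020 A line without the expected shape',
];

$values = [];
foreach ($lines as $line) {
    // Lines without ":: label: " simply do not match and are skipped.
    if (preg_match($re, $line, $m) === 1) {
        $values[] = $m[0];
    }
}
print_r($values);
```

The trade-off is that this version also picks up labels you did not intend to extract, so the more specific pattern is the safer choice when only certain labels matter.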