PHP

Question

I have the following content:

"aa_bb" : "foo"
"pp_Qq" : "bar"
"Xx_yY_zz" : "foobar"

and I want to convert the left-hand part to camelCase:

"aaBb" : "foo"
"ppQq" : "bar"
"xxYyZz" : "foobar"

Code:

// selects the left part
$newString = preg_replace_callback("/\"(.*?)\"(.*?):/", function($matches) {        
    // selects the characters following underscores
    $matches[1] = preg_replace_callback("/_(.?)/", function($matches) {
        //removes the underscore and uppercases the character
        return strtoupper($matches[1]);
    }, $matches[1]);

    // lowercases the first character before returning
    return "\"".lcfirst($matches[1])."\" : ".$matches[2];
}, $string);

Can this code be simplified?

Note: the content is always a single string.

Answer 1

You can use preg_replace_callback with the \G anchor and capture groups.

(?:"\K([^_\r\n]+)|\G(?!^))(?=[^":\r\n]*")(?=[^:\r\n]*:)_?([a-zA-Z])([^"_\r\n]*)

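Explanation

(?: Non-capturing group
- "\K([^_\r\n]+) Match ", let \K drop it from the reported match, then capture in group 1 one or more characters other than _ or a newline
- | Or
- \G(?!^) Assert the position at the end of the previous match, not at the start of the string
) Close the group
(?=[^":\r\n]*") Positive lookahead, assert that a " follows without crossing a : or a newline
(?=[^:\r\n]*:) Positive lookahead, assert that a : follows later on the same line
_? Match an optional _
([a-zA-Z]) Capture group 2, match a single character in a-zA-Z
([^"_\r\n]*) Capture group 3, match zero or more characters other than ", _ or a newline

In the replacement, concatenate the 3 capture groups, applying strtolower to groups 1 and 3 and strtoupper to group 2.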
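To see how the \G branch walks through one key, here is a minimal sketch that lists what each match captures. It reuses the pattern above unchanged; the sample line is one of the question's inputs, and the variable names are only illustrative.

```php
<?php
// Pattern copied from the answer above; $line is one input line from the question.
$re = '/(?:"\K([^_\r\n]+)|\G(?!^))(?=[^":\r\n]*")(?=[^:\r\n]*:)_?([a-zA-Z])([^"_\r\n]*)/';
$line = '"Xx_yY_zz" : "foobar"';

// PREG_SET_ORDER groups the result per match instead of per capture group.
preg_match_all($re, $line, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    printf("match=%s g1=%s g2=%s g3=%s\n", $m[0], $m[1], $m[2], $m[3]);
}
// match=Xx_yY g1=Xx g2=y g3=Y
// match=_zz g1= g2=z g3=z
```

The first match enters through the "\K branch, so group 1 holds the leading segment. The second match enters through \G, so group 1 is empty and only groups 2 and 3 contribute. The value "foobar" never matches because no colon follows it on the line.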

Regex demo

For example

$re = '/(?:"\K([^_\r\n]+)|\G(?!^))(?=[^":\r\n]*")(?=[^:\r\n]*:)_?([a-zA-Z])([^"_\r\n]*)/';
$str = '"aa_bb" : "foo"

"pp_Qq" : "bar"

"Xx_yY_zz" : "foobar"
"Xx_yYyyyyyYyY_zz_a" : "foobar"';

$result =  preg_replace_callback($re, function($matches) {
    return strtolower($matches[1]) . strtoupper($matches[2]) . strtolower($matches[3]);
}, $str);

echo $result;

Output

"aaBb" : "foo"

"ppQq" : "bar"

"xxYyZz" : "foobar"
"xxYyyyyyyyyyZzA" : "foobar"

Php demo

Answer 2

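First, since you already have working code that you want to improve, consider posting your question on Code Review rather than Stack Overflow next time.

Let's start by improving your original approach: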

$result = preg_replace_callback('~"[^"]*"\s*:~', function ($m) {
    return preg_replace_callback('~_+(.?)~', function ($n) {
        return strtoupper($n[1]);
    }, strtolower($m[0]));
}, $str);

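Pro: the pattern is relatively simple and the idea is easy to follow.
Con: nested preg_replace_callback calls can hurt the eyes.

After this eye warm-up exercise, we can try an approach based on a \G pattern: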

$pattern = '~(?|\G(?!^)_([^_"]*)|("(?=[^"]*"\s*:)[^_"]*))~';
$result = preg_replace_callback($pattern, function ($m) {
    return ucfirst(strtolower($m[1]));
}, $str);

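Pro: the code is shorter and does not need two preg_replace_callback calls.
Con: the pattern is much more complex.

Note: when you write a long pattern, nothing prevents you from using free-spacing mode with the x modifier and adding comments: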

$pattern = '~
(?| # branch reset group: in which capture groups have the same number
    \G # contiguous to the last successful match
    (?!^) # but not at the start of the string
    _
    ( [^_"]* ) # capture group 1
  |
    ( # capture group 1
        "
        (?=[^"]*"\s*:) # lookahead to check if it is the "key part"
        [^_"]*
    )
)
~x';
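As a quick sanity check, the sketch below runs the compact pattern and the commented free-spacing pattern side by side on the question's sample input. The variable names $compact, $commented and $camel are illustrative and not part of the answer.

```php
<?php
// Sample input taken from the question.
$str = implode("\n", [
    '"aa_bb" : "foo"',
    '"pp_Qq" : "bar"',
    '"Xx_yY_zz" : "foobar"',
]);

// Compact form of the \G-based pattern.
$compact = '~(?|\G(?!^)_([^_"]*)|("(?=[^"]*"\s*:)[^_"]*))~';

// Same pattern in free-spacing mode (x modifier): whitespace is ignored
// and everything after # up to the end of the line is a comment.
$commented = '~
(?|                    # branch reset: both alternatives fill group 1
    \G (?!^) _         # continue right after the previous match
    ( [^_"]* )
  |
    ( " (?=[^"]*"\s*:) # opening quote of a key
      [^_"]* )
)
~x';

// Callback shared by both runs: lowercase the segment, then uppercase its first character.
$camel = function ($m) {
    return ucfirst(strtolower($m[1]));
};

$a = preg_replace_callback($compact, $camel, $str);
$b = preg_replace_callback($commented, $camel, $str);

echo $b, "\n";
var_dump($a === $b); // both forms should produce the same string
```

The opening quote stays inside the first segment, which is why ucfirst leaves that segment lowercase: the first character it sees is the quote, not a letter.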

Is there a compromise between these two extremes, and what would a good one look like? Two suggestions:

$result = preg_replace_callback('~"[^"]+"\s*:~', function ($m) {
    return array_reduce(explode('_', strtolower($m[0])), function ($c, $i) {
        return $c . ucfirst($i);
    });
}, $str);

Pro: minimal use of regular expressions.
Con: two callback functions are still needed, except that this time the second one is called by array_reduce instead of preg_replace_callback.
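If the nested callbacks in this first suggestion are the main concern, one variation is to move the string work into a small named helper. This is only a sketch: snakeToCamel is a hypothetical function name, and the key-locating pattern is a simplified assumption (a quoted string followed by optional whitespace and a colon), not something taken from the answer.

```php
<?php
// Hypothetical helper (not from the answer): camelCases one key without regex.
function snakeToCamel(string $key): string
{
    // Lowercase everything, split on underscores, capitalize each segment,
    // then lowercase the very first character again.
    $parts = explode('_', strtolower($key));
    return lcfirst(implode('', array_map('ucfirst', $parts)));
}

// Sample input taken from the question.
$str = implode("\n", [
    '"aa_bb" : "foo"',
    '"pp_Qq" : "bar"',
    '"Xx_yY_zz" : "foobar"',
]);

// Assumed key pattern: the lookahead keeps the whitespace and colon out of the match.
$result = preg_replace_callback('~"([^"]+)"(?=\s*:)~', function ($m) {
    return '"' . snakeToCamel($m[1]) . '"';
}, $str);

echo $result, "\n";
```

The regex is then only responsible for finding keys, and the helper can be reused or tested on its own.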

$result = preg_replace_callback('~["_][^"_]*(?=[^"]*"\s*:)~', function ($m) {
    return ucfirst(strtolower(ltrim($m[0], '_')));
}, $str);

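Pro: the pattern is relatively simple and the callback function stays simple too. This looks like a good compromise.
Con: the pattern is not very strict (but it should be enough for your use case).

Pattern description: the pattern looks for a _ or a " and matches the following characters that are neither _ nor ". A lookahead assertion then checks that these characters are inside the key part by looking for the closing quote and the colon. A match always looks like _aBc or "aBc (in the callback, the underscore is trimmed from the left with ltrim, and the " is left unchanged by ucfirst).

Pattern details: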

["_] # one " or _
[^"_]* # zero or more characters that aren't " or _
(?= # open a lookahead assertion (followed with)
    [^"]* # all that isn't a "
    " # a literal "
    \s* # eventual whitespaces
    : # a literal :
) # close the lookahead assertion
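To make the description concrete, the following sketch lists the pieces this pattern matches on a single line from the question and what the callback turns each piece into. The variable names are illustrative only.

```php
<?php
// Pattern from the answer; $line is one input line from the question.
$pattern = '~["_][^"_]*(?=[^"]*"\s*:)~';
$line = '"Xx_yY_zz" : "foobar"';

preg_match_all($pattern, $line, $matches);

foreach ($matches[0] as $piece) {
    // Same transformation as the callback in the answer.
    $converted = ucfirst(strtolower(ltrim($piece, '_')));
    echo $piece, ' => ', $converted, "\n";
}
// "Xx => "xx
// _yY => Yy
// _zz => Zz
```

Nothing in "foobar" is matched, because the lookahead cannot find a closing quote followed by a colon after it.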

There is no single good answer; whether something looks simple or complex really depends on the reader.

PHP - preg_replace_callback for camelCasing

regex

pcre

camelcasing

preg-replace-callback