通过数组的值交换变量的值,但在条件下
exchanging values of a variable, by values of an array, but under condition
我有一个代码将输出与数组的值进行比较,并且只终止与数组中的单词的操作:
第一个代码(只是一个例子)
$myVar = 'essa pizza é muito gostosa, que prato de bom sabor';
$myWords=array(
array('sabor','gosto','delicia'),
array('saborosa','gostosa','deliciosa'),
);
foreach($myWords as $words){
shuffle($words); // randomize the subarray
// pipe-together the words and return just one match
if(preg_match('/\K\b(?:'.implode('|',$words).')\b/',$myVar,$out)){
// generate "replace_pair" from matched word and a random remaining subarray word
// replace and preserve the new sentence
$myVar=strtr($myVar,[$out[0]=>current(array_diff($words,$out))]);
}
}
echo $myVar;
我的问题:
我有第二个代码,它不适用于 rand/shuffle(我不想要随机数,我想要精确的替换,我总是从第 0 列更改到第 1 列),总是交换值:
// wrong output: $myVar = "minha irmã alanné é not aquela blnode, elere é a bom plperito";
$myVar = "my sister alannis is not that blonde, here is a good place";
$myWords=array(array("is","é"),
array("on","no"),
array("that","aquela"),
//array("blonde","loira"),
//array("not","não"),
array("sister","irmã"),
array("my","minha"),
//array("nothing","nada"),
array("myth","mito"),
array("he","ele"),
array("good","bom"),
array("ace","perito"),
// array("here","aqui"), //if [here] it does not exist, it is not to do replacement from the line he=ele = "elere" non-existent word
);
$replacements = array_combine(array_column($myWords,0),array_column($myWords,1));
$myVar = strtr($myVar,$replacements);
echo $myVar;
// expected output: minha irmã alannis é not aquela blonde, here é a bom place
// avoid replace words slice!
预期 输出:minha irmã alannis é not aquela blond, here é a bom place
// avoid replace words slice! always check if the word exists in the array before making the substitution.
alanné, blnode, elere, plperito
它检查输出是否是数组 myWords 中存在的真实单词,这避免了键入错误,例如:
那4个字不是一个存在的字,写错了。你是怎么做到第二个代码的?
简而言之,交换必须通过一个完整的词/键,一个现有的词来进行。而不是使用关键字片段创建奇怪的东西!
不幸的是 strtr()
是这项工作的错误工具,因为它是 "word boundary ignorant"。要定位整个单词,没有比使用带单词边界的正则表达式模式更简单的方法了。
此外,为了确保较长的字符串先于较短的字符串(可能存在于其他字符串中的字符串)进行匹配,您必须 $myWords
按字符串长度排序(降序/最长到最短;使用多-仅在必要时使用字节版本)。
单词数组排序并转换为单独的正则表达式模式后,您可以将数组输入 preg_replace()
的 pattern
和 replace
参数。
代码 (Demo)
$myVar = "my sister alannis is not that blonde, here is a good place";
$myWords=array(
array("is","é"),
array("on","no"),
array("that","aquela"),
array("sister","irmã"),
array("my","minha"),
array("myth","mito"),
array("he","ele"),
array("good","bom"),
array("ace","perito")
);
usort($myWords,function($a,$b){return mb_strlen($b[0])<=>mb_strlen($a[0]);}); // sort subarrays by first column multibyte length
// remove mb_ if first column holds no multi-byte characters. strlen() is much faster.
foreach($myWords as &$words){
$words[0]='/\b'.$words[0].'\b/i'; // generate patterns using search word, word boundaries, and case-insensitivity
}
//var_export($myWords);
//var_export(array_column($myWords,0));
//var_export(array_column($myWords,1));
$myVar=preg_replace(array_column($myWords,0),array_column($myWords,1),$myVar);
echo $myVar;
输出:
minha irmã alannis é not aquela blonde, here é a bom place
这不会识别匹配子字符串的大小写。我的意思是,my
和 My
都将替换为 minha
。
为了适应不同的大小写,您需要使用preg_replace_callback()
。
这是考虑因素(处理首字母大写的单词,而不是全部大写的单词):
代码(Demo) <-- 运行这个可以看到更换后保留的原壳。
foreach($myWords as $words){
$myVar=preg_replace_callback(
$words[0],
function($m)use($words){
return ctype_upper(mb_substr($m[0],0,1))?
mb_strtoupper(mb_substr($words[1],0,1)).mb_strtolower(mb_substr($words[1],1)):
$words[1];
},
$myVar);
}
echo $myVar;
我以前的方法非常低效。我没有意识到你正在处理多少数据,但如果我们超过 4000 行,那么效率就至关重要(我想我的大脑一直在思考 strtr()
基于你之前的问题的相关处理).这是我的 new/improved 解决方案,我希望将我以前的解决方案抛在脑后。
代码:(Demo)
$myVar = "My sister alannis Is not That blonde, here is a good place. I know Ariane is not MY SISTER!";
echo "$myVar\n";
$myWords = [
["is", "é"],
["on", "no"],
["that", "aquela"],
["sister", "irmã"],
["my", "minha"],
["myth", "mito"],
["he", "ele"],
["good", "bom"],
["ace", "perito"],
["i", "eu"] // notice I must be lowercase
];
$translations = array_column($myWords, 1, 0); // or skip this step and just declare $myWords as key-value pairs
// length sorting is not necessary
// preg_quote() and \Q\E are not used because dealing with words only (no danger of misinterpretation by regex)
$pattern = '/\b(?>' . implode('|', array_keys($translations)) . ')\b/i'; // atomic group is slightly faster (no backtracking)
/* echo $pattern;
makes: /\b(?>is|on|that|sister|my|myth|he|good|ace)\b/i
demo: https://regex101.com/r/DXTtDf/1
*/
$translated = preg_replace_callback(
$pattern,
function($m) use($translations) { // bring $translations (lookup) array to function
$encoding = 'UTF-8'; // default setting
$key = mb_strtolower($m[0], $encoding); // standardize keys' case for lookup accessibility
if (ctype_lower($m[0])) { // treat as all lower
return $translations[$m[0]];
} elseif (mb_strlen($m[0], $encoding) > 1 && ctype_upper($m[0])) { // treat as all uppercase
return mb_strtoupper($translations[$key], $encoding);
} else { // treat as only first character uppercase
return mb_strtoupper(mb_substr($translations[$key], 0, 1, $encoding), $encoding) // uppercase first
. mb_substr($translations[$key], 1, mb_strlen($translations[$key], $encoding) - 1, $encoding); // append remaining lowercase
}
},
$myVar
);
echo $translated;
输出:
My sister alannis Is not That blonde, here is a good place. I know Ariane is not MY SISTER!
Minha irmã alannis É not Aquela blonde, here é a bom place. Eu know Ariane é not MINHA IRMÃ!
此方法:
- 只1通过
$myVar
,而不是$myWords
. 的每个子数组通过1次
- 不会对查找数组进行排序 (
$myWords
/$translations
)。
- 不会为正则表达式转义 (
preg_quote()
) 或使模式组件成为文字 (\Q..\E
) 而烦恼,因为只有 个单词 被翻译。
- 使用单词边界,以便仅替换完整的单词匹配项。
- 使用原子组作为微优化,在保持准确性的同时拒绝回溯。
- 声明
$encoding
稳定性/可维护性/可重用性的价值。
- 匹配不区分大小写但替换为区分大小写...如果英文匹配是:
- 全部小写,替换也是如此
- 全部大写(且大于单个字符),替换也是如此
- 大写(多字符串的首字符),替换也是如此
我有一个代码将输出与数组的值进行比较,并且只终止与数组中的单词的操作:
第一个代码(只是一个例子)
$myVar = 'essa pizza é muito gostosa, que prato de bom sabor';
$myWords=array(
array('sabor','gosto','delicia'),
array('saborosa','gostosa','deliciosa'),
);
foreach($myWords as $words){
shuffle($words); // randomize the subarray
// pipe-together the words and return just one match
if(preg_match('/\K\b(?:'.implode('|',$words).')\b/',$myVar,$out)){
// generate "replace_pair" from matched word and a random remaining subarray word
// replace and preserve the new sentence
$myVar=strtr($myVar,[$out[0]=>current(array_diff($words,$out))]);
}
}
echo $myVar;
我的问题:
我有第二个代码,它不适用于 rand/shuffle(我不想要随机数,我想要精确的替换,我总是从第 0 列更改到第 1 列),总是交换值:
// wrong output: $myVar = "minha irmã alanné é not aquela blnode, elere é a bom plperito";
$myVar = "my sister alannis is not that blonde, here is a good place";
$myWords=array(array("is","é"),
array("on","no"),
array("that","aquela"),
//array("blonde","loira"),
//array("not","não"),
array("sister","irmã"),
array("my","minha"),
//array("nothing","nada"),
array("myth","mito"),
array("he","ele"),
array("good","bom"),
array("ace","perito"),
// array("here","aqui"), //if [here] it does not exist, it is not to do replacement from the line he=ele = "elere" non-existent word
);
$replacements = array_combine(array_column($myWords,0),array_column($myWords,1));
$myVar = strtr($myVar,$replacements);
echo $myVar;
// expected output: minha irmã alannis é not aquela blonde, here é a bom place
// avoid replace words slice!
预期 输出:minha irmã alannis é not aquela blond, here é a bom place
// avoid replace words slice! always check if the word exists in the array before making the substitution.
alanné, blnode, elere, plperito
它检查输出是否是数组 myWords 中存在的真实单词,这避免了键入错误,例如:
那4个字不是一个存在的字,写错了。你是怎么做到第二个代码的?
简而言之,交换必须通过一个完整的词/键,一个现有的词来进行。而不是使用关键字片段创建奇怪的东西!
不幸的是 strtr()
是这项工作的错误工具,因为它是 "word boundary ignorant"。要定位整个单词,没有比使用带单词边界的正则表达式模式更简单的方法了。
此外,为了确保较长的字符串先于较短的字符串(可能存在于其他字符串中的字符串)进行匹配,您必须 $myWords
按字符串长度排序(降序/最长到最短;使用多-仅在必要时使用字节版本)。
单词数组排序并转换为单独的正则表达式模式后,您可以将数组输入 preg_replace()
的 pattern
和 replace
参数。
代码 (Demo)
$myVar = "my sister alannis is not that blonde, here is a good place";
$myWords=array(
array("is","é"),
array("on","no"),
array("that","aquela"),
array("sister","irmã"),
array("my","minha"),
array("myth","mito"),
array("he","ele"),
array("good","bom"),
array("ace","perito")
);
usort($myWords,function($a,$b){return mb_strlen($b[0])<=>mb_strlen($a[0]);}); // sort subarrays by first column multibyte length
// remove mb_ if first column holds no multi-byte characters. strlen() is much faster.
foreach($myWords as &$words){
$words[0]='/\b'.$words[0].'\b/i'; // generate patterns using search word, word boundaries, and case-insensitivity
}
//var_export($myWords);
//var_export(array_column($myWords,0));
//var_export(array_column($myWords,1));
$myVar=preg_replace(array_column($myWords,0),array_column($myWords,1),$myVar);
echo $myVar;
输出:
minha irmã alannis é not aquela blonde, here é a bom place
这不会识别匹配子字符串的大小写。我的意思是,my
和 My
都将替换为 minha
。
为了适应不同的大小写,您需要使用preg_replace_callback()
。
这是考虑因素(处理首字母大写的单词,而不是全部大写的单词):
代码(Demo) <-- 运行这个可以看到更换后保留的原壳。
foreach($myWords as $words){
$myVar=preg_replace_callback(
$words[0],
function($m)use($words){
return ctype_upper(mb_substr($m[0],0,1))?
mb_strtoupper(mb_substr($words[1],0,1)).mb_strtolower(mb_substr($words[1],1)):
$words[1];
},
$myVar);
}
echo $myVar;
我以前的方法非常低效。我没有意识到你正在处理多少数据,但如果我们超过 4000 行,那么效率就至关重要(我想我的大脑一直在思考 strtr()
基于你之前的问题的相关处理).这是我的 new/improved 解决方案,我希望将我以前的解决方案抛在脑后。
代码:(Demo)
$myVar = "My sister alannis Is not That blonde, here is a good place. I know Ariane is not MY SISTER!";
echo "$myVar\n";
$myWords = [
["is", "é"],
["on", "no"],
["that", "aquela"],
["sister", "irmã"],
["my", "minha"],
["myth", "mito"],
["he", "ele"],
["good", "bom"],
["ace", "perito"],
["i", "eu"] // notice I must be lowercase
];
$translations = array_column($myWords, 1, 0); // or skip this step and just declare $myWords as key-value pairs
// length sorting is not necessary
// preg_quote() and \Q\E are not used because dealing with words only (no danger of misinterpretation by regex)
$pattern = '/\b(?>' . implode('|', array_keys($translations)) . ')\b/i'; // atomic group is slightly faster (no backtracking)
/* echo $pattern;
makes: /\b(?>is|on|that|sister|my|myth|he|good|ace)\b/i
demo: https://regex101.com/r/DXTtDf/1
*/
$translated = preg_replace_callback(
$pattern,
function($m) use($translations) { // bring $translations (lookup) array to function
$encoding = 'UTF-8'; // default setting
$key = mb_strtolower($m[0], $encoding); // standardize keys' case for lookup accessibility
if (ctype_lower($m[0])) { // treat as all lower
return $translations[$m[0]];
} elseif (mb_strlen($m[0], $encoding) > 1 && ctype_upper($m[0])) { // treat as all uppercase
return mb_strtoupper($translations[$key], $encoding);
} else { // treat as only first character uppercase
return mb_strtoupper(mb_substr($translations[$key], 0, 1, $encoding), $encoding) // uppercase first
. mb_substr($translations[$key], 1, mb_strlen($translations[$key], $encoding) - 1, $encoding); // append remaining lowercase
}
},
$myVar
);
echo $translated;
输出:
My sister alannis Is not That blonde, here is a good place. I know Ariane is not MY SISTER!
Minha irmã alannis É not Aquela blonde, here é a bom place. Eu know Ariane é not MINHA IRMÃ!
此方法:
- 只1通过
$myVar
,而不是$myWords
. 的每个子数组通过1次
- 不会对查找数组进行排序 (
$myWords
/$translations
)。 - 不会为正则表达式转义 (
preg_quote()
) 或使模式组件成为文字 (\Q..\E
) 而烦恼,因为只有 个单词 被翻译。 - 使用单词边界,以便仅替换完整的单词匹配项。
- 使用原子组作为微优化,在保持准确性的同时拒绝回溯。
- 声明
$encoding
稳定性/可维护性/可重用性的价值。 - 匹配不区分大小写但替换为区分大小写...如果英文匹配是:
- 全部小写,替换也是如此
- 全部大写(且大于单个字符),替换也是如此
- 大写(多字符串的首字符),替换也是如此