PHP - Improve text capitalization function
I have a function that capitalizes the sentences in a string:
function capitalize_sentence($text)
{
    $output = preg_replace_callback('/([.!?])\s*(\w)/', function ($matches) {
        return strtoupper($matches[1] . ' ' . $matches[2]);
    }, ucfirst(strtolower($text)));
    return $output;
}
When I have a simple string like this:
$text = 'hello.this works !';
var_dump($text);
$text = capitalize_sentence($text);
var_dump($text);die;
It works well:
string 'hello.this works !' (length=18)
string 'Hello. This works !' (length=19)
But in my code, the string sometimes looks like this (with some tags):
$text = '<span>hello.</span> this <b>works</b> !';
var_dump($text);
$text = capitalize_sentence($text);
var_dump($text);die;
This gives me the following (as you can see, the first word is not capitalized...):
string '<span>hello.</span> this <b>works</b> !' (length=39)
string '<span>hello.</span> this <b>works</b> !' (length=39)
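How can I improve my code? I need to "escape" the <tags> without removing them, while still capitalizing the first word of each sentence as in the first example.
I need output like this: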
string '<span>Hello.</span> This <b>works</b> !' (length=39)
Thanks!
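To see why the tagged string comes back unchanged, it can help to test the pattern on its own. The sketch below uses illustrative variable names ($pattern, $plain, $tagged) that are not part of the question. It suggests that neither step fires: ucfirst() sees "<" as the first character, and after the period the next non-space character is "<", which \w does not match.

```php
<?php
// Illustration only: the same pattern as in capitalize_sentence().
$pattern = '/([.!?])\s*(\w)/';
$plain   = 'hello.this works !';
$tagged  = '<span>hello.</span> this <b>works</b> !';

// The period is directly followed by a word character, so the pattern matches.
var_dump(preg_match($pattern, $plain));

// After the period comes "<", which \w does not match, so nothing is replaced.
var_dump(preg_match($pattern, $tagged));

// ucfirst() only looks at the first byte, which is "<" here.
var_dump(ucfirst($tagged) === $tagged);
```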
Try this update; I added more conditions and slightly changed the replacement:
$output = preg_replace_callback('/((?:^|[.!?])(?:<[^>]*?>)?)(\s*)(\w)/', function ($matches) {
    return $matches[1] . $matches[2] . strtoupper($matches[3]);
}, ucfirst(strtolower($text)));
Output: <span>Hello.</span> This <b>works</b> !
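One thing to watch with the regex approach: strtolower($text) runs over the entire string, so tag names and attribute values are lowercased too, and [.!?] may also match a period inside an attribute value such as href="page.html". A minimal sketch of one way around this, assuming tags never contain ">" inside an attribute value, is to split the string into tags and text and only touch the text parts. The function name capitalize_outside_tags is hypothetical and not part of the answer above.

```php
<?php
// Hypothetical helper: lowercases and capitalizes text nodes only,
// leaving everything between "<" and ">" untouched.
function capitalize_outside_tags(string $html): string
{
    // With PREG_SPLIT_DELIM_CAPTURE, even indexes hold text and odd indexes hold tags.
    $parts = preg_split('/(<[^>]*>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
    $needUpper = true; // the next letter starts a sentence

    foreach ($parts as $i => $part) {
        if ($i % 2 === 1) {
            continue; // a tag: keep it exactly as written
        }
        $parts[$i] = preg_replace_callback(
            '/[.!?]|[a-z]/',
            function ($m) use (&$needUpper) {
                if (strpos('.!?', $m[0]) !== false) {
                    $needUpper = true; // sentence terminator seen
                    return $m[0];
                }
                if ($needUpper) {
                    $needUpper = false;
                    return strtoupper($m[0]);
                }
                return $m[0];
            },
            strtolower($part)
        );
    }

    return implode('', $parts);
}

echo capitalize_outside_tags('<a href="Page.HTML">HELLO.</a> this <b>Works</b> !'), "\n";
// prints: <a href="Page.HTML">Hello.</a> This <b>works</b> !
```

Because the flag is carried across the text parts, a sentence that ends inside one tag still capitalizes the first letter after the closing tag.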
Try this:
function ucSentence($str) {
    $len = strlen($str);
    $flagNeedUC = TRUE; // start of sentence flag
    $flagTag = FALSE; // inside tag flag
    $endOfSentence = array('.', '!', '?');
    for ($ix = 0; $ix < $len; $ix += 1) {
        if ($flagTag) {
            if ('>' === $str[$ix]) { // resolve end tag
                $flagTag = FALSE;
            }
        } else {
            if (in_array($str[$ix], $endOfSentence)) { // resolve end sentence
                $flagNeedUC = TRUE;
            } elseif ('<' === $str[$ix]) { // resolve start tag
                $flagTag = TRUE;
            } elseif (ctype_alpha($str[$ix]) && $flagNeedUC) { // resolve first char after sentence end
                $flagNeedUC = FALSE;
                $str[$ix] = strtoupper($str[$ix]);
            }
        }
    }
    return $str;
}
echo ucSentence('<span><b>hello. </b></span> this <b>works</b> !');
It prints <span><b>Hello. </b></span> This <b>works</b> !
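Update, especially for @w35l3y :)
I added skipping over attribute values. It recognizes the several forms of attribute value that appear in the wild on the internet:
<tag attr="value">, <tag attr='value'> and <tag attr=value attr=value>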
function ucSentence($str) {
    $len = strlen($str);
    $flagNeedUC = TRUE; // start of sentence flag
    $flagTag = FALSE; // inside tag flag
    $stageAttr = FALSE; // inside attribute value
    $endOfSentence = array('.', '!', '?');
    for ($ix = 0; $ix < $len; $ix += 1) {
        if ($flagTag) {
            if ($stageAttr) {
                if ('=' === $stageAttr) { // first char of the value decides how it ends
                    if ('"' === $str[$ix]) {
                        $stageAttr = '"';
                    } elseif ('\'' === $str[$ix]) {
                        $stageAttr = '\'';
                    } else {
                        $stageAttr = ' >'; // unquoted value ends at space or '>'
                    }
                } elseif (strpos($stageAttr, $str[$ix]) !== FALSE) { // resolve end of value
                    if ('>' === $str[$ix]) {
                        $flagTag = FALSE;
                    }
                    $stageAttr = FALSE;
                }
            } else {
                if ('>' === $str[$ix]) { // resolve end tag
                    $flagTag = FALSE;
                } elseif ('=' === $str[$ix]) {
                    $stageAttr = '=';
                }
            }
        } else {
            if (in_array($str[$ix], $endOfSentence)) { // resolve end sentence
                $flagNeedUC = TRUE;
            } elseif ('<' === $str[$ix]) { // resolve start tag
                $flagTag = TRUE;
            } elseif (ctype_alpha($str[$ix]) && $flagNeedUC) { // resolve first char after sentence end
                $flagNeedUC = FALSE;
                $str[$ix] = strtoupper($str[$ix]);
            }
        }
    }
    return $str;
}
$testArr = array(
    '<span><b>hello. </b></span> this <b>works</b> !',
    'test. <span title="jane <3 john"> <b>hello. </b></span> this <b>works</b> !',
    'test! <span title="hover -> here"> <b>hello. </b></span> this <b>works</b> !',
    'test <span title="jane <3 john"> <b>hello. </b></span> this <b>works</b> !',
    'test? <span title="hover -> here"> <b>hello. </b></span> this <b>works</b> !',
    'test <span title="hover -> here"> <b>hello. </b></span> this <b>works</b> !',
    'test. <span title=\'hover -> here\'> <b>hello. </b></span> this <b>works</b> !',
    'test. <span title=jane<3john data=jane> <b>hello. </b></span> this <b>works</b> !',
);
foreach ($testArr as $num => $testStr) {
    printf("[%d] %s\n", $num, ucSentence($testStr));
}
It prints:
[0] <span><b>Hello. </b></span> This <b>works</b> !
[1] Test. <span title="jane <3 john"> <b>Hello. </b></span> This <b>works</b> !
[2] Test! <span title="hover -> here"> <b>Hello. </b></span> This <b>works</b> !
[3] Test <span title="jane <3 john"> <b>hello. </b></span> This <b>works</b> !
[4] Test? <span title="hover -> here"> <b>Hello. </b></span> This <b>works</b> !
[5] Test <span title="hover -> here"> <b>hello. </b></span> This <b>works</b> !
[6] Test. <span title='hover -> here'> <b>Hello. </b></span> This <b>works</b> !
[7] Test. <span title=jane<3john data=jane> <b>Hello. </b></span> This <b>works</b> !
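Here is a slightly modified version of @tutankhamun's answer. It avoids capitalizing the character after a period in an email address or URL (or anywhere else an end-of-sentence character (. ! ?) is not followed by a space).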
function sentenceCase($str) {
    $len = strlen($str);
    $flagNeedUC = TRUE; // start of sentence flag
    $flagTag = FALSE; // inside tag flag
    $stageAttr = FALSE; // inside attribute value
    $lastChar = NULL; // previous character, used to detect "example.com"
    $endOfSentence = array('.', '!', '?');
    for ($ix = 0; $ix < $len; $ix += 1) {
        if ($flagTag) {
            if ($stageAttr) {
                if ('=' === $stageAttr) { // first char of the value decides how it ends
                    if ('"' === $str[$ix]) {
                        $stageAttr = '"';
                    } elseif ('\'' === $str[$ix]) {
                        $stageAttr = '\'';
                    } else {
                        $stageAttr = ' >'; // unquoted value ends at space or '>'
                    }
                } elseif (strpos($stageAttr, $str[$ix]) !== FALSE) { // resolve end of value
                    if ('>' === $str[$ix]) {
                        $flagTag = FALSE;
                    }
                    $stageAttr = FALSE;
                }
            } else {
                if ('>' === $str[$ix]) { // resolve end tag
                    $flagTag = FALSE;
                } elseif ('=' === $str[$ix]) {
                    $stageAttr = '=';
                }
            }
        } else {
            if (in_array($str[$ix], $endOfSentence)) { // resolve end sentence
                $flagNeedUC = TRUE;
            } elseif ('<' === $str[$ix]) { // resolve start tag
                $flagTag = TRUE;
            } elseif (ctype_alpha($str[$ix]) && $flagNeedUC) { // resolve first char after sentence end
                $flagNeedUC = FALSE;
                // skip letters glued to the terminator, e.g. the "c" in "example.com"
                if (!in_array($lastChar, $endOfSentence)) {
                    $str[$ix] = strtoupper($str[$ix]);
                }
            }
        }
        $lastChar = $str[$ix];
    }
    return $str;
}
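To make the effect of the $lastChar check easier to try out, here is a cut-down sketch of the same idea. It is an assumption-laden simplification, not the function above: tags are skipped with a plain "<" to ">" scan, without the attribute-value handling, and the name sentenceCaseLite is made up for this example.

```php
<?php
// Simplified sketch: same rule as sentenceCase(), minus attribute-value parsing.
function sentenceCaseLite(string $str): string
{
    $needUC = true;   // next letter starts a sentence
    $inTag = false;   // currently between "<" and ">"
    $lastChar = null; // previous character
    $len = strlen($str);

    for ($ix = 0; $ix < $len; $ix++) {
        $ch = $str[$ix];
        if ($inTag) {
            if ($ch === '>') {
                $inTag = false;
            }
        } elseif ($ch === '.' || $ch === '!' || $ch === '?') {
            $needUC = true;
        } elseif ($ch === '<') {
            $inTag = true;
        } elseif ($needUC && ctype_alpha($ch)) {
            $needUC = false;
            // A letter glued to the terminator ("example.com") is left alone.
            if ($lastChar !== '.' && $lastChar !== '!' && $lastChar !== '?') {
                $str[$ix] = strtoupper($ch);
            }
        }
        $lastChar = $ch;
    }

    return $str;
}

echo sentenceCaseLite('mail me at john@example.com. <b>thanks</b> !'), "\n";
// prints: Mail me at john@example.com. <b>Thanks</b> !
```

The "c" in "example.com" stays lowercase because the previous character is the period itself, while "thanks" is capitalized because a space and a tag sit between it and the period.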