PHP: Split a string at the first period that isn't the decimal point in a price or the last character of the string
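I want to split a string according to the parameters listed in the title. I've tried a few different approaches, including preg_match, with little success so far, and I feel there may be a simpler solution that I haven't found yet.
I have a regex that matches the "price" mentioned in the title (see below):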
/(?=.)\£(([1-9][0-9]{0,2}(,[0-9]{3})*)|[0-9]+)?(\.[0-9]{1,2})?/
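To see what this price regex captures on its own, a quick `preg_match_all` check can help. This is only a sketch: the sample string is made up for illustration, and it assumes the PHP file and the text are both UTF-8.

```php
<?php
// Price regex from the question, unchanged. It has no /u flag, so "£" is
// matched as its raw UTF-8 bytes (assumes a UTF-8 source file and input).
$priceRegex = '/(?=.)\£(([1-9][0-9]{0,2}(,[0-9]{3})*)|[0-9]+)?(\.[0-9]{1,2})?/';

// Hypothetical sample text containing two prices.
$sample = 'Was £1,234.50, now £19.99.';

preg_match_all($priceRegex, $sample, $matches);
echo implode(' | ', $matches[0]), PHP_EOL; // £1,234.50 | £19.99
```

The thousands-separator branch and the optional two-digit decimal part are what let each price keep its own period.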
Below are some example scenarios and the results I want:
Example 1:
input: "This string should not split as the only periods that appear are here £19.99 and also at the end."
output: n/a
Example 2:
input: "This string should split right here. As the period is not part of a price or at the end of the string."
output: "This string should split right here"
Example 3:
input: "There is a price in this string £19.99, but it should only split at this point. As I want it to ignore periods in a price"
output: "There is a price in this string £19.99, but it should only split at this point"
I suggest using:
preg_split('~\£(?:[1-9]\d{0,2}(?:,\d{3})*|[0-9]+)?(?:\.\d{1,2})?(*SKIP)(*F)|\.(?!\s*$)~u', $string)
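See the regex demo.
The pattern matches your price pattern, `\£(?:[1-9]\d{0,2}(?:,\d{3})*|[0-9]+)?(?:\.\d{1,2})?`, and skips it with `(*SKIP)(*F)`; otherwise, it matches a non-final `.` with `\.(?!\s*$)` (even when the string ends with trailing whitespace).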
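Wrapped in a function and run against the three examples from the question, the `preg_split` approach can be sketched as follows. The function name `textBeforeFirstSplit` is hypothetical; passing a limit of 2 to `preg_split` is one way to stop after the first qualifying period, and returning `null` stands in for the "n/a" case.

```php
<?php
/**
 * Hypothetical helper: returns the text before the first period that is
 * neither part of a £ price nor the last non-whitespace character,
 * or null when the string contains no such period.
 */
function textBeforeFirstSplit(string $text): ?string
{
    // Prices are matched and discarded by (*SKIP)(*F); any other non-final dot is a split point.
    $pattern = '~\£(?:[1-9]\d{0,2}(?:,\d{3})*|[0-9]+)?(?:\.\d{1,2})?(*SKIP)(*F)|\.(?!\s*$)~u';

    // Limit 2: split at the first qualifying period only.
    $parts = preg_split($pattern, $text, 2);

    // preg_split returns false on error; a single part means no split happened.
    if ($parts === false || count($parts) < 2) {
        return null;
    }
    return $parts[0];
}

$examples = [
    'This string should not split as the only periods that appear are here £19.99 and also at the end.',
    'This string should split right here. As the period is not part of a price or at the end of the string.',
    'There is a price in this string £19.99, but it should only split at this point. As I want it to ignore periods in a price',
];

foreach ($examples as $example) {
    var_dump(textBeforeFirstSplit($example));
}
```

The first example yields `null`, while the other two yield exactly the outputs listed in the question.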
If you really only need to split at the first occurrence of a qualifying period, you can use a matching approach instead:
preg_match('~^((?:\£(?:[1-9]\d{0,2}(?:,\d{3})*|[0-9]+)?(?:\.\d{1,2})?|[^.])+)\.(.*)~su', $string, $match)
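See the regex demo. Here:
- `^` - matches the start of the string
- `((?:\£(?:[1-9]\d{0,2}(?:,\d{3})*|[0-9]+)?(?:\.\d{1,2})?|[^.])+)` - Group 1: one or more occurrences of your currency pattern or of any single character other than `.`
- `\.` - a `.` character
- `(.*)` - Group 2: the rest of the string (the `s` flag lets `.` match newlines as well).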
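As written, the `preg_match` pattern above also accepts the final period (for Example 1 it captures everything before the closing `.`), and when a string has no other period it can backtrack into a price and split right after `£19`. A variant is sketched below that keeps the same building blocks but makes the repetition possessive (`++`) and reuses the `(?!\s*$)` check from the `preg_split` pattern. These two changes and the helper name `firstSentence` are assumptions of this sketch, not part of the pattern above.

```php
<?php
/**
 * Hypothetical helper: variant of the preg_match approach.
 * - "++" makes the repetition possessive, so the engine never backtracks into a price.
 * - "(?!\s*$)" rejects a period that is the last non-whitespace character.
 * Returns Group 1 (the text before the split) or null when there is no valid split point.
 */
function firstSentence(string $text): ?string
{
    $pattern = '~^((?:\£(?:[1-9]\d{0,2}(?:,\d{3})*|[0-9]+)?(?:\.\d{1,2})?|[^.])++)\.(?!\s*$)(.*)~su';

    return preg_match($pattern, $text, $match) === 1 ? $match[1] : null;
}

// No period outside the price, so this should not split.
var_dump(firstSentence('It costs £19.99 only'));
```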
You can simply use this regex:
\.
Since only a space (not a price) follows the first sentence, this should work too, right?
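One quick way to check that assumption is to run the plain `\.` split against Examples 2 and 3 from the question. In Example 3, the first period in the string is the one inside `£19.99`.

```php
<?php
$example2 = 'This string should split right here. As the period is not part of a price or at the end of the string.';
$example3 = 'There is a price in this string £19.99, but it should only split at this point. As I want it to ignore periods in a price';

// Split once, at the first literal period, as the plain \. regex would.
$first2 = preg_split('/\./', $example2, 2)[0];
$first3 = preg_split('/\./', $example3, 2)[0];

echo $first2, PHP_EOL; // This string should split right here
echo $first3, PHP_EOL; // There is a price in this string £19
```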
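To split text into sentences while avoiding the usual pitfalls, such as dots in numbers, thousands separators, and some abbreviations (e.g. `etc.`), the best tool is `IntlBreakIterator`, which is designed to handle natural language: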
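$str = 'There is a price in this string £19.99, but it should only split at this point. As I want it to ignore periods in a price';
// Sentence-boundary iterator using English (US) rules
$si = IntlBreakIterator::createSentenceInstance('en-US');
$si->setText($str);
// Advance from offset 0 to the first sentence boundary
$si->next();
// Print the first sentence (it keeps its final period and trailing space)
echo substr($str, 0, $si->current());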
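`IntlBreakIterator::createSentenceInstance` returns an iterator that gives the offsets of the sentence boundaries in the string.
It also takes `?`, `!` and `...` into account. Besides the number and price pitfalls, it also handles strings like this: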
$str = 'John Smith, Jr. was running naked through the garden crying "catch me! catch me!", but no one was chasing him. His psychiatrist looked at him from the window with a circumspect eye.';
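Because the iterator reports every sentence boundary, not just the first one, the same API can list all sentences. The helper name `splitSentences` and the choice to `rtrim` each piece are illustrative assumptions; the intl extension must be installed.

```php
<?php
/**
 * Hypothetical helper: splits $text into sentences using ICU sentence
 * boundaries (requires the intl extension). The boundaries are byte
 * offsets into the UTF-8 string, so substr() can be used directly,
 * just as in the snippet above.
 */
function splitSentences(string $text, string $locale = 'en-US'): array
{
    $it = IntlBreakIterator::createSentenceInstance($locale);
    $it->setText($text);

    $sentences = [];
    $start = $it->first(); // first boundary, always offset 0
    for ($end = $it->next(); $end !== IntlBreakIterator::DONE; $end = $it->next()) {
        // Drop the trailing whitespace that ICU keeps at the end of each sentence.
        $sentences[] = rtrim(substr($text, $start, $end - $start));
        $start = $end;
    }

    return $sentences;
}
```

For the Example 3 string from the question, `splitSentences()` should return two sentences: the first ends with "at this point." and the second starts with "As I want".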
More about `IntlBreakIterator` here.