Simple case folding vs full case folding in Python regex module
This is the module I'm asking about: https://pypi.org/project/regex/, Matthew Barnett's regex.
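On the project description page, the difference in behaviour between V0 and V1 is stated as follows (note the case-folding bullets, which are emphasised in the original):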
Old vs new behaviour
In order to be compatible with the re module, this module has 2 behaviours:
Version 0 behaviour (old behaviour, compatible with the re module):
Please note that the re module’s behaviour may change over time, and I’ll endeavour to match that behaviour in version 0.
- Indicated by the VERSION0 or V0 flag, or (?V0) in the pattern.
- Case-insensitive matches in Unicode use simple case-folding by default.
Version 1 behaviour (new behaviour, possibly different from the re module):
- Indicated by the VERSION1 or V1 flag, or (?V1) in the pattern.
- Case-insensitive matches in Unicode use full case-folding by default.
If no version is specified, the regex module will default to regex.DEFAULT_VERSION.
I tried a few examples myself, but couldn't work out what it does:
Python 3.6.7 (default, Oct 22 2018, 11:32:17)
[GCC 8.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import regex
>>> r = regex.compile("(?V0i)и")
>>> r
regex.Regex('(?V0i)и', flags=regex.I | regex.V0)
>>> r.search("И")
<regex.Match object; span=(0, 1), match='И'>
>>> regex.search("(?V0i)é", "É")
<regex.Match object; span=(0, 1), match='É'>
>>> regex.search("(?V0i)é", "E")
>>> regex.search("(?V1i)é", "E")
What is the difference between simple and full case-folding? Or can you give an example where a (case-insensitive) regex matches something in V1 but not in V0?
The module follows the Unicode case folding table. Excerpt:
# The entries in this file are in the following machine-readable format:
#
# <code>; <status>; <mapping>; # <name>
#
# The status field is:
# C: common case folding, common mappings shared by both simple and full mappings.
# F: full case folding, mappings that cause strings to grow in length. Multiple characters are separated by spaces.
# S: simple case folding, mappings to single characters where different from F.
[...]
# Usage:
# A. To do a simple case folding, use the mappings with status C + S.
# B. To do a full case folding, use the mappings with status C + F.
Only a few special characters fold differently. One example is the small and capital Latin sharp s:
00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S
[...]
1E9E; F; 0073 0073; # LATIN CAPITAL LETTER SHARP S
1E9E; S; 00DF; # LATIN CAPITAL LETTER SHARP S
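To make the C + S versus C + F rule concrete, here is a minimal sketch that applies it by hand using only the standard library. It does not call the regex module, and the helper names (`build_fold_map`, `fold`, `equal_ignoring_case`) are hypothetical, invented for this illustration. The sample rows are the sharp s entries quoted above plus two ordinary C rows from the same table.

```python
# Sample rows in the CaseFolding.txt format: <code>; <status>; <mapping>; # <name>
SAMPLE_ROWS = """\
0041; C; 0061; # LATIN CAPITAL LETTER A
0053; C; 0073; # LATIN CAPITAL LETTER S
00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S
1E9E; F; 0073 0073; # LATIN CAPITAL LETTER SHARP S
1E9E; S; 00DF; # LATIN CAPITAL LETTER SHARP S
"""


def build_fold_map(rows, statuses):
    """Return {char: folded_string} for rows whose status is in `statuses`."""
    fold_map = {}
    for line in rows.splitlines():
        data = line.split("#", 1)[0].strip()  # drop the trailing name comment
        if not data:
            continue
        code, status, mapping = [field.strip() for field in data.split(";")[:3]]
        if status in statuses:
            fold_map[chr(int(code, 16))] = "".join(
                chr(int(cp, 16)) for cp in mapping.split()
            )
    return fold_map


SIMPLE = build_fold_map(SAMPLE_ROWS, {"C", "S"})  # usage rule A: C + S
FULL = build_fold_map(SAMPLE_ROWS, {"C", "F"})    # usage rule B: C + F


def fold(text, fold_map):
    """Fold each character; characters without an entry map to themselves."""
    return "".join(fold_map.get(ch, ch) for ch in text)


def equal_ignoring_case(a, b, fold_map):
    """Compare two strings after folding both with the same map."""
    return fold(a, fold_map) == fold(b, fold_map)


small, capital = "\u00df", "\u1e9e"  # ß and ẞ

print(equal_ignoring_case(small, capital, SIMPLE))  # True: ẞ folds to ß
print(equal_ignoring_case(small, "ss", SIMPLE))     # False: ß stays one character
print(equal_ignoring_case(small, "ss", FULL))       # True: ß folds to "ss"
print(equal_ignoring_case(capital, "SS", FULL))     # True: both fold to "ss"

# Python's built-in str.casefold() performs full case folding.
print(small.casefold() == "ss", capital.casefold() == "ss")  # True True
```

Under simple folding every character folds to exactly one character, so ß and ẞ compare equal to each other but never to "ss". Under full folding both grow to "ss". By the same reasoning, a V1 pattern such as (?V1i)ss would be expected to match ß while the V0 form would not. That is an inference from the table, not output I have run here.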