Matching two or more characters that are not the same
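Is it possible to write a regex pattern to match abc where each letter is not literal but stands for a distinct character, so that text like xyz (but not xxy) would be matched? I was able to get as far as (.)(?!\1) to match the a in ab, but then I was stumped.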
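A quick Python check of what the partial attempt does on its own (a minimal sketch; the variable name partial is only for illustration). The negative lookahead (?!\1) succeeds only when the next character is not the captured one, and since it is zero-width the match itself is just the first character. At the end of the string there is no next character, so the lookahead succeeds there too.

```python
import re

# Capture one character, then require that the next character is not the
# same one. The lookahead is zero-width, so only one character is consumed.
partial = re.compile(r"(.)(?!\1)")

print(partial.match("ab").group(0))  # a
print(partial.match("aa"))           # None

# search() keeps trying later positions: the final "a" in "aa" has no
# following character, so the negative lookahead succeeds there.
print(partial.search("aa").start())  # 1
```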
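After getting the answer below, I was able to write a routine that generates this kind of pattern. Using the re pattern directly is much faster than converting both the pattern and the text to a canonical form and then comparing them.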
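For comparison, the canonical-form approach mentioned above might look something like the following minimal Python sketch. The helper names canonical and find_same_shape are hypothetical and this is not the author's code; no timings are claimed, it only shows what "converting to a canonical form" means: each character is replaced by the order in which it first appears, so abba and acca both become (0, 1, 1, 0), and every window of the text is compared against the pattern's form.

```python
def canonical(s: str) -> tuple[int, ...]:
    """Map each character to the order of its first appearance,
    e.g. 'abba' -> (0, 1, 1, 0)."""
    first_seen: dict[str, int] = {}
    # setdefault evaluates len(first_seen) before inserting, so a new
    # character gets the next unused index.
    return tuple(first_seen.setdefault(c, len(first_seen)) for c in s)


def find_same_shape(pattern: str, text: str) -> int:
    """Return the start of the first window of `text` whose canonical
    form equals that of `pattern`, or -1 if there is none."""
    target = canonical(pattern)
    n = len(pattern)
    for start in range(len(text) - n + 1):
        if canonical(text[start:start + n]) == target:
            return start
    return -1


print(find_same_shape("abba", "maccaw"))  # 1 ("acca")
print(find_same_shape("abba", "busses"))  # -1
```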
import re


def pat2re(p, know=None, wild=None):
    r"""Return a compiled re pattern that will find pattern `p`
    in which each different character should find a different
    character in a string. Characters to be taken literally
    or that can represent any character should be given as
    `know` and `wild`, respectively.

    EXAMPLES
    ========

    Characters in the pattern denote different characters to
    be matched; characters that are the same in the pattern
    must be the same in the text:

    >>> pat = pat2re('abba')
    >>> assert pat.search('maccaw')
    >>> assert not pat.search('busses')

    The underlying pattern of the re object can be seen
    with the pattern property:

    >>> pat.pattern
    '(.)(?!\\1)(.)\\2\\1'

    If some characters are to be taken literally, list them
    as known; do the same if some characters can stand for
    any character (i.e. are wildcards):

    >>> a_ = pat2re('ab', know='a')
    >>> assert a_.search('ad') and not a_.search('bc')
    >>> ab_ = pat2re('ab*', know='ab', wild='*')
    >>> assert ab_.search('abc') and ab_.search('abd')
    >>> assert not ab_.search('bad')
    """
    # make a canonical "hash" of the pattern
    # with ints representing pattern elements that
    # must be unique and strings for wild or known
    # values
    m = {}
    j = 1
    know = know or ''
    wild = wild or ''
    for c in p:
        if c in know:
            # escape known characters so that regex metacharacters
            # such as '.', '*' or '(' are matched literally
            m[c] = re.escape(c)
        elif c in wild:
            m[c] = '.'
        elif c not in m:
            m[c] = j
            j += 1
            # Python's re only supports backreferences \1 to \99
            assert j < 100
    h = tuple(m[i] for i in p)
    # build pattern
    out = []
    last = 0
    for i in h:
        if isinstance(i, int):
            if i <= last:
                # repeat of an earlier character: use a backreference
                out.append(r'\%s' % i)
            else:
                if last:
                    # new character: must differ from every earlier group
                    ors = '|'.join(r'\%s' % k for k in range(1, last + 1))
                    out.append('(?!%s)(.)' % ors)
                else:
                    out.append('(.)')
                last = i
        else:
            # known (escaped) literal or wildcard '.'
            out.append(i)
    return re.compile(''.join(out))
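When every character in the pattern is different (the abc case from the question), the pattern pat2re builds reduces to a chain of capture groups, each preceded by a lookahead that excludes all earlier groups. A minimal sketch of just that special case, using a hypothetical helper all_distinct (not part of the routine above):

```python
import re


def all_distinct(n: int) -> re.Pattern:
    """Hypothetical helper: compile a pattern matching n consecutive,
    pairwise-different characters (1 <= n <= 99)."""
    if not 1 <= n <= 99:
        # Python's re backreferences only go up to \99
        raise ValueError("n must be between 1 and 99")
    parts = ["(.)"]
    for k in range(2, n + 1):
        # Group k must not equal any of the groups 1..k-1.
        earlier = "|".join(f"\\{g}" for g in range(1, k))
        parts.append(f"(?!{earlier})(.)")
    return re.compile("".join(parts))


print(all_distinct(3).pattern)                     # (.)(?!\1)(.)(?!\1|\2)(.)
print(all_distinct(4).search("hello world")[0])    # 'lo w'
```

With n = 3 this is the same string pat2re('abc') returns: the hand-written pattern in the answer below, except that the last character is also captured.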
You could try:
^(.)(?!\1)(.)(?!\1|\2).$
Here is an explanation of the regex pattern:
^          from the start of the string
(.)        match and capture any first character (no restrictions so far)
(?!\1)     then assert that the second character is different from the first
(.)        match and capture any (legitimate) second character
(?!\1|\2)  then assert that the third character does not match the first or second
.          match any valid third character
$          end of string
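A quick way to check the answer's pattern in Python (a sketch; the names three_distinct and strict are only for illustration):

```python
import re

# The answer's pattern: three characters, each different from those before it.
three_distinct = re.compile(r"^(.)(?!\1)(.)(?!\1|\2).$")

for s in ["xyz", "xxy", "xyx", "aaa"]:
    print(s, bool(three_distinct.match(s)))
# prints: xyz True / xxy False / xyx False / aaa False (one per line)

# In Python's re, $ also matches just before a trailing newline.
print(bool(three_distinct.match("xyz\n")))  # True

# fullmatch (without ^ and $) requires the entire string to match.
strict = re.compile(r"(.)(?!\1)(.)(?!\1|\2).")
print(bool(strict.fullmatch("xyz\n")))  # False
```

If a trailing newline should not be accepted, using fullmatch as above (or \Z in place of $) avoids the surprise.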