Find strings only optionally containing a specific char at the end

Question

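I want to find strings that do not contain the 。 character anywhere, except that it may optionally appear at the very end of the string.

I searched for hints and found patterns such as the following, but they did not solve my problem.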

^(?!\.)(?!.*\.$)(?!.*\.\.)[a-zA-Z0-9_.]+$
(?!\.) - don't allow . at start
(?!.*\.\.) - don't allow 2 consecutive dots
(?!.*\.$) - don't allow . at end

I tried:

import re

str_l  = ["aaa。bbb。","aaa。","aaa"]
for str1 in str_l:
  res1 = re.search(r'(.*?!。*$)', str1) #if 。not in string, return True
  res2 = re.search(r'(?<!(。)。$)',str1) # if 。 only appear at the end of string, return True, but not solved
  print(res1,res2)

I want to combine res1 and res2 into a single regular expression, so that the results for these strings are False, True, True.

Answer 1

This can be done with the following code.

import re

p = re.compile("^(?:(?!。).)*(。$)?(?!.*。).*$")

l = [
    "aaa。bbb。",
    "aaa bbb。",  # matches because only at end
    "aaa。bbb",
    "。aaa bbb",
    "aaa bbb",  # matches because none found
]

print([s for s in l if p.match(s)])

This results in:

['aaa bbb。', 'aaa bbb']

A full explanation can be found here at regex101.com.
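As a rough breakdown of the pattern, here is the same expression written with re.VERBOSE so that each part can carry a comment. The names p_verbose and samples are only illustrative and are not part of the original answer.

```python
import re

# Same expression as above, split across lines; re.VERBOSE ignores the
# whitespace and treats everything after '#' as a comment.
p_verbose = re.compile(
    r"""
    ^
    (?:(?!。).)*   # consume characters one by one while the next one is not 。
    (。$)?         # optionally consume one 。 if it sits at the end
    (?!.*。)       # assert that no 。 is left in the rest of the string
    .*$            # consume whatever remains up to the end
    """,
    re.VERBOSE,
)

samples = ["aaa。bbb。", "aaa bbb。", "aaa。bbb", "。aaa bbb", "aaa bbb"]
print([s for s in samples if p_verbose.match(s)])
# ['aaa bbb。', 'aaa bbb']
```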

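The only advantage of this expression over the more concise ^[^。]*。?$ is that it also works with multi-character strings, not just a single character. Suppose you need to match strings that may end with "foo", but "foo" must not appear anywhere earlier in the string. In that case you can use the pattern ^(?:(?!foo).)*(foo$)?(?!.*foo).*$

However, it is roughly 60% slower. You can see the test and the results here: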
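A minimal sketch of that multi-character case, assuming the token should be treated literally. The helper compile_optional_suffix is hypothetical (it is not from the answer); it passes the token through re.escape and uses a non-capturing group for the optional suffix.

```python
import re

def compile_optional_suffix(token):
    """Hypothetical helper: build the pattern above for an arbitrary literal token."""
    t = re.escape(token)  # treat the token as plain text, not as regex syntax
    return re.compile(rf"^(?:(?!{t}).)*(?:{t}$)?(?!.*{t}).*$")

foo_rx = compile_optional_suffix("foo")
candidates = ["barfoo", "bar", "foobar", "foobarfoo", "foo"]
print([s for s in candidates if foo_rx.match(s)])
# ['barfoo', 'bar', 'foo']
```

The same helper called with "。" should accept and reject the same strings as the single-character pattern in the examples above.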

import re
import timeit

a = re.compile("^(?:(?!。).)*(。$)?(?!.*。).*$")
b = re.compile("^[^。]*。?$")

l = [
    "aaa。bbb。",
    "aaa bbb。",  # matches because only at end
    "aaa。bbb",
    "。aaa bbb",
    "aaa bbb",  # matches because none found
]

print(
    timeit.timeit(
        "matches = [s for s in l if a.match(s)]",
        setup="from __main__ import (l, a)",
    )
)

print(
    timeit.timeit(
        "matches = [s for s in l if b.match(s)]",
        setup="from __main__ import (l, b)",
    )
)

This gives:

2.6208932230000004
1.6510743480000003

Answer 2

You can use

import re
str_l  = ["aaa。bbb。","aaa。","aaa"]
for str1 in str_l:
  print(str1, '=>', bool(re.search(r'^[^。]*。?$', str1)))

Output:

aaa。bbb。 => False
aaa。 => True
aaa => True

See the Python demo. Details:

^ - start of string
[^。]* - zero or more characters other than 。
。? - an optional 。
$ - end of string.

To get the valid strings from a list using this regex, you can use

rx = re.compile(r'^[^。]*。?$')
print( list(filter(rx.search, str_l)) )
# => ['aaa。', 'aaa']
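One caveat about the $ anchor: without re.MULTILINE it also matches just before a newline at the very end of the string. If that matters for your data, re.fullmatch with the anchors dropped is a stricter alternative. This is only a sketch; rx_search, rx_full and edge_cases are names made up for the illustration.

```python
import re

rx_search = re.compile(r'^[^。]*。?$')  # the pattern from this answer
rx_full = re.compile(r'[^。]*。?')      # same body, meant to be used with fullmatch

edge_cases = ["aaa。", "aaa。\n", "", "。"]
for s in edge_cases:
    # repr() makes the trailing newline visible in the output
    print(repr(s), bool(rx_search.search(s)), bool(rx_full.fullmatch(s)))
# 'aaa。' True True
# 'aaa。\n' True False
# '' True True
# '。' True True
```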

Answer 3

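Another option is to split on 。

If you use split and the string ends with 。, the last item in the resulting list will be empty.

If the character does not occur at all, the list has length 1.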

str_l = ["aaa。bbb。", "aaa。", "aaa", "。", "。  ", "。。"]

for str1 in str_l:
    lst = str1.split("。")
    nr = len(lst)
    # Valid if 。 never occurs (one piece) or occurs once at the very end (second piece empty)
    print(f"'{str1}' -> {nr == 1 or (nr == 2 and lst[1] == '')}")

Output

'aaa。bbb。' -> False
'aaa。' -> True
'aaa' -> True
'。' -> True
'。  ' -> False
'。。' -> False

See a Python demo.
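The two observations about split can be checked directly, and the test can be wrapped in a small function. The name only_optional_trailing and its ch parameter are assumptions made for this sketch, not part of the answer.

```python
def only_optional_trailing(s, ch="。"):
    """Hypothetical helper: True if ch is absent, or occurs exactly once as the last character."""
    parts = s.split(ch)
    return len(parts) == 1 or (len(parts) == 2 and parts[1] == "")

print("aaa".split("。"))        # ['aaa']  (no occurrence, one piece)
print("aaa。".split("。"))      # ['aaa', '']  (trailing occurrence, empty last piece)
print("aaa。bbb。".split("。"))  # ['aaa', 'bbb', '']  (more than two pieces)

print([s for s in ["aaa。bbb。", "aaa。", "aaa"] if only_optional_trailing(s)])
# ['aaa。', 'aaa']
```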

python

regex

python-3.x