Find String Between Two Substrings in Python When There is A Space After First Substring

Question

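While there are several posts on Stack Overflow that are similar to this, none of them involves a situation where the target string is one space after one of the substrings.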

I have the following string (example_string): <insert_randomletters>[?] I want this string.Reduced<insert_randomletters>

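I want to extract "I want this string." from the string above. The random letters will always change, but the quote "I want this string." will always be between [?] (with a space after the last square bracket) and Reduced.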

Right now, I can do the following to extract "I want this string.":

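import re
example_string = "<insert_randomletters>[?] I want this string.Reduced<insert_randomletters>"
target_quote_object = re.search('[?](.*?)Reduced', example_string)
target_quote_text = target_quote_object.group(1)
print(target_quote_text[2:])  # slice off the leading "] "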

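This gets rid of the ] and the space that always appear at the start of the extracted string, so only "I want this string." is printed. However, this solution seems ugly, and I'd rather have re.search() return the target string as-is, without any modification. How can I do this?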

Answer 1

Original solution:

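import re
example_string = "<insert_randomletters>[?] I want this string.Reduced<insert_randomletters>"
target_quote_object = re.search('] (.*?)Reduced', example_string)  # "] " is matched but not captured
target_quote_text = target_quote_object.group(1)
print(target_quote_text)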

However, Wiktor's solution is better.

Answer 2

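Your '[?](.*?)Reduced' pattern matches a literal ?, then captures any 0+ characters other than line break characters, as few as possible, up to the first Reduced substring. [?] is a character class formed with unescaped brackets, and a ? inside a character class is a literal ? character. That is why your Group 1 contains the ] and a space.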
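A minimal check of this explanation, using the example string from the question: the unescaped [?] matches only the single ? character, so Group 1 still carries the leftover "] ", while the escaped \[\? ] form matches the three literal characters "[?]".

```python
import re

s = "<insert_randomletters>[?] I want this string.Reduced<insert_randomletters>"

# '[?]' is a character class containing only '?', so the match starts at '?'
m = re.search(r'[?](.*?)Reduced', s)
print(repr(m.group(0)))  # '?] I want this string.Reduced'
print(repr(m.group(1)))  # '] I want this string.'  (the leftover "] ")

# Escaping '[' and '?' turns the pattern into the literal text "[?]"
print(re.findall(r'\[\?]', s))  # ['[?]']
```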

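To make your regex match [?], you need to escape [ and ?, so that they are matched as literal characters. Besides, you need to add a space after ] to make sure it does not land in Group 1. An even better idea is to use \s* (0 or more whitespace characters) or \s+ (1 or more occurrences).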

Use

re.search(r'\[\?]\s*(.*?)Reduced', example_string)

See the regex demo.

import re
rx = r"\[\?]\s*(.*?)Reduced"  # literal "[?]", optional whitespace, lazy capture up to "Reduced"
s = "<insert_randomletters>[?] I want this string.Reduced<insert_randomletters>"
m = re.search(rx, s)
if m:
    print(m.group(1))
# => I want this string.

See the Python demo.
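A small sketch comparing \s* and \s+ on a few hypothetical inputs (no space, one space, several spaces after [?]); the sample strings are made up for illustration. \s* tolerates a missing space, while \s+ requires at least one whitespace character, so it finds no match in the first input.

```python
import re

# Hypothetical inputs: no space, one space, and three spaces after "[?]"
samples = [
    "abc[?]I want this string.Reducedxyz",
    "abc[?] I want this string.Reducedxyz",
    "abc[?]   I want this string.Reducedxyz",
]

results = []
for s in samples:
    star = re.search(r'\[\?]\s*(.*?)Reduced', s)  # 0 or more whitespace chars
    plus = re.search(r'\[\?]\s+(.*?)Reduced', s)  # at least 1 whitespace char
    results.append((star.group(1), plus.group(1) if plus else None))
    print(results[-1])
```

In both variants the whitespace quantifier is greedy, so any run of spaces is consumed before the capture group starts and Group 1 never begins with a space.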

Answer 3

A regex may not be necessary for this, provided your string is in a consistent format:

mystr = '<insert_randomletters>[?] I want this string.Reduced<insert_randomletters>'

res = mystr.split('Reduced')[0].split('] ')[1]

# 'I want this string.'
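One caveat with the split() chain: if "] " is missing, the second split() returns a one-element list and [1] raises IndexError. A sketch using str.partition that returns None instead; extract_between is a hypothetical helper name, and its default markers assume the format shown in the question.

```python
def extract_between(text, start='[?] ', end='Reduced'):
    """Hypothetical helper: text between start and the next end, or None."""
    _, sep, rest = text.partition(start)
    if not sep:  # start marker not found
        return None
    target, sep, _ = rest.partition(end)
    if not sep:  # no end marker after the start marker
        return None
    return target

mystr = '<insert_randomletters>[?] I want this string.Reduced<insert_randomletters>'
print(extract_between(mystr))              # I want this string.
print(extract_between('no markers here'))  # None
```

Because the end marker is searched for only after the start marker, a stray "Reduced" inside the random letters before [?] would not affect the result, unlike split('Reduced')[0].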

Answer 4

As with the other answers, this may not be necessary, or it may just be too long-winded for Python. This method uses a common string method, find.

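str.find(sub, start, end) returns the lowest index in str where sub is found within the slice str[start:end], or -1 if sub is not found.
In each iteration, the index of [?] is retrieved, followed by the index of Reduced, and the resulting substring is printed.
Each time a [?]...Reduced pattern is found, the index is moved to the rest of the string, and the search continues from that index.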

Code

s = ' [?] Nice to meet you.Reduced  efweww  [?] Who are you? Reduced<insert_randomletters>[?] I want this string.Reduced<insert_randomletters>'

idx = s.find('[?]')
while idx != -1:
    start = idx
    end = s.find('Reduced', idx)
    if end == -1:  # no closing 'Reduced' after this '[?]'
        break
    print(s[start+3:end].strip())  # skip the 3-char '[?]' marker
    idx = s.find('[?]', end)

Output

$ python splmat.py
Nice to meet you.
Who are you?
I want this string.
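The same loop can be wrapped in a generator so the results can be collected rather than printed. find_all_between is a hypothetical name, and the marker parameters are an assumption for reuse; it stops cleanly when a [?] has no matching Reduced after it.

```python
def find_all_between(s, start_marker='[?]', end_marker='Reduced'):
    """Hypothetical helper: yield each stripped substring between the markers."""
    idx = s.find(start_marker)
    while idx != -1:
        end = s.find(end_marker, idx + len(start_marker))
        if end == -1:  # start marker without a closing end marker: stop
            break
        yield s[idx + len(start_marker):end].strip()
        # resume the search after the end marker just consumed
        idx = s.find(start_marker, end + len(end_marker))

s = (' [?] Nice to meet you.Reduced  efweww  [?] Who are you? Reduced'
     '<insert_randomletters>[?] I want this string.Reduced<insert_randomletters>')
print(list(find_all_between(s)))
# ['Nice to meet you.', 'Who are you?', 'I want this string.']
```

For a string with a dangling marker, such as '[?] dangling', the generator simply yields nothing.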

Answer 5

You could/should use a positive lookbehind, (?<=\[\?\]):

import re
pattern=r'(?<=\[\?\])(\s\w.+?)Reduced'

string_data='<insert_randomletters>[?] I want this string.Reduced<insert_randomletters>'

print(re.findall(pattern,string_data)[0].strip())

Output:

I want this string.
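A variation on this answer, sketched under the assumption that exactly one space follows [?]: putting the space inside the lookbehind and replacing the trailing Reduced with a lookahead makes the whole match equal the target, so neither a capturing group nor strip() is needed. Python's built-in re module only accepts fixed-width lookbehinds, so \s* cannot be moved inside it; the sketch checks that re.error is raised in that case.

```python
import re

s = '<insert_randomletters>[?] I want this string.Reduced<insert_randomletters>'

# Lookbehind and lookahead are zero-width: they check the context
# without consuming it, so group(0) is exactly the target text.
m = re.search(r'(?<=\[\?\] ).*?(?=Reduced)', s)
print(repr(m.group(0)))  # 'I want this string.'

# A variable-width lookbehind is rejected by the built-in re module.
variable_width_ok = True
try:
    re.compile(r'(?<=\[\?\]\s*).*?(?=Reduced)')
except re.error as exc:
    variable_width_ok = False
    print('re.error:', exc)
```

If the amount of whitespace can vary, keeping \s* outside the lookbehind (as in the \[\?]\s*(.*?)Reduced pattern) is the simpler choice with the standard library.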
