Python - Regex (Re.Escape, Re.Findall); How To: Find sub-strings + a number of characters beyond the sub-strings within a string?
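This is probably a simple question. I'm learning how to use regular expressions, but I'm having trouble performing a specific task on a string.
For example: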
example_string = "; One, one; Two, two; Three, three; Four, four"
desired_output = ["One, o", "Two, t", "Three, t", "Four, f"]  # list output is OK
With the following, I can get ["One", "Two", "Three", "Four"]:
import re

def findStringsInMiddle(a, b, text):
    return re.findall(re.escape(a) + "(.*?)" + re.escape(b), text)

desired_output = findStringsInMiddle('; ', ',', example_string)
But I can't figure out how to configure it so that it also returns the comma + space + any_type_of_character that I want.
Any suggestions?
Thanks!
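You can write out the full pattern (from the semicolon up to the second character after the comma) and mark the group you want to extract: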
>>> import re
>>> s = "; One, one; Two, two; Three, three; Four, four"
>>> re.findall(r"; (.*?,.{2})", s)
['One, o', 'Two, t', 'Three, t', 'Four, f']
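The title also asks for "a number of characters beyond the sub-strings". A minimal sketch, assuming you want exactly n characters after the right-hand delimiter, builds the .{n} quantifier from a parameter. The helper name find_with_trailing and its n parameter are hypothetical, not part of the answer above:

```python
import re


def find_with_trailing(a, b, text, n):
    """Hypothetical helper: capture everything between delimiter `a` and
    delimiter `b`, plus `b` itself and exactly `n` characters after it."""
    # re.escape() keeps the delimiters literal; ".{n}" takes n more characters.
    pattern = re.escape(a) + "(.*?" + re.escape(b) + ".{" + str(n) + "})"
    return re.findall(pattern, text)


s = "; One, one; Two, two; Three, three; Four, four"
two_after = find_with_trailing("; ", ",", s, 2)
four_after = find_with_trailing("; ", ",", s, 4)
print(two_after)   # ['One, o', 'Two, t', 'Three, t', 'Four, f']
print(four_after)  # ['One, one', 'Two, two', 'Three, thr', 'Four, fou']
```

With n=2 this is the same as the r"; (.*?,.{2})" pattern: the two characters are the space and the first letter after the comma.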
Here is one solution:
import re

example_string = "; One, one; Two, two; Three, three; Four, four"

def findStringsInMiddle(text):
    return re.findall(r"; (.+?, [a-z])", text)

desired_output = findStringsInMiddle(example_string)
print(desired_output)
Output:
['One, o', 'Two, t', 'Three, t', 'Four, f']
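The [a-z] class in this answer accepts only a lowercase ASCII letter after the comma and space. A quick check on a hypothetical input (not from the question) where one item has an uppercase letter there shows what happens: the lazy .+? does not stop at that item, it expands into the next one. Passing re.IGNORECASE is one way to relax the class:

```python
import re

# Hypothetical input: the second item has an uppercase letter after ", ".
text = "; One, one; Two, Two; Three, three"

# [a-z] only accepts a lowercase ASCII letter, so for "Two, Two" the lazy
# .+? keeps expanding until the next ", " that is followed by a lowercase letter.
strict = re.findall(r"; (.+?, [a-z])", text)

# With re.IGNORECASE, [a-z] also matches uppercase ASCII letters.
relaxed = re.findall(r"; (.+?, [a-z])", text, flags=re.IGNORECASE)

print(strict)   # ['One, o', 'Two, Two; Three, t']
print(relaxed)  # ['One, o', 'Two, T', 'Three, t']
```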
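You can slightly reorganize your pattern by moving the right-hand delimiter into the capturing group and appending an optional (?:\s*.)? group: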
def findStringsInMiddle(a, b, text):
    return re.findall(re.escape(a) + "(.*?" + re.escape(b) + r"(?:\s*.)?)", text, flags=re.S)
With a = '; ' and b = ',', the resulting pattern is ;\ (.*?,(?:\s*.)?) (see the regex demo), and it matches:
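- ";\ " - the left-hand delimiter (a semicolon followed by an escaped space)
- (.*?,(?:\s*.)?) - Group 1:
  - .*? - any zero or more characters, as few as possible
  - , - a comma
  - (?:\s*.)? - an optional non-capturing group that matches one or zero occurrences of zero or more whitespace characters followed by any single character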
Note that I added the re.S flag so that . also matches line break characters.
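To see what re.S changes, here is a sketch on a hypothetical multi-line input. The copy of findStringsInMiddle below adds a flags parameter, which is not in the answer, purely so the two modes can be compared. Without re.S, .*? cannot cross the line break after "Line", so the first item is lost; the trailing "End," also shows that the optional (?:\s*.)? group is allowed to match nothing:

```python
import re


def findStringsInMiddle(a, b, text, flags=re.S):
    # Same pattern as in the answer; the `flags` parameter is an assumption
    # added only for this comparison.
    pattern = re.escape(a) + "(.*?" + re.escape(b) + r"(?:\s*.)?)"
    return re.findall(pattern, text, flags=flags)


multiline = "; Line\nOne, x; End,"

without_s = findStringsInMiddle("; ", ",", multiline, flags=0)
with_s = findStringsInMiddle("; ", ",", multiline)
print(without_s)  # ['End,']
print(with_s)     # ['Line\nOne, x', 'End,']
```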
import re
example_string = "; One, one; Two, two; Three, three; Four, four"
desired_output = ["One, o", "Two, t", "Three, t", "Four, f"] #list output is OK
def findStringsInMiddle(a, b, text):
    return re.findall(re.escape(a) + "(.*?" + re.escape(b) + r"(?:\s*.)?)", text, flags=re.S)
desired_output = findStringsInMiddle('; ', ',', example_string)
print(desired_output)
# => ['One, o', 'Two, t', 'Three, t', 'Four, f']
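The re.escape() calls are what let this helper accept arbitrary delimiters. A small sketch with hypothetical bracket delimiters: escaped, "[" and "]" are matched literally; unescaped, "[" would start a character class and the pattern would no longer compile:

```python
import re


def findStringsInMiddle(a, b, text):
    # Copied from the answer above: re.escape() makes the delimiters literal.
    return re.findall(re.escape(a) + "(.*?" + re.escape(b) + r"(?:\s*.)?)", text, flags=re.S)


bracketed = "[a] x, [b]y"
result = findStringsInMiddle("[", "]", bracketed)
print(re.escape("["), re.escape("]"))  # \[ \]
print(result)                          # ['a] x', 'b]y']

# Building the same pattern without re.escape() leaves an unbalanced ")".
try:
    re.compile("[" + "(.*?" + "]" + r"(?:\s*.)?)")
    unescaped_ok = True
except re.error as exc:
    unescaped_ok = False
    print("unescaped delimiters break the pattern:", exc)
```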
import re
example_string = "; One, one; Two, two; Three, three; Four, four"
pattern = re.compile(r";\s"                 # The match must start with a semicolon and then a whitespace character
                     r"([A-Z][a-z]+,\s.?)"  # The capturing group: first a capital letter,
                                            # then one or more lowercase letters,
                                            # and finally a comma, a whitespace character and zero or one more character
                     )
print(re.findall(pattern, example_string))
Output:
['One, o', 'Two, t', 'Three, t', 'Four, f']
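If you prefer to keep the explanation next to each piece of the pattern without relying on implicit string concatenation, the re.VERBOSE flag is an alternative: unescaped whitespace in the pattern is ignored and # starts a comment. This is a sketch of the same pattern rewritten that way, not part of the original answer:

```python
import re

example_string = "; One, one; Two, two; Three, three; Four, four"

# In re.VERBOSE mode the layout whitespace below is ignored, so every
# space that must actually match is written as \s.
pattern = re.compile(
    r"""
    ;\s              # a semicolon followed by one whitespace character
    (                # start of the capturing group
        [A-Z][a-z]+  # a capital letter, then one or more lowercase letters
        ,\s          # a comma and one whitespace character
        .?           # zero or one more character
    )                # end of the capturing group
    """,
    re.VERBOSE,
)

matches = pattern.findall(example_string)
print(matches)  # ['One, o', 'Two, t', 'Three, t', 'Four, f']
```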