Python: extract all sub-strings in between tags within string

Question

I have a large string in the following format:

'324/;.ke5 efwef dwe,werwrf <>i want this<> ergy;'56,> thu ;lokr<>i want this<> htur ;''\> htur> jur'

I know I can do something along these lines:

result = text.partition('<>')[-1].rpartition('<>')[0]

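But this only gives me what lies between the first <> and the last <> in the string. How can I iterate over the whole string and extract the content between each pair of <> <> tags?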

Answer 1

You can use a regular expression with findall():

>>> import re
>>> s = "324/;.ke5 efwef dwe,werwrf <>i want this<> ergy;'56,> thu ;lokr<>i want this<> htur ;''\> htur> jur"
>>> re.findall(r"<>(.*?)<>", s)
['i want this', 'i want this']

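Here (.*?) is a capturing group that matches any character (except a newline, by default) zero or more times in non-greedy mode, so each match stops at the nearest closing <>.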
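To reuse the same idea with a different delimiter, a minimal sketch might look like the following. The helper name between_tags and its tag parameter are hypothetical, not part of the answer. It assumes the tags come in pairs, and it passes re.DOTALL so that the enclosed text may span several lines.

```python
import re

def between_tags(text, tag="<>"):
    """Return the substrings enclosed by consecutive pairs of `tag`."""
    # re.escape guards against tags that contain regex metacharacters.
    delim = re.escape(tag)
    # (.*?) is non-greedy, so each match stops at the nearest closing tag;
    # re.DOTALL lets '.' match newlines as well.
    return re.findall(delim + r"(.*?)" + delim, text, flags=re.DOTALL)

# Raw string, so the backslash in the sample is kept literally.
s = r"324/;.ke5 efwef dwe,werwrf <>i want this<> ergy;'56,> thu ;lokr<>i want this<> htur ;''\> htur> jur"
print(between_tags(s))
```

For the sample string this prints ['i want this', 'i want this'].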

Answer 2

I think string.split() is what you want:

>>> text = """'324/;.ke5 efwef dwe,werwrf <>i want this<> ergy;'56,> thu ;lokr<>i want this<> htur ;''\> htur> jur'"""
>>> print(text.split('<>')[1:-1])
['i want this', " ergy;'56,> thu ;lokr", 'i want this']

The split() method gives you a list of strings, using the argument as the delimiter. (https://docs.python.org/2/library/string.html#string.split) Then [1:-1] gives you a slice of that list without the first and last elements.
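Note that the [1:-1] slice also keeps the text lying between the two tag pairs (the middle element in the output above). Assuming the <> tags always come in matched pairs, one possible variation is to take every second piece instead. The variable names here are illustrative only.

```python
# Raw string, so the backslash in the sample is kept literally.
text = r"324/;.ke5 efwef dwe,werwrf <>i want this<> ergy;'56,> thu ;lokr<>i want this<> htur ;''\> htur> jur"

parts = text.split('<>')
# Odd-indexed pieces sit between an opening and a closing '<>';
# even-indexed pieces are the text outside the tag pairs.
inside = parts[1::2]
print(inside)
```

For the sample string this prints ['i want this', 'i want this']. If the string contains an odd number of <> markers, the last piece follows an unclosed tag and would be included by mistake, so this sketch only suits well-formed input.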
