Regex through a string with Python 2.7

Question

The HTML response I am currently processing with BeautifulSoup4 contains the following:

<script type="text/javascript">
var n='eut';
var u='user'+'/8/'+'41140658'+n.charAt(2)+n.charAt(0)+n.charAt(1);
document.getElementById('big_pic').src='http://b2.eu.album.com/'+u.charAt(0)+'/'+u+'.jpg';
</script>

What I want to do is extract the letter that immediately follows ('big_pic').src='http://, which in this case is the letter 'b'.

I tried the following, but I don't know how to return the letter that comes after the string:

my_string = str(re.findall(r'('big_pic').src='http://', the_string))

How do I return the letter that comes after 'http://' in the string?

Answer 1

You can use a positive lookbehind:

>>> re.search(r"(?<=\('big_pic'\)\.src='http://).", the_string).group(0)
'b'

findall returns a list containing all matches:

>>> re.findall(r"\('big_pic'\)\.src='http://(.)", the_string)
['b']

So when using findall, take care that the pattern captures only the part you want.
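
For reference, here is a self-contained sketch of both calls. It assumes the_string holds the text of the script element from the question (the sample is copied from there); the names match, first_letter and letters are only illustrative. re.search returns None when nothing matches, so the sketch guards against that before calling group.

```python
import re

# Assumption: the_string holds the text of the <script> element from the question.
the_string = (
    "var n='eut';\n"
    "var u='user'+'/8/'+'41140658'+n.charAt(2)+n.charAt(0)+n.charAt(1);\n"
    "document.getElementById('big_pic').src="
    "'http://b2.eu.album.com/'+u.charAt(0)+'/'+u+'.jpg';\n"
)

# Lookbehind: match a single character that is preceded by the literal prefix.
match = re.search(r"(?<=\('big_pic'\)\.src='http://).", the_string)
first_letter = match.group(0) if match else None

# Capturing group: findall returns only the contents of the group.
letters = re.findall(r"\('big_pic'\)\.src='http://(.)", the_string)

print(first_letter)
print(letters)
```

The parentheses and the dot in the prefix are escaped because they are special characters in a regular expression.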

Answer 2

There are a couple of mistakes in your implementation.

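First, if you know the exact string you are looking for, why use a regular expression at all? You can simply search for the string. Using the index where the string starts and the length of the string, you can retrieve the character at the desired position.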

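Second, you used single quotes both to delimit the string and inside the string itself, so the code should not even run without an error (unless the error only crept in when you posted it here).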

With these changes, your code would look like this:

idx = the_string.find(r"('big_pic').src='http://")
if idx > -1:
    my_string = the_string[idx+24:idx+25]
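
The 24 above is the length of the search string. As a minimal sketch of the same idea without the hard-coded offset, the hypothetical helper below (char_after is not part of the original answer) derives the offset with len() and returns None when the prefix is missing or is the last thing in the text:

```python
def char_after(text, prefix):
    """Return the character right after the first occurrence of prefix, or None."""
    idx = text.find(prefix)
    if idx == -1:
        return None  # prefix not found
    pos = idx + len(prefix)
    # Slicing (rather than indexing) avoids an IndexError when prefix ends the text.
    return text[pos:pos + 1] or None


# Sample line taken from the question.
sample = "document.getElementById('big_pic').src='http://b2.eu.album.com/'"
my_string = char_after(sample, "('big_pic').src='http://")
print(my_string)
```

Computing the offset from the prefix keeps the code correct if the search string is ever changed.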
