RegEx for capturing a token from an HTML element
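I'm trying to get a value from an object in some HTML. I've figured out how to get the value, but it comes back with extra content that I don't want.
I've tried using .split() and groups, but neither has made any difference.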
html = r.text
checkouttoken = re.search('DF_CHECKOUT_TOKEN = (.*?);', html, re.S)
print(checkouttoken.group(0))
Expected:
27f37949bb8a76ede81508c8c1b750c8
Actual:
< iframe srcdoc="<script>!function(){var e=function(e){var t={exports:{}};return e.call(t.exports,t,t.exports),t.exports},r=function(){fun
DF_CHECKOUT_TOKEN = "27f37949bb8a76ede81508c8c1b750c8";
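Use group(1). group(0) is the entire matched text; group(1) is the first group you captured.
Also, if you don't want the quotes in the result, put them in the regex outside the capture group: 'DF_CHECKOUT_TOKEN = "(.*?)";'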
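As a minimal sketch of the difference between the two groups: the html string below is a shortened stand-in for r.text (an assumption, since the real page is not shown), and the pattern is the one suggested above.

```python
import re

# Shortened stand-in for r.text; the real page content is an assumption.
html = 'var x = 1; DF_CHECKOUT_TOKEN = "27f37949bb8a76ede81508c8c1b750c8"; var y = 2;'

# Quotes and the trailing semicolon sit outside the capture group,
# so group(1) holds only the token itself.
match = re.search(r'DF_CHECKOUT_TOKEN = "(.*?)";', html, re.S)

whole = match.group(0)  # everything the pattern matched
token = match.group(1)  # only the text inside the parentheses

print(whole)
print(token)
```

Here group(0) still includes the variable name, the quotes, and the semicolon, while group(1) is the bare token.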
The expression we want here can be quite simple:
DF_CHECKOUT_TOKEN = \"(.+?)\"
Test
# coding=utf8
# The line above declares the source encoding; it is only needed for Python 2.x compatibility.
import re

regex = r"DF_CHECKOUT_TOKEN = \"(.+?)\""
test_str = "< iframe srcdoc=\"<script>!function(){var e=function(e){var t={exports:{}};return e.call(t.exports,t,t.exports),t.exports},r=function(){fun DF_CHECKOUT_TOKEN = \"27f37949bb8a76ede81508c8c1b750c8\";"

matches = re.finditer(regex, test_str, re.MULTILINE)

for matchNum, match in enumerate(matches, start=1):
    print("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum = matchNum, start = match.start(), end = match.end(), match = match.group()))
    # Capture groups are numbered from 1; group 0 is the whole match.
    for groupNum in range(1, len(match.groups()) + 1):
        print("Group {groupNum} found at {start}-{end}: {group}".format(groupNum = groupNum, start = match.start(groupNum), end = match.end(groupNum), group = match.group(groupNum)))

# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.
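If only the first token is needed, the loop above can be reduced to a small helper. This is a rough sketch rather than a definitive version: extract_checkout_token and TOKEN_RE are hypothetical names, and the pattern assumes the token is always wrapped in double quotes, as in the sample output. It also returns None when the page has no token, because re.search returns None on a failed match and calling .group() on that would raise AttributeError.

```python
import re

# Hypothetical helper; the pattern assumes a double-quoted token as in the sample.
# [^"]+ stops at the closing quote, and \s* tolerates spacing around "=".
TOKEN_RE = re.compile(r'DF_CHECKOUT_TOKEN\s*=\s*"([^"]+)"')


def extract_checkout_token(html):
    """Return the token string, or None if the pattern is not present."""
    match = TOKEN_RE.search(html)
    return match.group(1) if match else None


sample = 'fun DF_CHECKOUT_TOKEN = "27f37949bb8a76ede81508c8c1b750c8";'
print(extract_checkout_token(sample))
print(extract_checkout_token("<html>no token here</html>"))
```

In the question's code this would be called as extract_checkout_token(r.text), with a check for None before the value is used.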