Extracting JSON from script tag using re and requests

Question

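I am trying to extract JSON from a script tag using a regular expression, but I can't get a match. However, my pattern works on https://regex101.com/ (using the source of the page I want to match).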

import requests
import re

req = requests.get(myURL)
matches = re.findall("/reports=\[([^]]+)\]/g", req.text)
print(matches)

The beginning of the JSON looks like this:

/*! jQuery v1.10.2 | (c) 2005, 2013 jQuery Foundation, Inc. | jquery.org/license
//@ sourceMappingURL=jquery.min.map
*/
(function(e,t){var n,r,i=typeof t,o=e.location,a=e.document,s=a.documentElement,l=e.jQuery,u=e.$,c={},p=...
var reports=[
  {
    "Id": "ddb56456-ae7e-46da-8251-97630e1536f7",

Any pointers on what I am doing wrong? If I write req.text to a text file and then copy it into regex101.com, I can match it using the same pattern as above.

Answer 1

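When you specify the pattern as a Python string like this, do not use the slash delimiters; re treats them as literal characters that must appear in the text. Also, the "g" flag is implied by findall. Just use: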

matches = re.findall(r"reports=\[([^]]+)\]", req.text)
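As a possible follow-up, here is a minimal sketch of turning the matched text into Python objects. It assumes the value assigned to reports is valid JSON. The page_source string is a hypothetical stand-in for req.text, the "Tags" field is invented for illustration, and extract_reports is a helper name made up here. Instead of capturing up to the first "]", the sketch uses the regex only to find where the array starts and lets json.JSONDecoder.raw_decode read one complete JSON value from that position.

```python
import json
import re

# Hypothetical stand-in for req.text; the real page source is not shown in full.
# The "Tags" field is invented here to illustrate a nested array.
page_source = '''
(function(e,t){var n;})(window);
var reports=[
  {
    "Id": "ddb56456-ae7e-46da-8251-97630e1536f7",
    "Tags": ["a", "b"]
  }
];
'''


def extract_reports(text):
    """Return the JSON value assigned to `reports`, or None if it is not found."""
    # Plain Python pattern: no /.../ delimiters and no "g" flag.
    # The trailing \s* matters because raw_decode does not skip leading whitespace.
    match = re.search(r"reports\s*=\s*", text)
    if match is None:
        return None
    # raw_decode parses one JSON value starting at the given index
    # and ignores whatever follows it (here, the trailing ";").
    value, _end = json.JSONDecoder().raw_decode(text, match.end())
    return value


reports = extract_reports(page_source)
print(reports[0]["Id"])
```

The design choice here is to avoid the [^]]+ capture for parsing: that character class stops at the first "]", so it would cut the data short if any object in the array contained a nested array, as the invented "Tags" field does.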


python

python-re