How to get a script value from HTML in 'response.text' using re.search?

Question

I have an HTML file that contains several scripts; the last script in particular contains a value I want to get.

I need to get the hash value found here:

extend(cur, { "hash": "13334a0e457f0793ec", "loginHost": "login", "sureBoxText": false, "strongCode": 0, "joinParams": false, "validationType": 3, "resendDelay": 120, "calledPhoneLen": 4, "calledPhoneExcludeCountries": [1, 49, 200] });

For this I used:

import re

with open("test.html", "r", encoding='utf-8') as f:
    html = f.read()

hash = re.search(r'{ "hash": "(.*?)",', html).group(1)

This works perfectly, but when I try to do the same thing directly from the request, I get an error.

with requests.get(url, headers=headers, cookies=cookies) as response:
    if response.status_code == 200:
        html = response.content
        hash = re.search(r'{ "hash": "(.*?)",', html).group(1)
        return hash

Error:

TypeError: cannot use a string pattern on a bytes-like object

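I then ran a simple test: I saved 'response.text' to an HTML file and tried to read it the first way, and the error persisted. Shortly after creating the file, I clicked Format Document in VS Code, which reformatted the whole HTML file; I tested again and it worked. I need a way to format 'response.text' as HTML so that I can get my value, or, if there is another approach I don't know about, I'm willing to learn it.

Note: the hash value is present in 'response.text'.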

Answer 1

You need to decode the bytes to a string:

re.search(r'{ "hash": "(.*?)",', html.decode('utf-8'))
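
As a minimal sketch of why the error occurs and how decoding fixes it, the snippet below uses a hard-coded bytes value as a stand-in for response.content (an assumption, not real server output). It also shows an alternative that keeps the data as bytes and uses a bytes pattern instead.

```python
import re

# Sample bytes standing in for response.content (assumed for illustration).
raw = b'extend(cur, { "hash": "13334a0e457f0793ec", "loginHost": "login" });'

# A str pattern applied to bytes raises the TypeError from the question.
try:
    re.search(r'{ "hash": "(.*?)",', raw)
    raised = False
except TypeError:
    raised = True

# Fix 1: decode the bytes to str, then use the str pattern.
hash_from_str = re.search(r'{ "hash": "(.*?)",', raw.decode("utf-8")).group(1)

# Fix 2: keep the bytes and use a bytes pattern; the captured group is bytes too.
hash_from_bytes = re.search(rb'{ "hash": "(.*?)",', raw).group(1).decode("utf-8")

print(raised, hash_from_str, hash_from_bytes)
```

Either fix works as long as the pattern and the searched object are the same type; decoding once up front is usually the simpler choice when the rest of the code works with str.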

Answer 2

Try converting html to a string with str():

hash = re.search(r'{ "hash": "(.*?)",', str(html)).group(1)

Edit: your regular expression is incorrect; change it to:

hash = re.search(r'"hash":"(.*?)",', str(html)).group(1)
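
Two details are worth illustrating here, using made-up sample data rather than the real response. First, calling str() on a bytes object returns its repr (including the b'...' wrapper) rather than decoded text, so decode() is generally the safer conversion. Second, a pattern that tolerates optional whitespace matches both the compact and the formatted form of the script, which may explain why the original pattern only worked after the file was reformatted. The pattern below is an illustrative assumption, not taken from the answers.

```python
import re

# Hypothetical samples: compact markup as a server might send it,
# and the same content after an editor has reformatted it.
compact = b'extend(cur,{"hash":"13334a0e457f0793ec","loginHost":"login"});'
formatted = 'extend(cur, { "hash": "13334a0e457f0793ec", "loginHost": "login" });'

# str() on bytes gives the repr, wrapper included; decode() gives the text.
as_repr = str(compact)
as_text = compact.decode("utf-8")

# \s* allows zero or more whitespace characters around the colon.
pattern = re.compile(r'"hash"\s*:\s*"(.*?)"')

hash_compact = pattern.search(as_text).group(1)
hash_formatted = pattern.search(formatted).group(1)

print(as_repr[:2], hash_compact, hash_formatted)
```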

Answer 3

I believe you are looking for response.text, which is the "Content of the response, in unicode." See https://2.python-requests.org/en/master/api/#requests.Response.text

with requests.get(url, headers=headers, cookies=cookies) as response:
    if response.status_code == 200:
        html = response.text
        hash = re.search(r'{ "hash": "(.*?)",', html).group(1)
        return hash
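
One caveat with calling .group(1) directly: re.search returns None when nothing matches, which then raises AttributeError. A small helper can guard against that. This is only a sketch; the name extract_hash and the whitespace-tolerant pattern are assumptions made for illustration, not part of the original code.

```python
import re
from typing import Optional

# Assumed pattern: tolerates optional whitespace around the colon.
HASH_PATTERN = re.compile(r'"hash"\s*:\s*"(.*?)"')


def extract_hash(html: str) -> Optional[str]:
    """Return the first "hash" value found in html, or None if absent."""
    match = HASH_PATTERN.search(html)
    return match.group(1) if match else None


# Sample text standing in for response.text (assumed for illustration).
sample = 'extend(cur, { "hash": "13334a0e457f0793ec", "loginHost": "login" });'
print(extract_hash(sample))
print(extract_hash("<html></html>"))
```

In the request code above, this would be called as extract_hash(response.text), with the caller deciding what to do when the result is None.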

python

beautifulsoup

python-requests

python-re