Extract a token from within <script> tags with BeautifulSoup4 and Requests
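I am trying to isolate the securityToken from an HTML response. The catch is that the securityToken sits inside a <script> tag.
I have been able to isolate the tag using the following code: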
import re
import requests
from bs4 import BeautifulSoup

url = 'https://obe.sandals.com/read-land-availability/'
# Send a browser-like User-Agent with the request
r = requests.get(url, headers={"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.103 Safari/537.36"})
soup = BeautifulSoup(r.text, 'html.parser')
# Find the <script> tag whose contents mention securityToken
# (string= is the current name for the older text= argument)
mytext = soup.find('script', string=re.compile('securityToken:'))
print(mytext)
This is the output, but I can't figure out the final step of extracting the securityToken:
<script> window._app.page = { jsView: './views/step1/Vacation', securityToken: "BF8394B1DD5481AF43BE2AF02243903F121D26327E83ADC13785F6EF739B5870", subSessionId: "6D71C585C7F51CF105B3100A473635ACF3637329F2C1ABAADB1F2827832562D8", step: 1 }; </script>
Process finished with exit code 0
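If you use 'html5lib' instead of 'html.parser', and the security token is always in the same position:
# mytext is a bs4 Tag, so convert it to a string before splitting
str(mytext).split('securityToken: "')[1].split('", subSessionId:')[0]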
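A hedged sketch of the same string-splitting idea: the helper below (`token_by_partition` is a hypothetical name, not from the original post) uses `str.partition`, so it relies only on the token being wrapped in double quotes, not on `subSessionId` coming right after it. The sample string is the script tag printed in the question.

```python
def token_by_partition(script_text, key='securityToken: "'):
    """Return the quoted value that follows `key`, or None if it is absent."""
    # Split once at the key; `found` is empty when the key does not occur.
    _, found, rest = script_text.partition(key)
    if not found:
        return None
    # Everything up to the next double quote is the token value.
    value, quote, _ = rest.partition('"')
    return value if quote else None


# Script tag text as printed in the question.
sample = (
    "<script> window._app.page = { jsView: './views/step1/Vacation', "
    'securityToken: "BF8394B1DD5481AF43BE2AF02243903F121D26327E83ADC13785F6EF739B5870", '
    'subSessionId: "6D71C585C7F51CF105B3100A473635ACF3637329F2C1ABAADB1F2827832562D8", '
    "step: 1 }; </script>"
)

print(token_by_partition(sample))
```

In the scraping script this would be called as `token_by_partition(str(mytext))`.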
To extract the value of securityToken, try the following:
import re
import requests
from bs4 import BeautifulSoup

url = 'https://obe.sandals.com/read-land-availability/'
r = requests.get(url, headers={"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.103 Safari/537.36"})
soup = BeautifulSoup(r.text, 'html.parser')
# Find the <script> tag whose contents mention securityToken
# (string= is the current name for the older text= argument)
mytext = soup.find('script', string=re.compile('securityToken:'))
# Capture whatever sits between the double quotes after "securityToken: "
print(re.search(r'securityToken: "(.*?)"', str(mytext)).group(1))
Output:
5EFDCE1D62C5F1C1369EF3629F921B0F90301ACB51C5FD24321D7FB58D04DE50
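One caveat with the regex approach: `re.search` returns `None` when the pattern does not match, so calling `.group(1)` directly raises `AttributeError` if the page layout changes. A minimal sketch of a more defensive variant follows. `TOKEN_RE` and `extract_security_token` are hypothetical names, and the pattern assumes the value stays double-quoted, while tolerating extra whitespace around the colon.

```python
import re

# Assumption: the token is always a double-quoted value after "securityToken".
TOKEN_RE = re.compile(r'securityToken\s*:\s*"([^"]*)"')


def extract_security_token(script_text):
    """Return the securityToken value found in script_text, or None."""
    match = TOKEN_RE.search(script_text)
    return match.group(1) if match else None


# Script tag text as printed in the question.
sample = (
    "<script> window._app.page = { jsView: './views/step1/Vacation', "
    'securityToken: "BF8394B1DD5481AF43BE2AF02243903F121D26327E83ADC13785F6EF739B5870", '
    'subSessionId: "6D71C585C7F51CF105B3100A473635ACF3637329F2C1ABAADB1F2827832562D8", '
    "step: 1 }; </script>"
)

print(extract_security_token(sample))
print(extract_security_token("<script> var x = 1; </script>"))
```

With the code above, the call would be `extract_security_token(str(mytext))`, followed by a check for `None` before using the result.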