Scraping a specific GTAG value from a website
I am trying to scrape websites and return their GTM container IDs. I found a solution, but it only works for one specific website.
Works for: (https://www.observepoint.com/)
import urllib3
import re
from bs4 import BeautifulSoup
http = urllib3.PoolManager()
response = http.request('GET', "https://www.observepoint.com/")
soup = BeautifulSoup(response.data,"html.parser")
GTM = soup.head.findAll(text=re.compile(r'GTM'))
print(re.search("GTM-[A-Z0-9]{6,7}",str(GTM))[0])
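On other websites it does not work (it returns a None object), even though the GTM ID value is still present and sits in the same or a similar iframe tag as on the previous website.
GTM value on the site where the script works:
GTM value on the site where the script does not work: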
import requests
import re

urls = [
    "https://www.observepoint.com/",
    "https://www.dccomics.com/characters/superman%26sa%3DU%26ved%3D2ahUKEwi55uyMxfHxAhXMp5UCHTkMBekQFjAzegQIARAB%26usg%3DAOvVaw2PgfF7ZT6S6UeZpFImsXDC%2Cdccomics",
]


def main(urls):
    # Search the raw HTML of each page rather than only the parsed <head> text.
    for url in urls:
        r = requests.get(url)
        match = re.findall("(GTM-[A-Z0-9]{6,7})", r.text)
        if match:
            # A page can repeat the same ID, so print only the unique values.
            print(set(match))


main(urls)
Output:
{'GTM-5LS3NZ'}
{'GTM-538C4X'}
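A likely reason the first script returns None is that it only searches text nodes inside the head element, while an ID held in an iframe's src attribute is not a text node. That is a guess from the description in the question, not something checked against the site. The sketch below separates the extraction from the network request so it can be tried on a plain HTML string. The helper name extract_gtm_ids and the sample markup (including the ID GTM-ABC1234) are hypothetical and only for illustration.

```python
import re

# Same pattern as the script above: "GTM-" followed by 6 or 7
# uppercase letters or digits.
GTM_PATTERN = re.compile(r"GTM-[A-Z0-9]{6,7}")


def extract_gtm_ids(html):
    """Return the sorted, de-duplicated GTM container IDs found anywhere in html."""
    return sorted(set(GTM_PATTERN.findall(html)))


# Hypothetical sample markup: the ID appears only inside an iframe attribute,
# where a search over <head> text nodes would not see it.
sample_html = """
<html><head><title>Example</title></head>
<body>
<noscript><iframe src="https://www.googletagmanager.com/ns.html?id=GTM-ABC1234"></iframe></noscript>
</body></html>
"""

print(extract_gtm_ids(sample_html))
print(extract_gtm_ids("<html><body>no container here</body></html>"))
```

Passing r.text from requests.get(url) to the helper would give the same behaviour as the loop above, with an empty list instead of no output when a page has no container ID.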