Extract content from a <script> tag when scraping with BS4
I'm trying to extract information from a "script" tag with the following code:
import requests
from bs4 import BeautifulSoup
headers = {"user-agent": "Mozilla/5.0"}  # assumed; the question did not show its headers
# `api` is a string prefix defined elsewhere in the original script (not shown here)
response = requests.get("https://www.zalando.es/jordan-air-jordan-mid-zapatillas-altas-blackdark-beetrootwhitehyper-royal-joc11a024-g11.html?hl=1610800800024", headers=headers)
soup = BeautifulSoup(response.content, 'html.parser')
marca = soup.find("h3", {"class":"OEhtt9 ka2E9k uMhVZi uc9Eq5 pVrzNP _5Yd-hZ"}).text
nombre = soup.find("h1", {"class":"OEhtt9 ka2E9k uMhVZi z-oVg8 pVrzNP w5w9i_ _1PY7tW _9YcI4f"}).text
color = soup.find("span", {"class":"u-6V88 ka2E9k uMhVZi dgII7d z-oVg8 pVrzNP"}).text
precio = soup.find("span", {"class":"uqkIZw ka2E9k uMhVZi FxZV-M z-oVg8 pVrzNP"}).text
talla = soup.find("span", {"class":"u-6V88 ka2E9k uMhVZi FxZV-M z-oVg8 pVrzNP"}).text
imagen = soup.find("img", {"class": "_6uf91T z-oVg8 u-6V88 ka2E9k uMhVZi FxZV-M _2Pvyxl JT3_zV EKabf7 mo6ZnF _1RurXL mo6ZnF PZ5eVw"})['src']
# The SKUs are in the 16th <script> tag; convert and split it once instead of 13 times
script_text = str(soup.find_all('script')[15])
parts = script_text.split('sku":"')
# Each slice trims a fixed number of trailing characters after the SKU
sku355 = api + parts[3][:-137]
sku36 = api + parts[4][:-139]
sku365 = api + parts[5][:-139]
sku375 = api + parts[6][:-137]
sku38 = api + parts[7][:-139]
sku385 = api + parts[8][:-137]
sku39 = api + parts[9][:-137]
sku40 = api + parts[10][:-139]
sku405 = api + parts[11][:-137]
sku41 = api + parts[12][:-137]
sku42 = api + parts[13][:-139]
sku425 = api + parts[14][:-137]
sku43 = api + parts[15][:-125]
print(sku355)
print(sku36)
print(sku365)
print(sku375)
print(sku38)
print(sku385)
print(sku39)
print(sku40)
print(sku405)
print(sku41)
print(sku42)
print(sku425)
print(sku43)
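Everything works perfectly for this pair of shoes, but when I switch to this link, it gives me something else. What I want to extract is the SKU for each size, regardless of the link, i.e.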
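The fixed offsets such as `[:-137]` and `[:-139]` only work while the text after each SKU has exactly that length, so a different product page yields something else. A minimal sketch of a more tolerant approach, assuming (as the `split('sku":"')` calls imply) that each SKU appears in the script text as `"sku":"<value>"`. The helper name `extract_skus` and the sample string are hypothetical, and the `api` prefix is left out:

```python
import re

# Assumption: every SKU in the <script> text is written as "sku":"<value>",
# the same pattern the question's split('sku":"') relies on.
SKU_PATTERN = re.compile(r'"sku":"([^"]+)"')


def extract_skus(script_text):
    """Return all SKU values in document order, without fixed-width slicing."""
    return SKU_PATTERN.findall(script_text)


# Hypothetical script fragment, for illustration only.
sample = '[{"sku":"JOC11A024-G110005000","size":"35.5"},{"sku":"JOC11A024-G110055000","size":"36"}]'
print(extract_skus(sample))
```

The question starts at match index 3, so deciding which matches to skip still depends on the page; the regex only removes the dependence on exact character counts.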
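I can't reproduce your example; it would help if you improved your question.
Just in case:
If you only want to grab the sizes, try the following: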
import requests, json
from bs4 import BeautifulSoup
headers = {"user-agent": "Mozilla/5.0"}
response = requests.get("https://www.zalando.es/jordan-air-jordan-mid-zapatillas-altas-blackdark-beetrootwhitehyper-royal-joc11a024-g11.html?hl=1610800800024", headers=headers)
soup = BeautifulSoup(response.content, 'lxml')  # requires the lxml package
# The product data is JSON wrapped in CDATA inside <script id="z-vegas-pdp-props">
json_object = json.loads(soup.select_one('script#z-vegas-pdp-props').contents[0].split('CDATA')[1].split(']>')[0])
for item in json_object[0]['model']['articleInfo']['units']:
    print('sku:{0} - size:{1}'.format(item['id'], item['size']['local']))
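Assuming the script body has the form `<![CDATA[{...}]]>`, the `split('CDATA')[1].split(']>')[0]` chain keeps the `[` from `<![CDATA[` and the first `]` from `]]>`, so it wraps the payload in a one-element list, which would explain the `json_object[0]`. A sketch of a stricter unwrap under that same assumption; `parse_script_json` is a hypothetical helper and the sample payload only mirrors the keys used above:

```python
import json
import re

# Matches a whole script body wrapped in <![CDATA[ ... ]]>, capturing the inside.
CDATA_PATTERN = re.compile(r"^\s*<!\[CDATA\[(.*)\]\]>\s*$", re.DOTALL)


def parse_script_json(script_text):
    """Parse the JSON payload of a script body, removing a CDATA wrapper if present."""
    match = CDATA_PATTERN.match(script_text)
    payload = match.group(1) if match else script_text
    return json.loads(payload)


# Hypothetical script body that mirrors the keys used in the answer's loop.
raw = ('<![CDATA[{"model": {"articleInfo": {"units": ['
       '{"id": "JOC11A024-G110005000", "size": {"local": "35.5"}}]}}}]]>')

data = parse_script_json(raw)  # the object itself, with no extra list around it
for item in data["model"]["articleInfo"]["units"]:
    print("sku:{0} - size:{1}".format(item["id"], item["size"]["local"]))
```

With BeautifulSoup, the argument would be the text of `soup.select_one('script#z-vegas-pdp-props')`, as in the answer.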
Output
sku:JOC11A024-G110005000 - size:35.5
sku:JOC11A024-G110055000 - size:36
sku:JOC11A024-G110006000 - size:36.5
sku:JOC11A024-G110065000 - size:37.5
sku:JOC11A024-G110007000 - size:38
sku:JOC11A024-G110075000 - size:38.5
sku:JOC11A024-G110008000 - size:39
sku:JOC11A024-G110085000 - size:40
sku:JOC11A024-G110009000 - size:40.5
sku:JOC11A024-G110095000 - size:41
sku:JOC11A024-G110010000 - size:42
sku:JOC11A024-G110105000 - size:42.5
sku:JOC11A024-G110011000 - size:43
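Rather than one variable per size (`sku355`, `sku36` and so on in the question), the units can be collected into a dictionary keyed by size. A minimal sketch, assuming each unit has the `id` and `size.local` fields used in the answer and that the size labels are strings; `skus_by_size` is a hypothetical name and the sample reuses three rows from the output above:

```python
def skus_by_size(units):
    """Map each local size label to its SKU, using the keys from the answer's loop."""
    return {unit["size"]["local"]: unit["id"] for unit in units}


# Three units shaped like the ones the answer iterates over.
units = [
    {"id": "JOC11A024-G110005000", "size": {"local": "35.5"}},
    {"id": "JOC11A024-G110055000", "size": {"local": "36"}},
    {"id": "JOC11A024-G110006000", "size": {"local": "36.5"}},
]

sku_map = skus_by_size(units)
print(sku_map["36"])
```

Looking up a size the product does not offer raises `KeyError`; `sku_map.get(size)` returns `None` instead.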