Extract content from a <script> tag with BS4

I'm trying to extract information from a <script> tag with the following code:

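    import requests
    from bs4 import BeautifulSoup

    headers = {"user-agent": "Mozilla/5.0"}  # same value the answer below uses
    # `api` is a string prefix defined elsewhere in the original script (not shown here)

    response = requests.get("https://www.zalando.es/jordan-air-jordan-mid-zapatillas-altas-blackdark-beetrootwhitehyper-royal-joc11a024-g11.html?hl=1610800800024", headers=headers)
    soup = BeautifulSoup(response.content, 'html.parser')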
 
    marca = soup.find("h3", {"class":"OEhtt9 ka2E9k uMhVZi uc9Eq5 pVrzNP _5Yd-hZ"}).text
    nombre = soup.find("h1", {"class":"OEhtt9 ka2E9k uMhVZi z-oVg8 pVrzNP w5w9i_ _1PY7tW _9YcI4f"}).text
    color = soup.find("span", {"class":"u-6V88 ka2E9k uMhVZi dgII7d z-oVg8 pVrzNP"}).text
    precio = soup.find("span", {"class":"uqkIZw ka2E9k uMhVZi FxZV-M z-oVg8 pVrzNP"}).text
    talla = soup.find("span", {"class":"u-6V88 ka2E9k uMhVZi FxZV-M z-oVg8 pVrzNP"}).text
    imagen = soup.find("img", {"class": "_6uf91T z-oVg8 u-6V88 ka2E9k uMhVZi FxZV-M _2Pvyxl JT3_zV EKabf7 mo6ZnF _1RurXL mo6ZnF PZ5eVw"})['src']


    # Convert the 16th <script> tag to text once and split it on the SKU marker
    script = str(soup.find_all('script')[15])
    parts = script.split('sku":"')

    # Fixed negative offsets trim whatever follows each SKU in the raw text
    sku355 = api + parts[3][:-137]
    sku36 = api + parts[4][:-139]
    sku365 = api + parts[5][:-139]
    sku375 = api + parts[6][:-137]
    sku38 = api + parts[7][:-139]
    sku385 = api + parts[8][:-137]
    sku39 = api + parts[9][:-137]
    sku40 = api + parts[10][:-139]
    sku405 = api + parts[11][:-137]
    sku41 = api + parts[12][:-137]
    sku42 = api + parts[13][:-139]
    sku425 = api + parts[14][:-137]
    sku43 = api + parts[15][:-125]

    print(sku355)
    print(sku36)
    print(sku365)
    print(sku375)
    print(sku38)
    print(sku385)
    print(sku39)
    print(sku40)
    print(sku405)
    print(sku41)
    print(sku42)
    print(sku425)
    print(sku43)

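Everything works perfectly for this shoe, but when I switch to the link below, the same code returns something else. What I want is to extract the SKU for each size regardless of the link, e.g.: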

https://www.zalando.es/nike-sportswear-air-force-1-gtx-unisex-zapatillas-anthraciteblackbarely-grey-ni115o01u-q11.html

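I can't reproduce your example; it would help if you improved your question.

Just in case:

If you only want to grab the SKU for each size, try the following: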

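    import json
    import requests
    from bs4 import BeautifulSoup

    headers = {"user-agent": "Mozilla/5.0"}
    response = requests.get("https://www.zalando.es/jordan-air-jordan-mid-zapatillas-altas-blackdark-beetrootwhitehyper-royal-joc11a024-g11.html?hl=1610800800024", headers=headers)

    # The 'lxml' parser requires the lxml package (pip install lxml)
    soup = BeautifulSoup(response.content, 'lxml')

    # The product data is JSON inside <script id="z-vegas-pdp-props">, wrapped in CDATA;
    # cut out the text between the CDATA markers and parse it
    json_object = json.loads(soup.select_one('script#z-vegas-pdp-props').contents[0].split('CDATA')[1].split(']>')[0])

    # One entry per size: 'id' is the SKU, size['local'] is the local size label
    for item in json_object[0]['model']['articleInfo']['units']:
        print('sku:{0} - size:{1}'.format(item['id'], item['size']['local']))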

Output:

    sku:JOC11A024-G110005000 - size:35.5
    sku:JOC11A024-G110055000 - size:36
    sku:JOC11A024-G110006000 - size:36.5
    sku:JOC11A024-G110065000 - size:37.5
    sku:JOC11A024-G110007000 - size:38
    sku:JOC11A024-G110075000 - size:38.5
    sku:JOC11A024-G110008000 - size:39
    sku:JOC11A024-G110085000 - size:40
    sku:JOC11A024-G110009000 - size:40.5
    sku:JOC11A024-G110095000 - size:41
    sku:JOC11A024-G110010000 - size:42
    sku:JOC11A024-G110105000 - size:42.5
    sku:JOC11A024-G110011000 - size:43
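If the nested path json_object[0]['model']['articleInfo']['units'] turns out to differ on another product page (an assumption; the question does not show what the second link returns), a slightly more defensive variant is to strip the CDATA wrapper explicitly and then search the parsed JSON for the first "units" key wherever it sits. This is only a sketch using the standard library: the helper names unwrap_cdata, find_key and sku_by_size are hypothetical, and the sample string is a trimmed, made-up stand-in shaped like the fields the answer reads (id and size.local). It takes the first "units" key it finds, so a page containing several such keys would need a more specific lookup.

```python
import json


def unwrap_cdata(text: str) -> str:
    """Strip a surrounding <![CDATA[ ... ]]> wrapper, if present."""
    text = text.strip()
    prefix, suffix = "<![CDATA[", "]]>"
    if text.startswith(prefix) and text.endswith(suffix):
        text = text[len(prefix):-len(suffix)]
    return text


def find_key(node, key):
    """Depth-first search for the first value stored under `key` in nested dicts/lists."""
    if isinstance(node, dict):
        if key in node:
            return node[key]
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None


def sku_by_size(script_text: str) -> dict:
    """Map each local size to its SKU.

    Assumes (hypothetically) that each unit looks like
    {"id": <sku>, "size": {"local": <size>}}, as in the answer's code.
    """
    data = json.loads(unwrap_cdata(script_text))
    units = find_key(data, "units") or []
    return {unit["size"]["local"]: unit["id"] for unit in units}


# Hypothetical, trimmed-down sample; a real run would pass
# soup.select_one('script#z-vegas-pdp-props').contents[0] instead.
sample = (
    '<![CDATA[[{"model": {"articleInfo": {"units": ['
    '{"id": "JOC11A024-G110005000", "size": {"local": "35.5"}},'
    '{"id": "JOC11A024-G110055000", "size": {"local": "36"}}'
    ']}}}]]]>'
)

for size, sku in sku_by_size(sample).items():
    print(f"sku:{sku} - size:{size}")
```

Unlike the fixed offsets in the question ([:-137], [:-139], ...), reading the JSON does not depend on the exact character length of each entry, which is one plausible reason the slicing broke on the second product.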