Code not extracting all product URLs on a page
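The code below is meant to extract all product URLs from every page in a list of pages. The site I'm scraping is a JavaScript site. My code works perfectly on every other product category page on the site.
However, on this page it only extracts 36 products, which is the number of products loaded on the page. The pages variable is a list because I'm trying to extract the product URLs by looping over all of the pages like this: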
pages = ['https://www.mrphome.com/en_za/shop/kitchen-dining/shop-dining/table-linen', 'https://www.mrphome.com/en_za/shop/kitchen-dining/shop-dining/table-linen?p=2-', 'https://www.mrphome.com/en_za/shop/kitchen-dining/shop-dining/table-linen?p=3-', 'https://www.mrphome.com/en_za/shop/kitchen-dining/shop-dining/table-linen?p=4-', 'https://www.mrphome.com/en_za/shop/kitchen-dining/shop-dining/table-linen?p=5-']
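The five URLs above differ only in the p query parameter, so the list can also be generated rather than typed out. A minimal sketch, assuming pages 2 to 5 keep the ?p=<n>- pattern shown (the trailing hyphen is copied verbatim from the URLs above) and that last_page is a hypothetical page count:

```python
# Build the paginated URL list instead of typing each entry.
# Assumption: pages 2..last_page follow the "?p=<n>-" pattern used above,
# including the trailing hyphen copied from the original URLs.
base = "https://www.mrphome.com/en_za/shop/kitchen-dining/shop-dining/table-linen"
last_page = 5  # hypothetical: number of listing pages seen on the site

# Page 1 is the bare category URL; later pages add the p parameter.
pages = [base] + [f"{base}?p={n}-" for n in range(2, last_page + 1)]

for page in pages:
    print(page)
```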
However, if I run the code like this, it still returns only 36 items in the list.
import requests
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np
import time
baseurl = "https://www.mrphome.com/"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.90 Safari/537.36"
}
produrlslug = []
pages = ['https://www.mrphome.com/en_za/shop/kitchen-dining/shop-dining/table-linen']
for page in pages:
    content = requests.get(page, headers=headers)
    soup = BeautifulSoup(content.content, "lxml")
    url = soup.findAll('a', class_='product-image quickview-enabled')
for item in url:
    produrlslug.append(item['href'])
print(len(produrlslug))
Any help would be appreciated.
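Indenting the second for loop (the one that iterates over url) so that it sits inside the page loop fixes the problem. Without that indentation, the inner loop runs only once, after the page loop has finished, so it only sees the url result from the last page fetched.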
import requests
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np
import time
baseurl = "https://www.mrphome.com/"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.90 Safari/537.36"
}
produrlslug = []
pages = ['https://www.mrphome.com/en_za/shop/kitchen-dining/shop-dining/table-linen']
for page in pages:
    content = requests.get(page, headers=headers)
    soup = BeautifulSoup(content.content, "lxml")
    url = soup.find_all('a', class_='product-image quickview-enabled')
    # This loop must be indented inside the page loop so it runs once per page
    for item in url:
        produrlslug.append(item['href'])
print(len(produrlslug))
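To see why the indentation matters without hitting the site, the sketch below replaces the HTTP request and parsing with a hypothetical dictionary (fake_pages) of per-page link lists. collect_dedented mirrors the layout in the question and collect_nested mirrors the fix; the function names and data are illustrative only.

```python
# Offline illustration of the loop-nesting bug.
# fake_pages is hypothetical data standing in for what soup.find_all(...)
# would return for each listing page.
fake_pages = {
    "page1": ["/p/a", "/p/b", "/p/c"],
    "page2": ["/p/d", "/p/e"],
    "page3": ["/p/f"],
}


def collect_dedented(pages):
    """Question's layout: the inner loop sits after the page loop."""
    hrefs = []
    for page in pages:
        url = fake_pages[page]  # rebound on every pass of the page loop
    for item in url:  # runs once, over the last page's links only
        hrefs.append(item)
    return hrefs


def collect_nested(pages):
    """Fixed layout: the inner loop runs once per page."""
    hrefs = []
    for page in pages:
        url = fake_pages[page]
        for item in url:
            hrefs.append(item)
    return hrefs


pages = list(fake_pages)
print(len(collect_dedented(pages)))  # 1
print(len(collect_nested(pages)))    # 6
```

Because url is rebound on each pass of the page loop, the dedented loop only ever sees the final page's links, no matter how many pages are in the list.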