Scrape a JS-rendered site without the Chrome GUI?
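I'm trying to scrape a JS-rendered site using selenium and BeautifulSoup. The code works fine, but I need to run it on a server that doesn't have Chrome installed. What should I change in the code so that it works without a GUI?
Here is the current code: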
from bs4 import BeautifulSoup
import selenium.webdriver as webdriver
import json
from selenium.webdriver.common.keys import Keys

url = 'https://www.bigbasket.com/pc/fruits-vegetables/fresh-vegetables/?nc=nb'
chromepath = "/Users/Nitin/Desktop/Milkbasket/Scraping/chromedriver"
driver = webdriver.Chrome(chromepath)
driver.get(url)
# rest of code for fetching prices
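I'd suggest dropping the Selenium approach and using the built-in urllib library or the requests library to get the information you need. All of the product information is available in the JSON data the site returns. For example: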
import requests

# Query parameters for the product listing endpoint
params = {
    "type": "pc",
    "slug": "fresh-vegetables",
    "tab_type": '["all"]',
    "sorted_on": "popularity",
    "listtype": "pc",
}

# Reuse one session so cookies and the connection persist across pages
session = requests.Session()

for page in range(1, 10):
    params['page'] = page
    req_vegetables = session.get("https://www.bigbasket.com/product/get-products", params=params)
    json_vegetables = req_vegetables.json()
    print(f'Page {page}')

    for product in json_vegetables['tab_info']['product_map']['all']['prods']:
        print(f" {product['p_desc']} - {product['sp']} - {product['mrp']}")
This gives you the following output:
Page 1
Onion - 21.00 - 26.25
Potato - 27.00 - 33.75
Tomato - Hybrid - 40.00 - 50.00
Ladies Finger - 10.00 - 12.50
Cauliflower - 35.00 - 43.75
Palak - 30.00 - 37.50
Potato Onion Tomato 1 kg Each - 88.00 - 110.00
Carrot - Local - 59.00 - 73.75
Capsicum - Green - 89.00 - 111.25
Tomato - Local - 47.00 - 58.75
Mushrooms - Button - 49.00 - 61.25
Cucumber - 25.00 - 31.25
Broccoli - 18.40 - 23.00
Bottle Gourd - 17.00 - 21.25
Cabbage - 32.00 - 40.00
Cucumber - English - 23.00 - 28.75
Tomato - Local, Organically Grown - 29.00 - 36.25
Brinjal - Bottle Shape - 72.00 - 90.00
Onion - Organically Grown - 23.00 - 28.75
Methi - 19.00 - 23.75
Page 2
Bitter Gourd / Karela - 59.20 - 74.00
Beetroot - 40.00 - 50.00
Fresho Palak - Without Root 250 Gm + Amul Malai Paneer 200 Gm - 94.20 - 102.00
Capsicum - Red - 299.00 - 373.75
... etc
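If you'd rather avoid the third-party dependency, the same idea can probably be done with just the built-in urllib. The following is only a rough sketch: the endpoint, query parameters and JSON keys are copied from the example above, while the helper names (`build_url`, `extract_products`, `fetch_page`, `main`), the browser-like `User-Agent` header and the "stop at the first empty page" rule are my own assumptions, not something the site documents.

```python
import json
import urllib.parse
import urllib.request

# Endpoint and parameter names are taken from the requests example above.
BASE_URL = "https://www.bigbasket.com/product/get-products"
BASE_PARAMS = {
    "type": "pc",
    "slug": "fresh-vegetables",
    "tab_type": '["all"]',
    "sorted_on": "popularity",
    "listtype": "pc",
}


def build_url(page):
    """Return the listing URL for one page (hypothetical helper)."""
    query = urllib.parse.urlencode({**BASE_PARAMS, "page": page})
    return f"{BASE_URL}?{query}"


def extract_products(payload):
    """Return (description, sale price, MRP) tuples from a decoded response.

    Assumes the nested-dict layout used above; missing keys give an empty list.
    """
    prods = (
        payload.get("tab_info", {})
        .get("product_map", {})
        .get("all", {})
        .get("prods", [])
    )
    return [(p.get("p_desc"), p.get("sp"), p.get("mrp")) for p in prods]


def fetch_page(page, timeout=10):
    """Download and decode one page of results using only urllib."""
    # The User-Agent value is an assumption; some sites reject urllib's default.
    request = urllib.request.Request(
        build_url(page), headers={"User-Agent": "Mozilla/5.0"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def main(max_pages=9):
    """Print the products page by page, stopping at the first empty page."""
    for page in range(1, max_pages + 1):
        products = extract_products(fetch_page(page))
        if not products:
            break
        print(f"Page {page}")
        for desc, sale_price, mrp in products:
            print(f" {desc} - {sale_price} - {mrp}")
```

Calling `main()` performs the actual requests. Keeping the parsing in `extract_products` means you can check it against a saved response without touching the network.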