The following Selenium code contains an XPath error: it raises a syntax error and produces no output. Can this be resolved?
My web-scraping program keeps giving a syntax error and no output. My XPath is correct, since it points to the right name, yet I get nothing back. The site is https://www.ikea.com/sa/en/search/?query=chair&pageNumber=1 . Can anyone help?
I have Python 3.4.4, and I'm using Visual Studio Code as my editor. I'm trying to scrape the item names from the IKEA website, but I keep getting errors. Can someone help?
import selenium
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait, Select
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import NoSuchElementException, StaleElementReferenceException, WebDriverException
import csv
import os
driver= webdriver.Chrome("C:/Python34/Scripts/chromedriver.exe")
driver.get("https://www.ikea.com/sa/en/search/?query=chair&pageNumber=1")
title =driver.findElement(By.XPath("//span[@class='prodName prodNameTro']")).text()
print(title)
Expected output:
RENBERGET
HÄRÖ / FEJAN
ÄPPLARÖ
TÄRENDÖ / ADDE
AGAM
ÄPPLARÖ
These are the names of the items on the page.
Since there are multiple products on the page, you can store all the values in a list and then print them.
You can do it like this:
driver = webdriver.Chrome("C:/Python34/Scripts/chromedriver.exe")
driver.get("https://www.ikea.com/sa/en/search/?query=chair&pageNumber=1")
productNames = driver.find_elements_by_xpath("//span[contains(@id,'txtNameProduct')]")
for product in productNames:
    print(product.text)
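As for the original error itself: `findElement`, `By.XPath`, and `.text()` are Java-style Selenium names. The Python bindings use snake_case methods, the constant `By.XPATH`, and `.text` as a property rather than a method call. A minimal corrected sketch of the question's code (assuming the driver path and the page's class names from the question; it needs a live Chrome browser to actually run):

```python
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome("C:/Python34/Scripts/chromedriver.exe")
driver.get("https://www.ikea.com/sa/en/search/?query=chair&pageNumber=1")
# find_elements (plural) returns a list of matches; .text is a property, not a method
for span in driver.find_elements(By.XPATH, "//span[@class='prodName prodNameTro']"):
    print(span.text)
driver.quit()
```

Note that `find_element(...)` (singular) would raise `NoSuchElementException` if nothing matches, while `find_elements(...)` simply returns an empty list.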
You don't need Selenium to scrape this page with Python. It's faster and easier to use requests and BeautifulSoup, for example.
Here is basic sample code for a chair search on IKEA; within a few seconds you will get all the chairs (723) from all the pages:
import requests
from bs4 import BeautifulSoup

# list of all items
result = []
# request the first page of the query "query=chair&pageNumber=1"
response = requests.get('https://www.ikea.com/sa/en/search/?query=chair&pageNumber=1')
# make sure the response is OK
assert response.ok
# parse the response text using html.parser
page = BeautifulSoup(response.text, "html.parser")
# get the last page number and convert it to an integer
last_page_number = int(page.select_one(".pagination a:last-child").text)
# iterate through pages 1 to last_page_number
for i in range(1, last_page_number + 1):
    # if i == 1, skip the request: we already have the response for the first page
    if i > 1:
        # request page i
        response = requests.get(f'https://www.ikea.com/sa/en/search/?query=chair&pageNumber={i}')
        assert response.ok
        page = BeautifulSoup(response.text, "html.parser")
    # get all product containers, which hold name, price and description
    products = page.select("#productsTable .parentContainer")
    # iterate through all products on the page; get name, price and
    # description, and add them to result as a dict
    for product in products:
        name = product.select_one(".prodName").text.strip()
        desc = product.select_one(".prodDesc").text.strip()
        price = product.select_one(".prodPrice,.prodNlpTroPrice").text.strip()
        result.append({"name": name, "desc": desc, "price": price})

# print the results; you can do anything with them
for r in result:
    print(f"name: {r['name']}, price: {r['price']}, description: {r['desc']}")
print("the end")
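Since the question imports `csv`, the collected `result` list can also be written to a CSV file with the standard library. A small sketch using hypothetical sample rows shaped like the scraper's dicts:

```python
import csv

# hypothetical sample rows, shaped like the scraper's result list
result = [
    {"name": "RENBERGET", "desc": "Swivel chair", "price": "SR 345"},
    {"name": "AGAM", "desc": "Junior chair", "price": "SR 245"},
]

# write one header row plus one row per product
with open("chairs.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["name", "desc", "price"])
    writer.writeheader()
    writer.writerows(result)
```

`newline=""` is the documented way to open CSV files on Windows so the writer doesn't emit blank lines between rows.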