Why is this code only downloading one page's data?
I've tried many times and it still doesn't work:
import requests
from lxml import html, etree
from selenium import webdriver
import time, json

#how many page do you want to scan
page_numnotint = input("how many page do you want to scan")
page_num = int(page_numnotint)
file_name = 'jd_goods_data.json'

url = 'https://list.jd.com/list.html?cat=1713,3264,3414&page=1&delivery=1&sort=sort_totalsales15_desc&trans=1&JL=4_10_0#J_main'
driver = webdriver.Chrome()
driver.get(url)
base_html = driver.page_source
selctor = etree.HTML(base_html)

date_info = []
name_data, price_data = [], []
jd_goods_data = {}

for q in range(page_num):
    i = int(1)
    while True:
        name_string = '//*[@id="plist"]/ul/li[%d]/div/div[3]/a/em/text()' %(i)
        price_string = '//*[@id="plist"]/ul/li[%d]/div/div[2]/strong[1]/i/text()' %(i)
        if i == 60:
            break
        else:
            i += 1
        name = selctor.xpath(name_string)[0]
        name_data.append(name)
        price = selctor.xpath(price_string)[0]
        price_data.append(price)
        jd_goods_data[name] = price
        print(name_data)

    with open(file_name, 'w') as f:
        json.dump(jd_goods_data, f)

    time.sleep(2)
    driver.find_element_by_xpath('//*[@id="J_bottomPage"]/span[1]/a[10]').click()
    time.sleep(2)

# for k, v in jd_goods_data.items():
#     print(k,v)
I'm trying to download some details, but it doesn't work. If I enter 2 pages to scan, it only downloads one page's details, but it downloads them twice!
OK, so you define q, but you never actually use it. In that case the convention is to name the unused variable _. That is, instead of writing

for q in range(page_num):

you would write

for _ in range(page_num):

so other programmers can see at a glance that q is not used and that you simply want to repeat the action.
It also means that (for some reason) the line

driver.find_element_by_xpath('//*[@id="J_bottomPage"]/span[1]/a[10]').click()

is not doing what you expect. Even if the click itself succeeds, notice that driver.page_source is read and selctor is built only once, before the loop, so every iteration parses the same first page; that is why you get one page's data twice. There is surely a way to make the click-based navigation work, but your url already contains a parameter named page, and I suggest you use that instead. That way the variable q is actually used, like this:
import requests
from lxml import html, etree
from selenium import webdriver
import time, json

#how many page do you want to scan
page_numnotint = input("how many page do you want to scan")
page_num = int(page_numnotint)
file_name = 'jd_goods_data.json'

driver = webdriver.Chrome()
date_info = []
name_data, price_data = [], []
jd_goods_data = {}

for q in range(page_num):
    url = 'https://list.jd.com/list.html?cat=1713,3264,3414&page={page}&delivery=1&sort=sort_totalsales15_desc&trans=1&JL=4_10_0#J_main'.format(page=q)
    driver.get(url)
    base_html = driver.page_source
    selctor = etree.HTML(base_html)
    i = 1
    while True:
        name_string = '//*[@id="plist"]/ul/li[%d]/div/div[3]/a/em/text()' %(i)
        price_string = '//*[@id="plist"]/ul/li[%d]/div/div[2]/strong[1]/i/text()' %(i)
        if i == 60:
            break
        else:
            i += 1
        name = selctor.xpath(name_string)[0]
        name_data.append(name)
        price = selctor.xpath(price_string)[0]
        price_data.append(price)
        jd_goods_data[name] = price
        print(name_data)

with open(file_name, 'w') as f:
    json.dump(jd_goods_data, f)

driver.quit()
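One caveat about the URL-based version: range(page_num) yields 0, 1, ..., so the first request uses page=0. If the site counts pages from 1 (which the original url with page=1 suggests), you may want .format(page=q + 1) instead; that is an assumption worth checking against the site's actual behaviour.

For completeness, the click-based navigation from the question can also be made to work. The essential fix is to click "next" and then re-read driver.page_source and rebuild selctor inside the loop, instead of parsing the page once before it. A minimal sketch, assuming the '//*[@id="J_bottomPage"]/span[1]/a[10]' XPath from the question really is the "next page" link and that you are on Selenium 3, where find_element_by_xpath is still available:

import time, json
from lxml import etree
from selenium import webdriver

page_num = int(input("how many pages do you want to scan"))
driver = webdriver.Chrome()
driver.get('https://list.jd.com/list.html?cat=1713,3264,3414&page=1&delivery=1&sort=sort_totalsales15_desc&trans=1&JL=4_10_0#J_main')

jd_goods_data = {}
for page in range(page_num):
    # re-parse the page source on every iteration, not just once before the loop
    selctor = etree.HTML(driver.page_source)
    for i in range(1, 61):
        name = selctor.xpath('//*[@id="plist"]/ul/li[%d]/div/div[3]/a/em/text()' % i)
        price = selctor.xpath('//*[@id="plist"]/ul/li[%d]/div/div[2]/strong[1]/i/text()' % i)
        if name and price:  # skip missing list items instead of raising IndexError
            jd_goods_data[name[0]] = price[0]
    if page < page_num - 1:
        # click "next page" (XPath taken from the question) and give it time to load
        driver.find_element_by_xpath('//*[@id="J_bottomPage"]/span[1]/a[10]').click()
        time.sleep(2)

with open('jd_goods_data.json', 'w') as f:
    json.dump(jd_goods_data, f)
driver.quit()

The fixed time.sleep(2) keeps the sketch close to your original code; replacing it with Selenium's WebDriverWait would be more robust against slow page loads.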