selenium.common.exceptions.InvalidArgumentException: Message: invalid argument while iterating through a list of urls and passing as argument to get()
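I am scraping a page to get URLs and then using them to scrape a lot of information. I want to avoid copying and pasting all the time, but I can't figure out how to make get() work with an object. The first part of my code works well, but when I reach the part that tries to get the URLs, I get the following error message: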
Traceback (most recent call last):
  File "/Users/rcastong/Desktop/imgs/try-creating-object-url.py", line 61, in <module>
    driver4.get(urlworks2)
  File "/Users/rcastong/Library/Python/3.9/lib/python/site-packages/selenium/webdriver/remote/webdriver.py", line 333, in get
    self.execute(Command.GET, {'url': url})
  File "/Users/rcastong/Library/Python/3.9/lib/python/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "/Users/rcastong/Library/Python/3.9/lib/python/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.InvalidArgumentException: Message: invalid argument
  (Session info: chrome=98.0.4758.109)
Here is part of the code:
#this part works well
for number, item in enumerate(imgs2, 1):
    # print('---', number, '---')
    img_url = item.get_attribute("href")
    if not img_url:
        print("none")
    else:
        print('"'+img_url+'",')

# the error happens on driver4.get(urlworks2)
for i in range(0,30):
    urlworks = img_url[i]
    urlworks2 = urlworks.encode('ascii', 'ignore').decode('unicode_escape')
    driver4 = webdriver.Chrome()
    driver4.get(urlworks2)

def check_exists_by_xpath(xpath):
    try:
        WebDriverWait(driver3,55).until(EC.presence_of_all_elements_located((By.XPATH, xpath)))
    except TimeoutException:
        return False
    return True

imgsrc2 = WebDriverWait(driver3,55).until(EC.presence_of_all_elements_located((By.XPATH, "//p[@data-testid='artistName']/ancestor::a[contains(@class,'ChildrenLink')]")))
for number, item in enumerate(imgsrc2, 1):
    # print('---', number, '---')
    artisturls = item.get_attribute("href")
    if not artisturls:
        print("none")
    else:
        print('"'+artisturls+'",')
This error message...
Traceback (most recent call last):
.
driver4.get(urlworks2)
.
self.execute(Command.GET, {'url': url})
.
self.error_handler.check_response(response)
.
selenium.common.exceptions.InvalidArgumentException: Message: invalid argument
(Session info: chrome=98.0.4758.109)
...implies that the url passed as an argument to get() was an invalid argument.
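One way to catch this before the browser does is to check each value just before calling get(). The sketch below is only an illustration: the helper name looks_like_absolute_url is hypothetical, and limiting the check to http/https is an assumption that is stricter than what Chrome itself accepts.

```python
from urllib.parse import urlparse


def looks_like_absolute_url(candidate):
    """Return True if candidate is a string with an http(s) scheme and a host.

    Hypothetical helper; restricting to http/https is an assumption made
    for this sketch, not a rule imposed by Selenium.
    """
    if not isinstance(candidate, str):
        return False
    parts = urlparse(candidate)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


# A full URL passes; a single character, an empty string, or None does not.
for candidate in ["https://www.google.com/", "h", "", None]:
    print(repr(candidate), looks_like_absolute_url(candidate))
```

Calling such a check right before driver4.get(urlworks2) would have flagged the bad value before it reached the driver.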
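Deep Dive
In the first for loop, item.get_attribute("href") returns a url string, and img_url is overwritten on every iteration. So img_url ends up holding a single string (the last href), not a list of urls as you assumed. As a result, in the second for loop, img_url[i] yields the individual characters of that string, and passing one of them to get() raises InvalidArgumentException: Message: invalid argument.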
Demonstration
For example, the following lines of code:
img_url = 'https://www.google.com/'
for i in range(0,5):
    urlworks = img_url[i]
    urlworks2 = urlworks.encode('ascii', 'ignore').decode('unicode_escape')
    print(urlworks2)
prints:
h
t
t
p
s
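For contrast, here is a small sketch of the same indexing applied to a list. The second URL is a placeholder chosen for illustration and does not come from the question.

```python
# Illustrative values only; the second URL is a placeholder.
single_url = "https://www.google.com/"
url_list = ["https://www.google.com/", "https://www.example.com/"]

# Indexing a string yields one character.
first_from_string = single_url[0]

# Indexing a list of strings yields a complete URL.
first_from_list = url_list[0]

print(repr(first_from_string))
print(repr(first_from_list))
```

Only the list version produces a value that get() can navigate to.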
Solution
Declare an empty list img_url in the global scope and append each href to it, so that you can iterate over the list later.
img_url = []
for number, item in enumerate(imgs2, 1):
    img_url.append(item.get_attribute("href"))