Python Web Scraping error - Reading from JSON- IndexError: list index out of range - how do I ignore
I am doing web scraping with Python, Selenium, and a headless Chrome driver. I read the results as JSON. Here is my code:
import json
from bs4 import BeautifulSoup

# `driver` is the headless Chrome WebDriver, created elsewhere
CustId = 500
while CustId <= 510:
    print(CustId)
    # Part 1: Customer REST call:
    urlg = f'https://mywebsite/customerRest/show/?id={CustId}'
    driver.get(urlg)
    soup = BeautifulSoup(driver.page_source, "lxml")
    dict_from_json = json.loads(soup.find("body").text)
    # print(dict_from_json)
    #try:
    CustID = dict_from_json['customerAddressCreateCommand']['customerId']
    # This is the lookup that raises IndexError when enabled:
    # Addr = dict_from_json['customerShowCommand']['customerAddressShowCommandSet'][0]['addressDisplayName']
    writefunction()
    CustId = CustId + 1
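The problem is that 'addressDisplayName' is sometimes present in the result set and sometimes not. When it is missing, the script fails with:
IndexError: list index out of range
That makes sense, because the value does not exist. How do I ignore it, so that the loop simply continues when 'addressDisplayName' is absent? I tried using try, but the code still stops executing.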
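If you get an IndexError (for index 0), it means your list is empty. So the failure happens one step earlier in the path: it is 'customerAddressShowCommandSet' that has no elements. If 'addressDisplayName' were missing from the dictionary instead, you would get a KeyError.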
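To make the distinction concrete, here is a small sketch with two made-up payloads (both are assumptions for illustration, not real responses from the site). The helper name `failure_name` is also hypothetical; it just reports which exception the lookup raises.

```python
# Hypothetical payloads that mimic the two failure modes.
empty_set = {"customerShowCommand": {"customerAddressShowCommandSet": []}}
missing_key = {"customerShowCommand": {"customerAddressShowCommandSet": [{}]}}


def failure_name(payload):
    """Return the name of the exception raised by the lookup, or None."""
    try:
        payload["customerShowCommand"]["customerAddressShowCommandSet"][0]["addressDisplayName"]
    except (IndexError, KeyError) as exc:
        return type(exc).__name__
    return None


print(failure_name(empty_set))    # empty list: indexing [0] fails
print(failure_name(missing_key))  # list has an item, but the key is absent
```

The first call reports IndexError and the second reports KeyError, which is why the traceback in the question points at the list rather than at the key.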
You can check whether the list contains any elements before indexing it:
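address_set = dict_from_json['customerShowCommand']['customerAddressShowCommandSet']
if address_set:  # an empty list is falsy, so this skips the [0] lookup
    # get the data
    Addr = address_set[0]['addressDisplayName']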
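If you want the check in one reusable place, something along these lines might work. It is only a sketch: `first_address_name` and its `default` parameter are names I made up, and it assumes each entry in the list is a dictionary.

```python
def first_address_name(data, default=None):
    """Return the first address display name, or `default` if any level is missing."""
    # `or {}` / `or []` also cover keys that are present but set to null in the JSON.
    show = data.get("customerShowCommand") or {}
    addresses = show.get("customerAddressShowCommandSet") or []
    if addresses:  # skip the [0] lookup when the list is empty
        return addresses[0].get("addressDisplayName", default)
    return default


# Made-up payloads for illustration only.
with_address = {"customerShowCommand": {"customerAddressShowCommandSet": [{"addressDisplayName": "Home"}]}}
without_address = {"customerShowCommand": {"customerAddressShowCommandSet": []}}

print(first_address_name(with_address))
print(first_address_name(without_address, default="NaN"))
```

Inside the loop you would call it as `Addr = first_address_name(dict_from_json)` and carry on with the next customer either way.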
Otherwise, you can indeed use try..except:
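try:
    # get the data
    Addr = dict_from_json['customerShowCommand']['customerAddressShowCommandSet'][0]['addressDisplayName']
except (IndexError, KeyError):  # parenthesize the tuple so both types are caught
    # handle missing data
    Addr = None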
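As a rough sketch, the try..except variant can also be wrapped in a helper so the loop body stays short. `address_or_default` is a hypothetical name, and catching TypeError as well is my own addition, to cover a level that comes back as null.

```python
def address_or_default(data, default="NaN"):
    """Look up the first address display name, falling back to `default`."""
    try:
        return data["customerShowCommand"]["customerAddressShowCommandSet"][0]["addressDisplayName"]
    except (IndexError, KeyError, TypeError):
        # IndexError: the address list is empty
        # KeyError:   one of the keys is missing
        # TypeError:  a level is None (JSON null) and cannot be subscripted
        return default


# Made-up payloads for illustration only.
print(address_or_default({"customerShowCommand": {"customerAddressShowCommandSet": []}}))
print(address_or_default({"customerShowCommand": None}))
print(address_or_default(
    {"customerShowCommand": {"customerAddressShowCommandSet": [{"addressDisplayName": "Office"}]}}
))
```

Listing the exception types explicitly, rather than using a bare except, keeps unrelated errors visible.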
A try..except block should solve your problem:
CustId = 500
while CustId <= 510:
    print(CustId)
    # Part 1: Customer REST call:
    urlg = f'https://mywebsite/customerRest/show/?id={CustId}'
    driver.get(urlg)
    soup = BeautifulSoup(driver.page_source, "lxml")
    dict_from_json = json.loads(soup.find("body").text)
    # print(dict_from_json)
    CustID = dict_from_json['customerAddressCreateCommand']['customerId']
    try:
        Addr = dict_from_json['customerShowCommand']['customerAddressShowCommandSet'][0]['addressDisplayName']
    except (IndexError, KeyError):
        # No address for this customer: store a placeholder and keep looping
        Addr = "NaN"
    CustId = CustId + 1