Error handling in variables when using Requests
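The code below ran without problems until I set x=37. At that point I get the error
TypeError: 'NoneType' object is not subscriptable on the variable t["vintage"]["wine"]["region"]["country"]["name"].
I have since added another variable that triggers the same problem on almost every run, so you may spot the error there.
I think this happens because one of the 25 results on that page has no country name assigned, so the lookup raises an error.
I think I need to add exception handling to each variable to cover this case. The examples I have seen add it at the request level, to handle failing to find a legitimate page, rather than to one of the variables, and I cannot find guidance on adding it at the variable level.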
# Import packages
import requests
import json
import pandas as pd
import time

x = 37

# Get request from the Vivino website
r = requests.get(
    "https://www.vivino.com/api/explore/explore",
    params={
        #"country_code": "FR",
        #"country_codes[]":"pt",
        "currency_code": "GBP",
        "grape_filter": "varietal",
        "min_rating": "1",
        "order_by": "price",
        "order": "asc",
        "page": x,
        "price_range_max": "100",
        "price_range_min": "25",
        "wine_type_ids[]": "1"
    },
    headers={
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0"
    },
)

# Variables to scrape from the Vivino website
results = [
    (
        t["vintage"]["wine"]["winery"]["name"],
        t["vintage"]["year"],
        t["vintage"]["wine"]["id"],
        t["vintage"]["wine"]["name"],
        t["vintage"]["statistics"]["ratings_average"],
        t["prices"][0]["amount"],
        t["vintage"]["wine"]["region"]["country"]["name"],
        t["vintage"]["wine"]["region"]["country"]["code"],
        t["vintage"]["wine"]["region"]["name"],
        t["vintage"]["wine"]["style"]["name"]
    )
    for t in r.json()["explore_vintage"]["matches"]
]

# Saving the results in a dataframe
dataframe = pd.DataFrame(
    results,
    columns=["Winery", "Vintage", "Wine ID", "Wine", "Rating", "Price", "Country", "CountryCode", "Region", "Style"]
)

# Output the dataframe
df_out = dataframe
df_out.to_csv("data.csv", index=False)
print("Complete -", x, "iterations")
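The problem is that some keys in a deeply nested dictionary are missing at random (marked with None). Here is a sample list of dictionaries that shows the issue:
data = [
    {'k1': {'k2': {'k3': 'value_i_want'}}},
    {'k1': {'k2': None}},
    {'k1': {'k2': {'k3': 'value_i_want'}}},
]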
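You assume that the key k3 definitely exists in every dictionary in the list, but it does not. So when you try something like
result = [t['k1']['k2']['k3'] for t in data]
you get TypeError: 'NoneType' object is not subscriptable.
The TypeError is raised when t['k1']['k2'] evaluates to None on the second iteration of the for loop and you then try to look up a key in it. You are essentially asking the program to execute None['k3'], which explains the error message you received.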
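To see which record is responsible, a minimal sketch like the following can help. It reuses the sample data from above; the name failing is only illustrative and not part of the original code.

```python
# Sample data from above: the second record has None where a dict is expected.
data = [
    {'k1': {'k2': {'k3': 'value_i_want'}}},
    {'k1': {'k2': None}},
    {'k1': {'k2': {'k3': 'value_i_want'}}},
]

# Collect the index and error message of every record whose lookup fails.
failing = []
for i, t in enumerate(data):
    try:
        t['k1']['k2']['k3']
    except TypeError as exc:
        failing.append((i, str(exc)))

# Only the record at index 1 should be reported.
print(failing)
```

The same loop could be run over r.json()["explore_vintage"]["matches"] to find out which of the results on a page lacks a region or country.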
To handle this, which is common in data returned from API requests, you need to wrap the lookup in a try/except block. You may find this helper function useful:
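def try_to_get(d: dict, *args, default=None):
    # Walk down the nested structure one key (or list index) at a time.
    try:
        for k in args:
            d = d[k]
        return d
    # KeyError: key missing; TypeError: hit None; IndexError: list too short.
    except (KeyError, TypeError, IndexError):
        print(f'Cannot find the key {args}')
        return default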
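With the helper function we can write try_to_get(t, 'k1', 'k2', 'k3'). A non-problematic dictionary is traversed level by level and yields the value you want, while a problematic one triggers the except block and returns the default value (here, None).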
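As a quick check that does not need the API, the helper can be run over the sample data. This is a sketch only: it repeats the helper without the print call so the block is self-contained, and it also catches IndexError for list lookups such as "prices", 0, which is my own assumption about what you want. The placeholder 'unknown' is arbitrary.

```python
def try_to_get(d, *args, default=None):
    """Follow the keys in args through nested dicts/lists; return default on failure."""
    try:
        for k in args:
            d = d[k]
        return d
    except (KeyError, TypeError, IndexError):
        return default


data = [
    {'k1': {'k2': {'k3': 'value_i_want'}}},
    {'k1': {'k2': None}},
    {'k1': {'k2': {'k3': 'value_i_want'}}},
]

# The second record falls back to the placeholder instead of raising.
values = [try_to_get(t, 'k1', 'k2', 'k3', default='unknown') for t in data]
print(values)  # ['value_i_want', 'unknown', 'value_i_want']
```

Leaving default as None is usually the simpler choice when the results go into a DataFrame, since the missing entries then show up as empty cells in the CSV.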
You can try replacing the list comprehension in your code with:
results = [
    (
        try_to_get(t, "vintage", "wine", "winery", "name"),
        try_to_get(t, "vintage", "year"),
        try_to_get(t, "vintage", "wine", "id"),
        try_to_get(t, "vintage", "wine", "name"),
        try_to_get(t, "vintage", "statistics", "ratings_average"),
        try_to_get(t, "prices", 0, "amount"),
        try_to_get(t, "vintage", "wine", "region", "country", "name"),
        try_to_get(t, "vintage", "wine", "region", "country", "code"),
        try_to_get(t, "vintage", "wine", "region", "name"),
        try_to_get(t, "vintage", "wine", "style", "name"),
    )
    for t in r.json()["explore_vintage"]["matches"]
]