Error handling in variables when using Requests

Question

The code below ran without problems until I set x=37. At that point I get the error

TypeError: 'NoneType' object is not subscriptable on the variable t["vintage"]["wine"]["region"]["country"]["name"].

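I have since added another variable, and the same problem occurs with it almost every time, so you may find the error there.

I think this happens because one of the 25 results on that page has no country name assigned, so the lookup for that variable raises the error.

I think I need to add an exception handler for each variable to deal with this case. The examples I have seen all handle exceptions at the request level (for instance, when no valid page is found) rather than for an individual variable, and I can't find guidance on adding them at the variable level.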

# Import packages
import requests
import json
import pandas as pd
import time

x=37

# Get request from the Vivino website
r = requests.get(
    "https://www.vivino.com/api/explore/explore",
    params={
        #"country_code": "FR",
        #"country_codes[]":"pt",
        "currency_code":"GBP",
        "grape_filter":"varietal",
        "min_rating":"1",
        "order_by":"price",
        "order":"asc",
        "page": x,
        "price_range_max":"100",
        "price_range_min":"25",
        "wine_type_ids[]":"1"
    },
    headers={
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0"
    },
)

# Variables to scrape from the Vivino website
results = [
    (
        t["vintage"]["wine"]["winery"]["name"],
        t["vintage"]["year"],
        t["vintage"]["wine"]["id"],
        t["vintage"]["wine"]["name"],
        t["vintage"]["statistics"]["ratings_average"],
        t["prices"][0]["amount"],
        t["vintage"]["wine"]["region"]["country"]["name"],
        t["vintage"]["wine"]["region"]["country"]["code"],
        t["vintage"]["wine"]["region"]["name"],
        t["vintage"]["wine"]["style"]["name"]
    )
    for t in r.json()["explore_vintage"]["matches"]
]

# Saving the results in a dataframe
dataframe = pd.DataFrame(
    results,
    columns=["Winery", "Vintage", "Wine ID", "Wine", "Rating", "Price", "Country", "CountryCode", "Region", "Style"]
)
    
#output the dataframe
df_out = dataframe
df_out.to_csv("data.csv", index=False)
print("Complete -",x,"iterations")

Answer 1

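The problem is that, in the deeply nested dictionaries, some values are unpredictably missing (set to None). Here is some example data that shows the issue: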

data = [
  {'k1': {'k2': {'k3': 'value_i_want'}}},
  {'k1': {'k2': None}},
  {'k1': {'k2': {'k3': 'value_i_want'}}},
]

You assume that the key k3 exists in every dictionary in the list, but it does not. So when you try to do something like

result = [t['k1']['k2']['k3'] for t in data]

you get TypeError: 'NoneType' object is not subscriptable.

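The TypeError is raised on the second iteration of the for loop, when t['k1']['k2'] evaluates to None and you then try to look up a key in it. You are effectively asking the program to evaluate None['k3'], which explains the error message you received.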
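To see which record is responsible, the lookup can be wrapped in a try/except while looping. This is a minimal sketch using the example data above; the name failed_indices is only for illustration.

```python
data = [
    {'k1': {'k2': {'k3': 'value_i_want'}}},
    {'k1': {'k2': None}},
    {'k1': {'k2': {'k3': 'value_i_want'}}},
]

# Collect the positions of the records whose nested lookup fails.
failed_indices = []
for i, t in enumerate(data):
    try:
        t['k1']['k2']['k3']
    except TypeError:
        # t['k1']['k2'] is None here, so None['k3'] raises TypeError.
        failed_indices.append(i)

print(failed_indices)  # [1]
```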

To fix this (it is common in data returned from API requests), you need to wrap the lookup in a try/except block. You may find this helper function useful:

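def try_to_get(d: dict, *args, default=None):
    """Follow the keys/indices in args through d; return default if any step fails."""
    try:
        for k in args:
            d = d[k]
        return d
    # KeyError: key is missing; IndexError: list is too short;
    # TypeError: an intermediate value is None (or otherwise not subscriptable)
    except (KeyError, IndexError, TypeError):
        print(f'Cannot find the key {args}')
        return default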

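With the helper function, we can write try_to_get(t, 'k1', 'k2', 'k3'). For the non-problematic dictionaries it walks through the nesting and returns the value you want, while the problematic ones trigger the except block and the function returns the default value instead (here, the default is None).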
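As a quick check, here is the helper applied to the example data from earlier. The helper is repeated so the snippet runs on its own (without the print, and also catching IndexError for list indices); the 'unknown' default is just an illustrative choice.

```python
def try_to_get(d, *args, default=None):
    """Follow the keys/indices in args through d; return default on failure."""
    try:
        for k in args:
            d = d[k]
        return d
    except (KeyError, IndexError, TypeError):
        return default

data = [
    {'k1': {'k2': {'k3': 'value_i_want'}}},
    {'k1': {'k2': None}},
    {'k1': {'k2': {'k3': 'value_i_want'}}},
]

# The second record has no k3, so it falls back to the default.
result = [try_to_get(t, 'k1', 'k2', 'k3') for t in data]
print(result)  # ['value_i_want', None, 'value_i_want']

# A custom default can be passed as a keyword argument.
with_default = [try_to_get(t, 'k1', 'k2', 'k3', default='unknown') for t in data]
print(with_default)  # ['value_i_want', 'unknown', 'value_i_want']
```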

You can try replacing the list comprehension in your code with:

results = [
    (
        try_to_get(t, "vintage", "wine", "winery", "name"),
        try_to_get(t, "vintage", "year"),
        try_to_get(t, "vintage", "wine", "id"),
        try_to_get(t, "vintage", "wine", "name"),
        try_to_get(t, "vintage", "statistics", "ratings_average"),
        try_to_get(t, "prices", 0, "amount"),
        try_to_get(t, "vintage", "wine", "region", "country", "name"),
        try_to_get(t, "vintage", "wine", "region", "country", "code"),
        try_to_get(t, "vintage", "wine", "region", "name"),
        try_to_get(t, "vintage", "wine", "style", "name"),
    )
    for t in r.json()["explore_vintage"]["matches"]
]
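
If the list of lookups keeps growing, one option is to keep the column names and key paths together in a single mapping. The following is only a sketch: FIELDS and the records in matches are made up for illustration and merely mimic the shape of the data accessed above, not an actual API response.

```python
def try_to_get(d, *args, default=None):
    """Follow the keys/indices in args through d; return default on failure."""
    try:
        for k in args:
            d = d[k]
        return d
    except (KeyError, IndexError, TypeError):
        return default

# Hypothetical mapping of column name -> path of keys/indices.
FIELDS = {
    "Winery": ("vintage", "wine", "winery", "name"),
    "Price": ("prices", 0, "amount"),
    "Country": ("vintage", "wine", "region", "country", "name"),
}

# Made-up records: the second one has no region and no prices.
matches = [
    {
        "vintage": {"wine": {"winery": {"name": "Winery A"},
                             "region": {"country": {"name": "France"}}}},
        "prices": [{"amount": 25.0}],
    },
    {
        "vintage": {"wine": {"winery": {"name": "Winery B"}, "region": None}},
        "prices": [],
    },
]

# One dict per record; missing values become None instead of raising.
rows = [
    {column: try_to_get(t, *path) for column, path in FIELDS.items()}
    for t in matches
]
print(rows[1])  # {'Winery': 'Winery B', 'Price': None, 'Country': None}
```

A list of dicts like rows can be passed to pd.DataFrame directly, in which case the dictionary keys become the column names.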
