How to reduce request time in a JSON or replace a dictionary key with a default one?

Question

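I have a list of dictionaries that I fill in while searching a JSON URL. The problem is that the JSON (provided by the Google Books API) is not always complete. It is a search for books and, from what I have seen, every book has an id, a title and authors, but not every book has imageLinks. Here is a JSON link as an example: Search for Harry Potter.

Note that it always returns 10 results; in this example there are 10 IDs, 10 titles and 10 authors, but only 4 imageLinks.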

@app.route('/search', methods=["GET", "POST"])
@login_required
def search():
    if request.method == "POST":
        while True:
            try:
                seek = request.form.get("seek")
                url = f'https://www.googleapis.com/books/v1/volumes?q={seek}'
                response = requests.get(url)
                response.raise_for_status()
                search = response.json()
                seek = search['items']
                infobooks = []
                for i in range(len(seek)):
                    infobooks.append({
                        "book_id": seek[i]['id'],
                        "thumbnail": seek[i]['volumeInfo']['imageLinks']['thumbnail'],
                        "title": seek[i]['volumeInfo']['title'],
                        "authors": seek[i]['volumeInfo']['authors']
                    })
                return render_template("index.html", infobooks=infobooks)
            except (requests.RequestException, KeyError, TypeError, ValueError):
                continue
    else:
        return render_template("index.html")

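With the method shown above I can get all 10 imageLinks (thumbnails), but it takes a long time! Does anyone have a suggestion for making this request take less time? Or is there some way to insert a "no book cover" image when I can't find the imageLinks? (Not what I'd like, but better than waiting for the result.)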

Answer 1

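From your question, it isn't very obvious what the problem is (hence the lack of engagement). After studying the code and the API, I now have a better understanding of it.

The problem is that the Google Books API does not always include an image thumbnail for every item.

Your current solution to this problem is to retry the whole search until every result has an image thumbnail. But consider whether that is really necessary. Maybe you can split it up. In my testing I found that the books without an image thumbnail often change from one request to the next. This means that if you keep retrying until all results of the query have thumbnails, it will take a long time.

The solution should instead try to query the thumbnail for each book individually. After X attempts it should fall back to a default 'no image available' picture, to avoid spamming the API.

As you already figured out in your post, you can get the volume ID of each book from the original search query. You can then use this API call to query each volume individually.

I have written some code to verify that this works. Only one book ended up without an image thumbnail. There is still a lot of room for improvement in this code, but I'll leave that to you as an exercise.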

import requests

# Max attempts to get an image
_MAX_ATTEMPTS = 5

# No Image Picture
no_img_link = 'https://upload.wikimedia.org/wikipedia/en/6/60/No_Picture.jpg'


def search_book(seek):
    url = f'https://www.googleapis.com/books/v1/volumes?q={seek}'
    response = requests.get(url)
    search = response.json()
    volumes = search['items']

    # Get ID's of all the volumes
    volume_ids = [volume['id'] for volume in volumes]

    # Storage for the results
    book_info_collection = []

    # Loop over all the volume ids
    for volume_id in volume_ids:

        # Attempt to get the thumbnail a couple times
        for i in range(_MAX_ATTEMPTS):
            url = f'https://www.googleapis.com/books/v1/volumes/{volume_id}'
            response = requests.get(url)
            volume = response.json()
            try:
                thumbnail = volume['volumeInfo']['imageLinks']['thumbnail']
            except KeyError:
                print(f'Failed for {volume_id}')
                if i < _MAX_ATTEMPTS - 1:
                    # We still have attempts left, keep going
                    continue
                # Failed on the last attempt, use a default image
                thumbnail = no_img_link
                print('Using Default')

            # Create dict with book info
            book_info = {
                "book_id": volume_id,
                "thumbnail": thumbnail,
                "title": volume['volumeInfo']['title'],
                "authors": volume['volumeInfo']['authors']
            }

            # Add to collection
            book_info_collection.append(book_info)
            break

    return book_info_collection


books = search_book('Harry Potter')
print(books)
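
The fallback step in the code above can also be isolated into a small helper that never raises when a key is missing. This is only a minimal sketch: the name `extract_book_info` and the "Unknown title" default are made up for illustration, the sample volumes are hand-written data shaped like the API response, and the default image is the `no_img_link` URL from the code above.

```python
# Default image, same URL as no_img_link in the code above.
DEFAULT_THUMBNAIL = "https://upload.wikimedia.org/wikipedia/en/6/60/No_Picture.jpg"


def extract_book_info(volume, default_thumbnail=DEFAULT_THUMBNAIL):
    """Build the book dict from one volume, tolerating missing keys."""
    info = volume.get("volumeInfo") or {}
    image_links = info.get("imageLinks") or {}
    return {
        "book_id": volume.get("id"),
        # Fall back to the default image when there is no thumbnail
        "thumbnail": image_links.get("thumbnail") or default_thumbnail,
        "title": info.get("title", "Unknown title"),  # assumed default
        "authors": info.get("authors") or [],
    }


# Two hand-written volumes shaped like the API response (made-up data).
with_cover = {
    "id": "abc",
    "volumeInfo": {
        "title": "Book A",
        "authors": ["Someone"],
        "imageLinks": {"thumbnail": "http://example.com/a.jpg"},
    },
}
without_cover = {"id": "def", "volumeInfo": {"title": "Book B"}}

books = [extract_book_info(v) for v in (with_cover, without_cover)]
print(books[0]["thumbnail"])  # the cover supplied in the data
print(books[1]["thumbnail"])  # the default image
```

Because the helper only touches the dictionary, it can be tested without any network access and reused for both the search results and the per-volume responses.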

Answer 2

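You have added that you want it to load quickly. This means you can't do the retries in Python, because any retry you do in Python means a longer page load time.

This means the loading has to happen in the browser. You can use the same approach as in the pure Python version. First you use all the images that come back with the search request, and then you make an additional request for every volume that has no image. This means you have 2 endpoints: one that serves the page with the volume information from the search, and another that fetches the data for just one volume.

Note that I use the term volume rather than book, because that is the term the Google API uses.

Now, JavaScript is not my strong suit, so the solution I provide here has a lot of room for improvement.

I made this example with Flask. It should help you implement a solution that fits your specific application.

Extra note: in my testing I noticed that some regions respond with all thumbnails more often than others. The API sends different responses based on your IP address. If I set my IP to be in the US, I usually get all thumbnails without retrying. I am using a VPN to do this, but there may be other solutions.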
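
One possible alternative to a VPN, offered only as an untested idea: as far as I know, the single-volume call accepts a `country` query parameter (an ISO 3166-1 code) that overrides the IP-based location. Whether that actually changes which thumbnails come back is an assumption you would have to verify yourself. The helper name `build_volume_url` is made up for this sketch, and it only builds the URL without sending anything.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://www.googleapis.com/books/v1/volumes"


def build_volume_url(volume_id, country=None):
    """Build the single-volume URL, optionally with a country override."""
    url = f"{BASE_URL}/{quote(volume_id, safe='')}"
    if country:
        # Assumption: `country` overrides the IP-based location;
        # its effect on thumbnails is not confirmed here.
        url += "?" + urlencode({"country": country})
    return url


print(build_volume_url("abc123", country="US"))
```

If it helps, the same URL could be used in place of the plain volume URL in `get_volume_info_endpoint` below.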

app.py

import time

from flask import Flask, render_template, request, jsonify
import requests

app = Flask(__name__)


@app.route('/')
def landing():
    return render_template('index.html', volumes=get_volumes('Harry Potter'))


@app.route('/get_volume_info')
def get_volume_info_endpoint():
    volume_id = request.args.get('volume_id')
    if volume_id is None:
        # Return an error if no volume id was provided
        return jsonify({'error': 'must provide argument'}), 400

    # To stop spamming the API
    time.sleep(0.250)
    
    # Request volume data
    url = f'https://www.googleapis.com/books/v1/volumes/{volume_id}'
    response = requests.get(url)
    volume = response.json()

    # Get the info using the helper function
    volume_info = get_volume_info(volume, volume_id)
    
    # Return json object with the info
    return jsonify(volume_info), 200


def get_volumes(search):
    # Make request
    url = f'https://www.googleapis.com/books/v1/volumes?q={search}'
    response = requests.get(url)
    data = response.json()

    # Get the volumes
    volumes = data['items']

    # Add list to store results
    volume_info_collection = []

    # Loop over the volumes
    for volume in volumes:
        volume_id = volume['id']
        
        # Get volume info using helper function
        volume_info = get_volume_info(volume, volume_id)

        # Add it to the result
        volume_info_collection.append(volume_info)
    
    return volume_info_collection


def get_volume_info(volume, volume_id):
    # Get basic information
    volume_title = volume['volumeInfo']['title']
    volume_authors = volume['volumeInfo']['authors']
    
    # Set default value for thumbnail
    volume_thumbnail = None
    try:
        volume_thumbnail = volume['volumeInfo']['imageLinks']['thumbnail']
    except KeyError:
        # Failed we keep the None value
        print('Failed to get thumbnail')
    
    # Fill in the dict
    volume_info = {
        'volume_id': volume_id,
        'volume_title': volume_title,
        'volume_authors': volume_authors,
        'volume_thumbnail': volume_thumbnail
    }
    
    # Return volume info
    return volume_info


if __name__ == '__main__':
    app.run()

Template index.html

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Title</title>
    <script>
        let tracker = {}

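        // Fetch the thumbnail URL for one volume; returns a Promise
        function get_thumbnail(id) {
            let url = '/get_volume_info?volume_id=' + encodeURIComponent(id)
            return fetch(url).then(function (response) {
                return response.json();
            }).then(function (data) {
                console.log(data);
                return data['volume_thumbnail']
            });
        }

        function image_load_failed(id) {
            let element = document.getElementById(id)

            if (isNaN(tracker[id])) {
                tracker[id] = 0
            }
            console.log(tracker[id])

            if (tracker[id] >= 5) {
                // Out of attempts: stop retrying and show the placeholder
                element.onerror = null
                element.src = 'https://via.placeholder.com/128x196C/O%20https://placeholder.com/'
                return
            }

            tracker[id]++
            // fetch is asynchronous, so set src only once the URL has arrived
            get_thumbnail(id).then(function (thumbnail) {
                // A missing thumbnail fails to load and triggers onerror again
                element.src = thumbnail
            }).catch(function () {
                console.log("Error");
                image_load_failed(id)
            });
        }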
    </script>
</head>
<body>

<table>
    <tr>
        <th>ID</th>
        <th>Title</th>
        <th>Authors</th>
        <th>Thumbnail</th>
    </tr>
    {% for volume in volumes %}
        <tr>
            <td>{{ volume['volume_id'] }}</td>
            <td>{{ volume['volume_title'] }}</td>
            <td>{{ volume['volume_authors'] }}</td>
            <td><img id="{{ volume['volume_id'] }}" src="{{ volume['volume_thumbnail'] }}"
                     onerror="image_load_failed('{{ volume['volume_id'] }}')"></td>
        </tr>
    {% endfor %}

</table>

</body>
</html>

Answer 3

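First of all, your function will never produce 10 imageLinks, because the API will always return the same results. So if you retrieve 4 imageLinks the first time, it will be the same the second time. Unless Google updates the dataset, but that is out of your control.

The Google Books API allows a maximum of 40 results and defaults to a maximum of 10. To increase it, add the query parameter maxResults=40, where 40 can be any desired number less than or equal to 40. You can then decide either to filter out programmatically all results that have no imageLinks, or to keep them and give them a no-result image URL. Also, not every result returns a list of authors; that has been fixed in this example as well. Don't take chances with a third-party API: always check for empty/null results, because they can break your code. I have used .get to avoid any exceptions when processing the JSON.

Although I haven't added it to this example, you can also use the pagination provided by Google Books to get more results.

Example: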

@app.route('/search', methods=["GET", "POST"])
@login_required
def search():
    if request.method == "POST":
        seek = request.form.get("seek")
        url = f'https://www.googleapis.com/books/v1/volumes?q={seek}&maxResults=40'
        response = requests.get(url)
        response.raise_for_status()
        results = response.json().get('items', [])
        infobooks = []
        no_image = {'smallThumbnail': 'http://no-image-link/image-small.jpeg', 'thumbnail': 'http://no-image-link/image.jpeg'}
        for result in results:
            info = result.get('volumeInfo', {})
            imageLinks = info.get("imageLinks")
            infobooks.append({
                "book_id": result.get('id'),
                "thumbnail": imageLinks if imageLinks else no_image,
                "title": info.get('title'),
                "authors": info.get('authors')
            })
        return render_template("index.html", infobooks=infobooks)
    else:
        return render_template("index.html")
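
The pagination mentioned above could look roughly like the following. This is a sketch rather than a tested integration: it relies on the API's `startIndex` parameter together with `maxResults`, uses only the standard library instead of `requests` so that it is self-contained, and the names `fetch_page` and `fetch_volumes` are made up for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://www.googleapis.com/books/v1/volumes"
PAGE_SIZE = 40  # the maximum allowed for maxResults


def fetch_page(query, start_index, page_size):
    """Fetch one page of search results (performs a real HTTP request)."""
    params = urlencode(
        {"q": query, "startIndex": start_index, "maxResults": page_size}
    )
    with urlopen(f"{BASE_URL}?{params}", timeout=10) as response:
        return json.load(response).get("items", [])


def fetch_volumes(query, wanted, fetch=fetch_page):
    """Collect up to `wanted` volumes by requesting one page at a time."""
    volumes = []
    start_index = 0
    while len(volumes) < wanted:
        page_size = min(PAGE_SIZE, wanted - len(volumes))
        page = fetch(query, start_index, page_size)
        if not page:
            break  # no more results available
        volumes.extend(page)
        start_index += page_size
    return volumes[:wanted]
```

Calling `fetch_volumes('Harry Potter', 100)` would then issue up to three requests. The `fetch` argument exists only so that the paging logic can be exercised with a fake page function instead of the real API.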

Google Books API documentation: https://developers.google.com/books/docs/v1/using

Answer 4

Add a dummy image URL

"thumbnail": seek[i]['volumeInfo'].get('imageLinks', {}).get('thumbnail') or 'dummy_url'
