Getting error when iterating through JSON results from IBM Watson API python

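I am using IBM Watson's Natural Language Understanding API to extract keywords and entities from a URL. I want to iterate through the JSON response to get all the keywords and entities and populate my results.html file with them.

I am trying to iterate through the results both in my application.py file and in the results.html file, which uses Jinja.

The helpers.py file returns the output of json.dumps and sends it to my application.py file so that I can iterate through the results.

However, I am getting the following error: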

TypeError: string indices must be integers

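I have read up on json.dump vs. json.load and on strings vs. dictionaries to try to resolve this, but I could not get the code to work. Let me know if you need more information. I need to resolve this by the end of the year. Thanks in advance.

Here is my application.py file: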

@app.route("/URL", methods=["GET", "POST"])
def URL():
    """Analyze URL."""

    # if user reached route via POST (as by submitting a form via POST)
    if request.method == "POST":

        # if nothing was entered return apology
        if not request.form.get("URL"):
            return apology("please enter a URL")
        URL = request.form.get("URL")

        # analyze URL using analyze function in helpers.py
        results = analyze(request.form.get("URL"))

        for item in results:
            keywords = item["keywords"]["text"]
            entities = item["entities"]["text"]

        return render_template("results.html", results=results, URL=URL)

        # check if URL is valid
        if not results:
            return apology("this is not a valid URL")

    else:
        return render_template("url.html")

Here is the helpers.py file:

def analyze(URL):

    natural_language_understanding = NaturalLanguageUnderstandingV1(
        version='2017-02-27',
        username='MUSTGETYOURUSERNAME',
        password='MUSTGETYOURPASSWORD')

    response = natural_language_understanding.analyze(
        url=URL,
        features=Features(entities=EntitiesOptions(), keywords=KeywordsOptions()))

    return (json.dumps(response, indent=2))

Here is the results.html file, which uses Jinja:

{% extends "layout.html" %}

{% block title %}
Results
{% endblock %}

{% block main %}
    <h2>Powered by IBM Watson's AI to recommend your #'s and @'s for tweeting</h2>
    <p>{{URL}}</p>
    {% for item in results %}
        <tr>
            <td>{{ item.keywords }}</td>
            <td>{{ item.entities }}</td>
        </tr>
    {% endfor %}

    <a class="twitter-share-button" href="https://twitter.com/intent/tweet">Tweet</a>
{% endblock %}

The output looks like this:

[
  {
    "text": "Android apps",
    "relevance": 0.926516
  },
  {
    "text": "Chrome OS",
    "relevance": 0.878045
  },
  {
    "text": "Sorry Android fanboys",
    "relevance": 0.696885
  },
  {
    "text": "Android tablet",
    "relevance": 0.695471
  },
  {
    "text": "absolutely wonderful Android",
    "relevance": 0.672889
  },
  {
    "text": "Chrome OS beta",
    "relevance": 0.626619
  },
  {
    "text": "Android Police",
    "relevance": 0.592994
  },
  {
    "text": "Chrome OS devices",
    "relevance": 0.566831
  },
  {
    "text": "count Android",
    "relevance": 0.563911
  },
  {
    "text": "dominant Google OS",
    "relevance": 0.553724
  },
  {
    "text": "Chrome Unboxed",
    "relevance": 0.540076
  },
  {
    "text": "overall tablet sales",
    "relevance": 0.511826
  },
  {
    "text": "inexpensive Google rival",
    "relevance": 0.498259
  },
  {
    "text": "half incremental improvements",
    "relevance": 0.468663
  },
  {
    "text": "standard operating procedure",
    "relevance": 0.45946
  },
  {
    "text": "uncommon Chromebook form",
    "relevance": 0.456969
  },
  {
    "text": "content consumption machines",
    "relevance": 0.451775
  },
  {
    "text": "absolute best pieces",
    "relevance": 0.450763
  },
  {
    "text": "content creation ones",
    "relevance": 0.450345
  },
  {
    "text": "rich new fusion",
    "relevance": 0.446127
  },
  {
    "text": "Amazon Fire tablet",
    "relevance": 0.444685
  },
  {
    "text": "selling tablet",
    "relevance": 0.444241
  },
  {
    "text": "tablet operating",
    "relevance": 0.440434
  },
  {
    "text": "Google Pixelbook",
    "relevance": 0.440007
  },
  {
    "text": "Google store",
    "relevance": 0.439719
  },
  {
    "text": "cheap tablets",
    "relevance": 0.408395
  },
  {
    "text": "immortal highlander",
    "relevance": 0.404233
  },
  {
    "text": "disparate OSes",
    "relevance": 0.401626
  },
  {
    "text": "laptop space",
    "relevance": 0.40117
  },
  {
    "text": "detachable two-in-one",
    "relevance": 0.396257
  },
  {
    "text": "pleasant surprises",
    "relevance": 0.394027
  },
  {
    "text": "additional oomph",
    "relevance": 0.393127
  },
  {
    "text": "Samsung",
    "relevance": 0.391534
  },
  {
    "text": "flashy Chromebook",
    "relevance": 0.391359
  },
  {
    "text": "sleek Chromebook",
    "relevance": 0.390035
  },
  {
    "text": "smaller devices",
    "relevance": 0.389106
  },
  {
    "text": "operating systems",
    "relevance": 0.388958
  },
  {
    "text": "new feature",
    "relevance": 0.388395
  },
  {
    "text": "true multitasking",
    "relevance": 0.388097
  },
  {
    "text": "tablet-like device",
    "relevance": 0.387175
  },
  {
    "text": "two-in-one Chromebook",
    "relevance": 0.385518
  },
  {
    "text": "nightmare fuel",
    "relevance": 0.385284
  },
  {
    "text": "mouse-first OS—not",
    "relevance": 0.385193
  },
  {
    "text": "parallel tasks",
    "relevance": 0.381923
  },
  {
    "text": "budget device",
    "relevance": 0.380932
  },
  {
    "text": "iPad",
    "relevance": 0.35313
  },
  {
    "text": "news",
    "relevance": 0.333007
  },
  {
    "text": "strides",
    "relevance": 0.319667
  },
  {
    "text": "iOS",
    "relevance": 0.318235
  },
  {
    "text": "thanks",
    "relevance": 0.316534
  }
]

[
  {
    "type": "Company",
    "text": "Google",
    "relevance": 0.385564,
    "disambiguation": {
      "subtype": [
        "AcademicInstitution",
        "AwardPresentingOrganization",
        "OperatingSystemDeveloper",
        "ProgrammingLanguageDeveloper",
        "SoftwareDeveloper",
        "VentureFundedCompany"
      ],
      "name": "Google",
      "dbpedia_resource": "http://dbpedia.org/resource/Google"
    },
    "count": 9
  },
  {
    "type": "Company",
    "text": "Samsung",
    "relevance": 0.204475,
    "disambiguation": {
      "subtype": [],
      "name": "Samsung",
      "dbpedia_resource": "http://dbpedia.org/resource/Samsung"
    },
    "count": 4
  },
  {
    "type": "Location",
    "text": "Chromebooks",
    "relevance": 0.129986,
    "disambiguation": {
      "subtype": [
        "City"
      ]
    },
    "count": 2
  },
  {
    "type": "Company",
    "text": "Amazon",
    "relevance": 0.119948,
    "disambiguation": {
      "subtype": [],
      "name": "Amazon.com",
      "dbpedia_resource": "http://dbpedia.org/resource/Amazon.com"
    },
    "count": 2
  },
  {
    "type": "Location",
    "text": "US",
    "relevance": 0.109124,
    "disambiguation": {
      "subtype": [
        "Region",
        "AdministrativeDivision",
        "GovernmentalJurisdiction",
        "FilmEditor",
        "Country"
      ],
      "name": "United States",
      "dbpedia_resource": "http://dbpedia.org/resource/United_States"
    },
    "count": 1
  },
  {
    "type": "Company",
    "text": "Apple",
    "relevance": 0.108271,
    "disambiguation": {
      "subtype": [
        "Brand",
        "OperatingSystemDeveloper",
        "ProcessorManufacturer",
        "ProgrammingLanguageDesigner",
        "ProgrammingLanguageDeveloper",
        "ProtocolProvider",
        "SoftwareDeveloper",
        "VentureFundedCompany",
        "VideoGameDeveloper",
        "VideoGamePublisher"
      ],
      "name": "Apple Inc.",
      "dbpedia_resource": "http://dbpedia.org/resource/Apple_Inc."
    },
    "count": 1
  },
  {
    "type": "Quantity",
    "text": "0",
    "relevance": 0.0746897,
    "count": 1
  },
  {
    "type": "Quantity",
    "text": "",
    "relevance": 0.0746897,
    "count": 1
  }
]

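There is some confusion here. You are not handling the Watson API response correctly, and you also seem to have misunderstood the shape of the data it returns.

Your analyze function handles the response from the Watson API call. The Watson API client helpfully parses the JSON response returned from the server into Python objects such as lists and dicts. However, your code then calls json.dumps to convert it back into a string. In doing so, you are undoing some of the work the Watson API client has done for you. Don't call json.dumps on response; just return response.

(I'm guessing you took this from the official IBM documentation: it makes the same call to json.dumps, but the Python code sample there is for demonstration purposes only.)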

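This explains the error you are getting: results is a string, so when you iterate over it with for item in results, each item is a one-character string. Indexing that string with another string, as item["keywords"] does, is what raises TypeError: string indices must be integers.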
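
The failure can be reproduced without calling Watson at all. The sketch below uses a small hand-written dict (`response`, an assumption standing in for the parsed Watson result, not real API output) to show what `json.dumps` does to it:

```python
import json

# Hypothetical stand-in for the parsed Watson response (not real API output).
response = {
    "keywords": [{"text": "Android apps", "relevance": 0.926516}],
    "entities": [{"type": "Company", "text": "Google", "count": 9}],
}

# This mirrors what analyze() currently returns: serialized text, not a dict.
results = json.dumps(response, indent=2)
print(type(results).__name__)

# Iterating over a string yields single characters, starting with the brace.
first_item = next(iter(results))
print(repr(first_item))

try:
    first_item["keywords"]  # same kind of indexing as in the route's loop
except TypeError as exc:
    error_message = str(exc)
    print("TypeError:", error_message)

# Returning the dict itself keeps the structure intact.
print(response["keywords"][0]["text"])
```

The exact wording of the exception message varies slightly between Python versions, but it begins with "string indices must be integers" in each case.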

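However, making this change is not enough to get your code working. Next, let's look at the loop where the error occurs, because it still has a problem: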

    for item in results:
        keywords = item["keywords"]["text"]
        entities = item["entities"]["text"]

This code treats results as a list in which each item is a dict containing a keywords property and an entities property. In other words, it assumes the data looks like this:

[
    {
        "keywords": { "text": "abc123", ... },
        "entities": { "text": "def456", ... },
        ...
    },
    {
        "keywords": { "text": "cba321", ... },
        "entities": { "text": "fed654", ... },
        ...
    },
    ...
]

However, if you look closely at the IBM documentation (the same page referenced above), the response comes back in a different shape. It looks more like this:

{
    "keywords": [
        { "text": "abc123", ... },
        { "text": "def456", ... }
    ],
    "entities": [
        { "text": "ghi789", ... }
    ],
    ...
}

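In particular, keywords and entities are separate lists under the top-level object/dict, and they may have different lengths.

Instead of the loop above, you probably want something more like the following: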

    for item in results["keywords"]:
        keyword_text = item["text"]

    for item in results["entities"]:
        entity_text = item["text"]

However, I'm not sure what your original loop was supposed to do: it fetches data from your Watson response and then does nothing with the data it fetches.
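
If the goal is to collect the text of every keyword and entity for the template, one option is to build two lists. This is a minimal sketch, assuming `results` is the dict returned by `analyze` once the `json.dumps` call is removed; the helper name `extract_texts` and the sample data are made up for illustration:

```python
def extract_texts(results):
    """Return (keyword_texts, entity_texts) from a parsed response dict."""
    # .get(..., []) tolerates a response in which one of the features is missing.
    keyword_texts = [item["text"] for item in results.get("keywords", [])]
    entity_texts = [item["text"] for item in results.get("entities", [])]
    return keyword_texts, entity_texts


# Hypothetical sample shaped like the response described above.
sample = {
    "keywords": [
        {"text": "Android apps", "relevance": 0.926516},
        {"text": "Chrome OS", "relevance": 0.878045},
    ],
    "entities": [
        {"type": "Company", "text": "Google", "relevance": 0.385564, "count": 9},
    ],
}

keywords, entities = extract_texts(sample)
print(keywords)
print(entities)
```

The two lists could then be passed to render_template as separate keyword arguments, which lines up with having two separate loops in the template.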

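You will also need to modify your Jinja template so that it contains two separate loops. I'll leave that to you.

Finally, you wrote the following code: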

    return render_template("results.html", results=results, URL=URL) 

    # check if URL is valid
    if not results:
        return apology("this is not a valid URL")

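Checking whether the URL is valid after we have already returned is too late! The check (if not results) is unreachable and will never run. Move this check to just after the line results = analyze(...).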
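
The reordered control flow might look like the sketch below. To keep it runnable without Flask or Watson, `analyze`, `apology`, and `render_template` are replaced by trivial stand-ins that are assumptions for illustration only, and `handle_post` is a hypothetical name for the body of the POST branch. In the real route the Flask and helper versions would be used, and the URL would come from `request.form`.

```python
def apology(message):
    # Stand-in for the app's apology() helper.
    return ("apology", message)


def render_template(name, **context):
    # Stand-in for flask.render_template().
    return (name, context)


def analyze(url):
    # Stand-in for helpers.analyze(). Returning None for a bad URL is an
    # assumption; the real function may signal failure differently.
    if not url.startswith("http"):
        return None
    return {"keywords": [{"text": "Chrome OS"}], "entities": [{"text": "Google"}]}


def handle_post(url):
    """Body of the POST branch, with the validity check before the return."""
    if not url:
        return apology("please enter a URL")

    results = analyze(url)

    # Check the result immediately, while the check is still reachable.
    if not results:
        return apology("this is not a valid URL")

    return render_template("results.html", results=results, URL=url)


print(handle_post(""))
print(handle_post("not-a-url"))
print(handle_post("https://example.com")[0])
```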