Topic modeling example with Watson SDK API

Question

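I have a list of documents that will be uploaded online at different points in time. I have no prior information about their content, no information about the possible labels that could be assigned to them, and no historical data (so I cannot train a classifier with the Watson Natural Language Classifier service). What I want is some real-time classification/topic assignment for these documents. For example, an API like the following is what I am looking for: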

service.getTopics('some text')

that returns something like the following in real time:

"categories": [
          {
            "score": 0.949576,
            "label": "/technology and computing/networking"
          },
          {
            "score": 0.911692,
            "label": "/technology and computing/networking/network monitoring and management"
          },
          {
            "score": 0.879639,
            "label": "/business and industrial/business operations/management"
          }
]

Is this feasible with the Watson Discovery or NLU services? I am using the Python SDK APIs, so an example or any relevant link would be very helpful. Thanks.

Answer 1

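I think the categories or concepts features of the Watson Natural Language Understanding service are the best fit. You can't send a document directly through the API, so you will need to extract the text first, but if you are able to do that, then:

Example taken from the API documentation page: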


from ibm_watson import NaturalLanguageUnderstandingV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
from ibm_watson.natural_language_understanding_v1 import (
    Features, ConceptsOptions, CategoriesOptions)

# Replace '{apikey}' and '{url}' with the credentials of your NLU instance.
authenticator = IAMAuthenticator('{apikey}')
natural_language_understanding = NaturalLanguageUnderstandingV1(
    version='2019-07-12',
    authenticator=authenticator)

natural_language_understanding.set_service_url('{url}')

# Request up to 5 categories and 5 concepts for the given text.
response = natural_language_understanding.analyze(
    text='IBM is an American multinational technology company '
    'headquartered in Armonk, New York, United States, '
    'with operations in over 170 countries.',
    features=Features(
        categories=CategoriesOptions(limit=5),
        concepts=ConceptsOptions(limit=5))).get_result()
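
To get something close to the getTopics-style output from the question, the categories can be pulled out of the returned dictionary. The following is only a minimal sketch: top_categories is a hypothetical helper name, the 0.5 default threshold is an arbitrary assumption, and the sample data simply copies the output shown in the question rather than a real service response.

```python
from typing import Any, Dict, List, Tuple


def top_categories(result: Dict[str, Any],
                   min_score: float = 0.5) -> List[Tuple[str, float]]:
    """Return (label, score) pairs at or above min_score, highest score first.

    `result` is assumed to be a dict shaped like the NLU analyze() result,
    i.e. it may contain a "categories" list of {"score", "label"} entries.
    """
    categories = result.get("categories", [])
    kept = [(c["label"], c["score"]) for c in categories
            if c["score"] >= min_score]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Sample shaped like the output in the question; with a live service you
# would pass the `response` dict returned by analyze(...).get_result().
sample = {
    "categories": [
        {"score": 0.949576,
         "label": "/technology and computing/networking"},
        {"score": 0.911692,
         "label": "/technology and computing/networking/"
                  "network monitoring and management"},
        {"score": 0.879639,
         "label": "/business and industrial/business operations/management"},
    ]
}

for label, score in top_categories(sample, min_score=0.9):
    print(f"{score:.3f}  {label}")
```

With a real response, top_categories(response) would be called in place of the sample dictionary; the threshold is something to tune against your own documents.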

More information is in the API documentation: https://cloud.ibm.com/apidocs/natural-language-understanding/natural-language-understanding?code=python#categories

python

nlp

machine-learning

topic-modeling

ibm-watson