GCP Sentiment Analysis returns same score for 17 different documents, what am I doing wrong?
I am running a sentiment analysis on Google Cloud Platform over 17 different documents, but it gives me the same score for all of them, with a different magnitude for each.
This is my first time using this package, but as far as I can tell it should not be possible for all of them to have exactly the same score.
The documents are PDF files of varying size, all between 15 and 20 pages, of which I exclude 3 of the pages because they are not relevant.
I have already tried the code with other documents and it gives different scores for shorter ones, so I suspect there is a maximum length of document it can handle, but I could not find anything about it in the documentation or elsewhere.
import io

# Pre-2.0 google-cloud-language client, as used in the question
from google.cloud import language
from google.cloud.language import enums, types

# pdfminer (pdfminer.six) classes for the PDF text extraction below
from pdfminer.converter import TextConverter
from pdfminer.pdfinterp import PDFPageInterpreter, PDFResourceManager
from pdfminer.pdfpage import PDFPage


def analyze(text):
    # creds is a service-account credentials object defined elsewhere
    client = language.LanguageServiceClient(credentials=creds)
    document = types.Document(content=text,
                              type=enums.Document.Type.PLAIN_TEXT)
    sentiment = client.analyze_sentiment(document=document).document_sentiment
    entities = client.analyze_entities(document=document).entities
    return sentiment, entities


def extract_text_from_pdf_pages(pdf_path):
    resource_manager = PDFResourceManager()
    fake_file_handle = io.StringIO()
    converter = TextConverter(resource_manager, fake_file_handle)
    page_interpreter = PDFPageInterpreter(resource_manager, converter)

    with open(pdf_path, 'rb') as fh:
        # Skip the first two pages and the last page; process the rest
        last_page = len(list(PDFPage.get_pages(fh, caching=True,
                                               check_extractable=True))) - 1
        for pgNum, page in enumerate(PDFPage.get_pages(fh,
                                                       caching=True,
                                                       check_extractable=True)):
            if pgNum not in [0, 1, last_page]:
                page_interpreter.process_page(page)

        text = fake_file_handle.getvalue()

    # close open handles
    converter.close()
    fake_file_handle.close()

    if text:
        return text
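
For context, the two functions above would be combined with a small driver loop that is not shown in the question; an assumed sketch (file names are placeholders) of how output like the results below is produced:

# Assumed driver (not in the original question): run the two functions above
# over a list of PDFs and print each document's score and magnitude.
pdf_paths = ['doc1.pdf', 'doc2.pdf']  # placeholder file names

for i, pdf_path in enumerate(pdf_paths, start=1):
    text = extract_text_from_pdf_pages(pdf_path)
    sentiment, entities = analyze(text)
    print('doc%d %s - %s' % (i, sentiment.score, sentiment.magnitude))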
Results (score, magnitude):

doc1  0.10000000149011612 - 147.5
doc2  0.10000000149011612 - 118.30000305175781
doc3  0.10000000149011612 - 144.0
doc4  0.10000000149011612 - 147.10000610351562
doc5  0.10000000149011612 - 131.39999389648438
doc6  0.10000000149011612 - 116.19999694824219
doc7  0.10000000149011612 - 121.0999984741211
doc8  0.10000000149011612 - 131.60000610351562
doc9  0.10000000149011612 - 97.69999694824219
doc10 0.10000000149011612 - 174.89999389648438
doc11 0.10000000149011612 - 138.8000030517578
doc12 0.10000000149011612 - 141.10000610351562
doc13 0.10000000149011612 - 118.5999984741211
doc14 0.10000000149011612 - 135.60000610351562
doc15 0.10000000149011612 - 127.0
doc16 0.10000000149011612 - 97.0999984741211
doc17 0.10000000149011612 - 183.5
I expected different results for all the documents, with at least small variations.
(I also think these magnitude scores are way too high compared to what I have found in the documentation and elsewhere.)
Yes, there are some quotas in the usage of the Natural Language API.
The Natural Language API processes text into a series of tokens, which roughly correspond to word boundaries. Attempting to process more than the token quota (100,000 tokens per query by default) does not produce an error, but any tokens beyond that quota are simply ignored.
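
One way to avoid silently losing content to that quota is to split a long document into chunks that stay well below the limit and aggregate the per-chunk results yourself. A minimal sketch, assuming the same pre-2.0 google-cloud-language client and creds object as in the question, and using a rough whitespace word count as a stand-in for the API's token count:

# Rough sketch: split long text into word-based chunks (an approximation of
# the API's tokens), analyze each chunk, then combine the results.
def analyze_in_chunks(text, max_words=50000):
    client = language.LanguageServiceClient(credentials=creds)
    words = text.split()
    chunks = [' '.join(words[i:i + max_words])
              for i in range(0, len(words), max_words)]

    scores, magnitudes = [], []
    for chunk in chunks:
        document = types.Document(content=chunk,
                                  type=enums.Document.Type.PLAIN_TEXT)
        sentiment = client.analyze_sentiment(document=document).document_sentiment
        scores.append(sentiment.score)
        magnitudes.append(sentiment.magnitude)

    # Average the scores; sum the magnitudes, since magnitude is additive
    avg_score = sum(scores) / len(scores) if scores else 0.0
    return avg_score, sum(magnitudes)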
Regarding the second issue, it is difficult for me to assess the results of the Natural Language API without access to the documents. Perhaps you are getting very similar results because they are all quite neutral; I have run some tests with large neutral texts and got similar results.
To clarify, as stated in the Natural Language API documentation (see the sketch after this list for one way to compare magnitudes across documents of different lengths):
- documentSentiment contains the overall sentiment of the document, which consists of the following fields:
- score of the sentiment ranges between -1.0 (negative) and 1.0 (positive) and corresponds to the overall emotional leaning of the text.
- magnitude indicates the overall strength of emotion (both positive and negative) within the given text, between 0.0 and +inf. Unlike score, magnitude is not normalized; each expression of emotion within the text (both positive and negative) contributes to the text's magnitude (so longer text blocks may have greater magnitudes).
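
Because magnitude grows with the length of the text, documents of different sizes are easier to compare at the sentence level, or with the magnitude normalized by the number of sentences. A sketch under the same assumptions as above (the sentences field of the analyze_sentiment response carries per-sentence sentiment):

# Sketch: use the sentence-level sentiment returned by analyze_sentiment to
# get a length-independent summary of a document.
def summarize_sentiment(text):
    client = language.LanguageServiceClient(credentials=creds)
    document = types.Document(content=text,
                              type=enums.Document.Type.PLAIN_TEXT)
    response = client.analyze_sentiment(document=document)

    doc = response.document_sentiment
    n_sentences = len(response.sentences) or 1

    return {
        'score': doc.score,
        'magnitude': doc.magnitude,
        # Magnitude per sentence is roughly comparable across documents of
        # different lengths, unlike the raw magnitude.
        'magnitude_per_sentence': doc.magnitude / n_sentences,
    }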