Why am I getting a string where I should get a dict when using pycorenlp.StanfordCoreNLP.annotate?

Question

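I'm running this example using pycorenlp, the Python wrapper for Stanford CoreNLP, but the annotate function returns a string instead of a dict. So when I iterate over the result to get each sentence's sentiment value, I get the following error: "string indices must be integers".

What can I do to get past this? Can anyone help me? Thanks in advance. The code is below: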

from pycorenlp import StanfordCoreNLP
nlp_wrapper = StanfordCoreNLP('http://localhost:9000')
doc = ("I like this chocolate. This chocolate is not good. The chocolate is delicious. "
       "Its a very tasty chocolate. This is so bad")
annot_doc = nlp_wrapper.annotate(doc,
                                 properties={
                                            'annotators': 'sentiment',
                                            'outputFormat': 'json',
                                            'timeout': 100000,
                                 })
for sentence in annot_doc["sentences"]:
      print(" ".join([word["word"] for word in sentence["tokens"]]) + " => "\
            + str(sentence["sentimentValue"]) + " = "+ sentence["sentiment"])

Answer 1

You should just use the official stanfordnlp package! (Note: the name will change to stanza at some point.)

All the details are here; you can get various output formats from the server, including JSON.

https://stanfordnlp.github.io/stanfordnlp/corenlp_client.html

from stanfordnlp.server import CoreNLPClient

# example input taken from the question; any string works here
text = "I like this chocolate. This chocolate is not good."
with CoreNLPClient(annotators=['tokenize','ssplit','pos','lemma','ner', 'parse', 'depparse','coref'], timeout=30000, memory='16G') as client:
    # submit the request to the server
    ann = client.annotate(text)

Answer 2

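It would help if you provided the error stack trace. The reason for this is that the annotator times out very quickly and returns an assertion message, 'the text is too large..'. Its data type is str, which is why indexing it with a key fails. Also, I would pay more attention to Petr Matuska's comment. Looking at your example, it is clear that your goal is to find each sentence's sentiment along with its sentiment score. The sentiment score is not found in the results when using CoreNLPClient.

I ran into a similar issue, and I did solve it. If the text is large, you have to set the timeout value much higher (e.g., timeout = 500000). The annotator also produces a dictionary and therefore consumes a lot of memory, which becomes a big problem for larger text corpora. So how we handle the data structure in our code is up to us. There are alternatives, such as using slots, tuples, or named tuples for faster access.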
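As a rough sketch of the two points above, the snippet below first checks whether the value returned by annotate is really a dict, and then copies only the fields the question prints into named tuples. The names ensure_dict, extract_sentiments, and SentenceSentiment are hypothetical helpers, not part of pycorenlp, and the sample string is hand-written in the shape the question's loop expects rather than real server output. It also assumes sentimentValue can be converted with int().

```python
import json
from collections import namedtuple

# Hypothetical helper names for illustration; none of these come from pycorenlp.
SentenceSentiment = namedtuple("SentenceSentiment", ["text", "value", "label"])


def ensure_dict(result):
    """Return the annotation as a dict, or raise with the raw server message."""
    if isinstance(result, dict):
        return result
    try:
        # A plain error message such as 'the text is too large..' is not JSON.
        parsed = json.loads(result)
    except (TypeError, ValueError):
        parsed = None
    if not isinstance(parsed, dict):
        raise RuntimeError("CoreNLP returned a non-JSON response: %r" % (result,))
    return parsed


def extract_sentiments(annotation):
    """Keep only the fields the question prints, one small tuple per sentence."""
    return [
        SentenceSentiment(
            text=" ".join(token["word"] for token in sentence["tokens"]),
            value=int(sentence["sentimentValue"]),  # assumed to be int-convertible
            label=sentence["sentiment"],
        )
        for sentence in annotation["sentences"]
    ]


# Hand-written sample in the expected shape (an assumption, not real output).
sample = (
    '{"sentences": [{"tokens": [{"word": "So"}, {"word": "bad"}],'
    ' "sentimentValue": "1", "sentiment": "Negative"}]}'
)
rows = extract_sentiments(ensure_dict(sample))
```

With the question's code, the call to nlp_wrapper.annotate would be wrapped in ensure_dict before the loop, so a timeout shows up as a RuntimeError carrying the server's message instead of "string indices must be integers".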

stanford-nlp

sentiment-analysis

pycorenlp