Why do I get a TypeError when importing a text file line by line for sentiment analysis instead of using a hard-coded sentence?

Question

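I am trying to analyze the sentiment of each sentence in a text file, line by line. The code works whenever I use the hard-coded sentence from the first linked question. When I use the text file as input, I get a TypeError.

This is related to the question asked here. The code for reading the text file line by line comes from this question:

The first version works; the second, which reads the text file ("I love you. I hate him. You are nice. He is dumb"), does not. Here is the code: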

from pycorenlp import StanfordCoreNLP
nlp = StanfordCoreNLP('http://localhost:9000')
results = []    
with open("c:/nlp/test.txt","r") as f:
    for line in f.read().split('\n'):
        print("Line:" + line)
        res = nlp.annotate(line,
                   properties={
                       'annotators': 'sentiment',
                       'outputFormat': 'json',
                       'timeout': 1000,
                   })
        results.append(res)      

for res in results:             
    s = res["sentences"]         
    print("%d: '%s': %s %s" % (
        s["index"], 
        " ".join([t["word"] for t in s["tokens"]]),
        s["sentimentValue"], s["sentiment"]))

I get this error:

line 21, in <module>
    s["index"],
TypeError: list indices must be integers or slices, not str

Answer 1

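I don't have the Stanford library installed, so I could not test against its system. However, the error tells me that your results variable is a "list of dicts" or some other nested type.

Anyway, I ran a test: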

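results = []

with open("tester.txt", "r") as f:
    for line in f.read().split('\n'):
        print("Line:" + line)
        # Stand-in for the API result: a list holding one dict per sentence
        sentences = [
            {
                "index": 1,
                "word": line,
                "sentimentValue": "sentVal",
                "sentiment": "senti"
            }
        ]
        # Append inside the loop so that every line is kept
        results.append(sentences)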

Then I rebuilt your loop and adjusted it slightly to suit my needs, like this:

for res in results:         
    for s in res:         
        print("%d: '%s': %s %s" % (
            s["index"], 
            " ".join(s["word"]),
            s["sentimentValue"], s["sentiment"]))

which printed the following:

1: 'I   l o v e   y o u .': sentVal senti
1: 'I   h a t e   h i m .': sentVal senti
1: 'Y o u   a r e   n i c e .': sentVal senti
1: 'H e   i s   d u m b': sentVal senti

So basically the code works. But you have to figure out what type the return value is, for example by calling type(results) once the Stanford API has returned.

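Once you have that information, start with a loop that iterates over the values. If you don't know what type a nested value is, print its type as well. Keep going down until you reach the layer that contains the items you want to process.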
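That kind of level-by-level inspection can be automated. The following is only a rough sketch: `describe` is a hypothetical helper, not part of pycorenlp, and `sample_results` is made-up data shaped like the results list in the question. It looks only at the first element of each list, on the assumption that the lists are homogeneous.

```python
def describe(value, depth=0, max_depth=10):
    """Return lines describing the type found at each nesting level."""
    pad = "  " * depth
    lines = [f"{pad}{type(value).__name__}"]
    if depth >= max_depth:
        return lines
    if isinstance(value, dict):
        for key, item in value.items():
            lines.append(f"{pad}  key {key!r}:")
            lines.extend(describe(item, depth + 2, max_depth))
    elif isinstance(value, list) and value:
        # Only the first element is inspected; assumes a homogeneous list.
        lines.append(f"{pad}  first of {len(value)} item(s):")
        lines.extend(describe(value[0], depth + 2, max_depth))
    return lines


# Hypothetical sample shaped like the results list in the question.
sample_results = [{"sentences": [{"index": 0, "sentiment": "Positive"}]}]
report = describe(sample_results)
print("\n".join(report))
```

Reading the report from top to bottom shows where a list sits between two dicts, which is exactly the spot where a string index fails.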

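One last thing to point out. In the description you linked, in the comments, he covers how to pass text to the API. He explains there that the API takes care of the splitting and formatting itself, so you should just send the whole text. Keep this in mind if you don't get any results.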

Answer 2

It looks like I solved the problem. As londo pointed out, this line sets s to a list, but it should be a dict, as in the original code:

s = res["sentences"]
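The failure can be reproduced without a running CoreNLP server. The sketch below uses a hand-written dict, `fake_res`, that only imitates the response shape seen in this thread (a "sentences" key holding a list of per-sentence dicts). The field values are made up for illustration.

```python
# Hand-built stand-in for one annotate() result; values are illustrative only.
fake_res = {
    "sentences": [
        {
            "index": 0,
            "sentimentValue": "3",
            "sentiment": "Positive",
            "tokens": [{"word": "I"}, {"word": "love"},
                       {"word": "you"}, {"word": "."}],
        }
    ]
}

s = fake_res["sentences"]        # s is a list, not a dict
print(type(s).__name__)          # list

error_message = None
try:
    s["index"]                   # indexing a list with a string key
except TypeError as exc:
    error_message = str(exc)
    print("TypeError:", error_message)

first = s[0]                     # one sentence dict taken from the list
print(first["index"], first["sentiment"])
```

Indexing the list with an integer, or iterating over it, gives the per-sentence dict that the format string expects.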

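I moved the code into a single loop that reads and analyzes the file line by line and prints the results right there. So the new code looks like this: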

from pycorenlp import StanfordCoreNLP

nlp = StanfordCoreNLP('http://localhost:9000')

with open("c:/nlp/test.txt","r") as f:
    for line in f.read().split('\n'):
        res = nlp.annotate(line,
                           properties={
                               'annotators': 'sentiment',
                               'outputFormat': 'json',
                               'timeout': 15000,
                           })
        # res["sentences"] is a list, so iterate over it
        for s in res["sentences"]:
            print("%d: '%s': %s %s" % (
                s["index"],
                " ".join([t["word"] for t in s["tokens"]]),
                s["sentimentValue"], s["sentiment"]))

The result looks as expected, without any error messages:

0: 'I love you .': 3 Positive
0: 'I hate him .': 1 Negative
0: 'You are nice .': 3 Positive
0: 'He is dumb .': 1 Negative
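If you would rather keep the original two-step structure (collect every response first, print afterwards), a nested loop over each response's sentence list should work as well. This is a sketch under assumptions: `format_results` is a hypothetical helper, and `mock_results` is made-up data standing in for what nlp.annotate() returned per line.

```python
def format_results(results):
    """Build one output line per sentence from a list of response dicts."""
    lines = []
    for res in results:                 # one response per input line
        for s in res["sentences"]:      # a response may hold several sentences
            words = " ".join(t["word"] for t in s["tokens"])
            lines.append("%d: '%s': %s %s" % (
                s["index"], words, s["sentimentValue"], s["sentiment"]))
    return lines


# Made-up responses standing in for the collected annotate() results.
mock_results = [
    {"sentences": [{"index": 0, "sentimentValue": "3", "sentiment": "Positive",
                    "tokens": [{"word": "I"}, {"word": "love"},
                               {"word": "you"}, {"word": "."}]}]},
    {"sentences": [{"index": 0, "sentimentValue": "1", "sentiment": "Negative",
                    "tokens": [{"word": "I"}, {"word": "hate"},
                               {"word": "him"}, {"word": "."}]}]},
]

for line in format_results(mock_results):
    print(line)
```

The inner loop also covers the case where a single input line contains more than one sentence.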

python

stanford-nlp

sentiment-analysis

pycorenlp