Stanford NLP for Python

Question

All I want to do is find the sentiment (positive/negative/neutral) of any given string. While researching, I came across Stanford NLP. But sadly it is in Java. Any ideas on how to make it work for Python?

Answer 1

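TextBlob is a great sentiment analysis package written in Python. You can find the docs here. The sentiment of any given sentence is determined by inspecting its words and their corresponding emotional scores (sentiment). You can start with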

$ pip install -U textblob
$ python -m textblob.download_corpora

The first pip install command gives you the latest version of textblob installed in your (virtualenv) system, since the -U flag upgrades the pip package to its latest available version. The second command downloads all the required data, the corpus.

Answer 2

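I have also faced a similar situation. Most of my projects are in Python and the sentiment part is in Java. Luckily, it is quite easy to learn how to use the Stanford CoreNLP jar.

Here is one of my scripts; you can download the jars and run it.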

import java.util.List;
import java.util.Properties;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.neural.rnn.RNNCoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.sentiment.SentimentCoreAnnotations.SentimentAnnotatedTree;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.util.ArrayCoreMap;
import edu.stanford.nlp.util.CoreMap;

public class Simple_NLP {
    static StanfordCoreNLP pipeline;

    public static void init() {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, parse, sentiment");
        pipeline = new StanfordCoreNLP(props);
    }

    public static String findSentiment(String tweet) {
        String SentiReturn = "";
        String[] SentiClass ={"very negative", "negative", "neutral", "positive", "very positive"};

        //Sentiment is an integer, ranging from 0 to 4. 
        //0 is very negative, 1 negative, 2 neutral, 3 positive and 4 very positive.
        int sentiment = 2;

        if (tweet != null && tweet.length() > 0) {
            Annotation annotation = pipeline.process(tweet);

            List<CoreMap> sentences = annotation.get(CoreAnnotations.SentencesAnnotation.class);
            if (sentences != null && sentences.size() > 0) {

                // The CoreMap interface is enough here; no cast to ArrayCoreMap is needed.
                CoreMap sentence = sentences.get(0);
                Tree tree = sentence.get(SentimentAnnotatedTree.class);
                sentiment = RNNCoreAnnotations.getPredictedClass(tree);
                SentiReturn = SentiClass[sentiment];
            }
        }
        return SentiReturn;
    }

}

Answer 3

I faced the same problem: one possible solution is stanford_corenlp_py, which uses Py4j, as pointed out by @roopalgarg.

stanford_corenlp_py

This repo provides a Python interface for calling the "sentiment" and "entitymentions" annotators of Stanford's CoreNLP Java package, current as of v. 3.5.1. It uses py4j to interact with the JVM; as such, in order to run a script like scripts/runGateway.py, you must first compile and run the Java classes creating the JVM gateway.

Answer 4

Use `py-corenlp`

Download Stanford CoreNLP

At this time (2020-05-25), the latest version is 4.0.0:

wget https://nlp.stanford.edu/software/stanford-corenlp-4.0.0.zip https://nlp.stanford.edu/software/stanford-corenlp-4.0.0-models-english.jar

If you do not have wget, you probably have curl:

curl https://nlp.stanford.edu/software/stanford-corenlp-4.0.0.zip -O https://nlp.stanford.edu/software/stanford-corenlp-4.0.0-models-english.jar -O

If all else fails, use the browser ;-)

Install the package

unzip stanford-corenlp-4.0.0.zip
mv stanford-corenlp-4.0.0-models-english.jar stanford-corenlp-4.0.0

Start the server

cd stanford-corenlp-4.0.0
java -mx5g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -timeout 10000

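Notes:

timeout is in milliseconds; I set it to 10 seconds above. You should increase it if you pass huge blobs to the server.
There are more options; you can list them with --help.
-mx5g should allocate enough memory, but YMMV and you may need to modify the option if your box is underpowered.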
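
Before sending text to the server, it can help to confirm that something is actually listening on the port. The following is a minimal sketch using only the Python standard library; the helper name server_is_up is hypothetical, and the defaults (localhost, port 9000) are taken from the client example in this answer. It only checks that a TCP connection succeeds, not that the listener is really CoreNLP.

```python
import socket


def server_is_up(host="localhost", port=9000, timeout=1.0):
    """Return True if something accepts TCP connections on host:port.

    This is only a reachability check; it does not verify that the
    listener is a CoreNLP server.
    """
    try:
        # create_connection raises OSError (refused, timed out, ...) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling server_is_up() right after starting the server may return False for a few seconds while the JVM is still starting, so retrying a few times is reasonable.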

Install the python package

The standard package

pip install pycorenlp

does not work with Python 3.9, so you need to do

pip install git+https://github.com/sam-s/py-corenlp.git

(See also the official list.)

Use it

from pycorenlp import StanfordCoreNLP

nlp = StanfordCoreNLP('http://localhost:9000')
res = nlp.annotate("I love you. I hate him. You are nice. He is dumb",
                   properties={
                       'annotators': 'sentiment',
                       'outputFormat': 'json',
                       'timeout': 1000,
                   })
for s in res["sentences"]:
    print("%d: '%s': %s %s" % (
        s["index"],
        " ".join([t["word"] for t in s["tokens"]]),
        s["sentimentValue"], s["sentiment"]))

and you will get:

0: 'I love you .': 3 Positive
1: 'I hate him .': 1 Negative
2: 'You are nice .': 3 Positive
3: 'He is dumb': 1 Negative

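Notes

You pass the whole text to the server and it splits it into sentences. It also splits sentences into tokens.
The sentiment is ascribed to each sentence, not the whole text. The mean sentimentValue across sentences can be used to estimate the sentiment of the whole text.
The average sentiment of a sentence is between Neutral (2) and Negative (1); the range is from VeryNegative (0) to VeryPositive (4), which appear to be quite rare.
You can stop the server either by typing Ctrl-C at the terminal you started it from or by using the shell command kill $(lsof -ti tcp:9000). 9000 is the default port; you can change it using the -port option when starting the server.
Increase timeout (in milliseconds) in the server or the client if you get timeout errors.
sentiment is just one annotator; there are many more, and you can request several, separating them by commas: 'annotators': 'sentiment,lemma'.
Beware that the sentiment model is somewhat idiosyncratic (e.g., the result is different depending on whether you mention David or Bill).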
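
The idea of averaging sentimentValue across sentences can be sketched as follows. The function name mean_sentiment is hypothetical, and the sample dictionary is hand-written to mirror the four sentences printed above; it is not real server output. The sketch assumes the response has the same shape as res in the example, with one sentimentValue per sentence.

```python
def mean_sentiment(res):
    """Average the per-sentence sentimentValue fields (0..4).

    `res` is assumed to look like the JSON returned by nlp.annotate(...)
    in the example above. Returns None if there are no sentences.
    """
    # int() accepts both "3" and 3, whichever form the value arrives in
    values = [int(s["sentimentValue"]) for s in res["sentences"]]
    if not values:
        return None
    return sum(values) / len(values)


# Hand-written sample mirroring the printed output above (an assumption,
# not captured from a running server).
sample = {
    "sentences": [
        {"index": 0, "sentimentValue": "3", "sentiment": "Positive"},
        {"index": 1, "sentimentValue": "1", "sentiment": "Negative"},
        {"index": 2, "sentimentValue": "3", "sentiment": "Positive"},
        {"index": 3, "sentimentValue": "1", "sentiment": "Negative"},
    ]
}
print(mean_sentiment(sample))
```

A plain mean treats every sentence equally regardless of length; weighting by token count is another option if long sentences should count for more.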

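PS. I cannot believe that I added a 9th answer, but I guess I had to, since none of the existing answers helped me (some of the 8 previous answers have now been deleted, and some others have been converted to comments).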

Answer 5

Use the stanfordcore-nlp python library

stanford-corenlp is a really good wrapper on top of stanfordcore-nlp for using it in Python.

wget http://nlp.stanford.edu/software/stanford-corenlp-full-2018-10-05.zip

Usage

# Simple usage
from stanfordcorenlp import StanfordCoreNLP

nlp = StanfordCoreNLP('/Users/name/stanford-corenlp-full-2018-10-05')

sentence = 'Guangdong University of Foreign Studies is located in Guangzhou.'
print('Tokenize:', nlp.word_tokenize(sentence))
print('Part of Speech:', nlp.pos_tag(sentence))
print('Named Entities:', nlp.ner(sentence))
print('Constituency Parsing:', nlp.parse(sentence))
print('Dependency Parsing:', nlp.dependency_parse(sentence))

nlp.close() # Do not forget to close! The backend server will consume a lot memory.
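
Since forgetting nlp.close() leaves the backend server running, one option is to let contextlib.closing call it for you, even when an exception is raised. This is a minimal sketch; the helper name tokenize_and_close is hypothetical, and it only assumes that the object passed in has the word_tokenize and close methods used in the example above.

```python
from contextlib import closing


def tokenize_and_close(nlp, sentence):
    """Tokenize `sentence`, then always call nlp.close(), even on error."""
    # closing(...) calls nlp.close() when the with-block exits
    with closing(nlp):
        return nlp.word_tokenize(sentence)
```

With the wrapper from this answer, usage would look like tokenize_and_close(StanfordCoreNLP('/Users/name/stanford-corenlp-full-2018-10-05'), sentence).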

More info

Answer 6

I recommend using the TextBlob library. A sample implementation goes like this:

from textblob import TextBlob
def sentiment(message):
    # create TextBlob object of passed tweet text
    analysis = TextBlob(message)
    # set sentiment
    return (analysis.sentiment.polarity)
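
The function above returns a polarity score between -1.0 and 1.0, while the question asks for a positive/negative/neutral label. A minimal sketch of that last step is below. The names label_polarity and sentiment_label and the 0.1 threshold are assumptions chosen for illustration, not part of TextBlob; tune the threshold for your data.

```python
def label_polarity(polarity, threshold=0.1):
    """Map a polarity in [-1.0, 1.0] to a coarse label.

    The threshold is an arbitrary illustrative choice: scores within
    +/- threshold of zero are treated as neutral.
    """
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


def sentiment_label(message, threshold=0.1):
    """Label `message` using TextBlob's polarity score."""
    # Imported lazily so label_polarity can be used without textblob installed
    from textblob import TextBlob

    return label_polarity(TextBlob(message).sentiment.polarity, threshold)
```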

Answer 7

There has been very recent progress on this issue:

Now you can use the stanfordnlp package in Python.

From the README:

>>> import stanfordnlp
>>> stanfordnlp.download('en')   # This downloads the English models for the neural pipeline
>>> nlp = stanfordnlp.Pipeline() # This sets up a default neural pipeline in English
>>> doc = nlp("Barack Obama was born in Hawaii.  He was elected president in 2008.")
>>> doc.sentences[0].print_dependencies()

Answer 8

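Native Python implementation of NLP tools from Stanford

Stanford has recently released a new Python package implementing neural network (NN) based algorithms for the most important NLP tasks:

tokenization
multi-word token (MWT) expansion
lemmatization
part-of-speech (POS) and morphological features tagging
dependency parsing

It is implemented in Python and uses PyTorch as the NN library. The package contains accurate models for more than 50 languages.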

To install, you can use PIP:

pip install stanfordnlp

To perform basic tasks, you can use the native Python interface with many NLP algorithms:

import stanfordnlp

stanfordnlp.download('en')   # This downloads the English models for the neural pipeline
nlp = stanfordnlp.Pipeline() # This sets up a default neural pipeline in English
doc = nlp("Barack Obama was born in Hawaii.  He was elected president in 2008.")
doc.sentences[0].print_dependencies()

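EDIT:

So far, the library does not support sentiment analysis, yet I am not deleting the answer, since it directly answers the "Stanford nlp for python" part of the question.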

Answer 9

Now they have STANZA.

https://stanfordnlp.github.io/stanza/

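Release History: Note that prior to version 1.0.0, the Stanza library was named "StanfordNLP". To install historical versions prior to v1.0.0, you'll need to run pip install stanfordnlp.

So, it confirms that Stanza is the full Python version of Stanford NLP.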
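
For the sentiment task in the question, a minimal sketch with Stanza might look like the following. It assumes a Stanza release that ships the sentiment processor for English and that each sentence exposes a sentiment value of 0, 1, or 2 (negative, neutral, positive); please check the Stanza documentation for your installed version. The names STANZA_LABELS and stanza_sentiments are hypothetical.

```python
# Assumed mapping of Stanza's per-sentence sentiment values to labels.
STANZA_LABELS = {0: "negative", 1: "neutral", 2: "positive"}


def stanza_sentiments(text, lang="en"):
    """Return a list of (sentence text, label) pairs for `text`."""
    # Imported lazily so the mapping above is usable without stanza installed
    import stanza

    stanza.download(lang)  # fetches the models on first use
    nlp = stanza.Pipeline(lang=lang, processors="tokenize,sentiment")
    doc = nlp(text)
    return [(s.text, STANZA_LABELS[s.sentiment]) for s in doc.sentences]
```

Building the pipeline is slow, so in real code you would create it once and reuse it across calls rather than rebuilding it per string.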
