Unable to use Stanford NER in python module
I want to use the Python Stanford NER module, but I keep getting an error. I have searched online but found nothing. Here is the basic usage that produces the error:
import ner
tagger = ner.HttpNER(host='localhost', port=8080)
tagger.get_entities("University of California is located in California, United States")
Error:
Traceback (most recent call last):
File "<pyshell#3>", line 1, in <module>
tagger.get_entities("University of California is located in California, United States")
File "C:\Python27\lib\site-packages\ner\client.py", line 81, in get_entities
tagged_text = self.tag_text(text)
File "C:\Python27\lib\site-packages\ner\client.py", line 165, in tag_text
c.request('POST', self.location, params, headers)
File "C:\Python27\lib\httplib.py", line 1057, in request
self._send_request(method, url, body, headers)
File "C:\Python27\lib\httplib.py", line 1097, in _send_request
self.endheaders(body)
File "C:\Python27\lib\httplib.py", line 1053, in endheaders
self._send_output(message_body)
File "C:\Python27\lib\httplib.py", line 897, in _send_output
self.send(msg)
File "C:\Python27\lib\httplib.py", line 859, in send
self.connect()
File "C:\Python27\lib\httplib.py", line 836, in connect
self.timeout, self.source_address)
File "C:\Python27\lib\socket.py", line 575, in create_connection
raise err
error: [Errno 10061] No connection could be made because the target machine actively refused it
I'm on Windows 10 with the latest Java installed.
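Errno 10061 (WSAECONNREFUSED) means nothing was listening on localhost:8080 when the client connected. Before digging further, it can help to probe the port directly — a minimal stdlib sketch (the `is_listening` helper is made up for illustration; it is not part of the `ner` package):

```python
import socket

def is_listening(host, port, timeout=2.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers ConnectionRefusedError and timeouts
        return False

# False here means the NER server is not running on that port yet.
print(is_listening('localhost', 8080))
```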
- The Python Stanford NER module is a wrapper around Stanford NER that lets you use the NER service from Python.
- The NER service is a separate entity from the Python module; it is a Java program. To access the service from Python (or anything else), you first need to start it.
- Details on how to start the Java program/service can be found here: http://nlp.stanford.edu/software/CRF-NER.shtml

NER ships with a .bat file for Windows and a .sh file for Unix/Linux. I believe those files start the GUI. To start the service without the GUI, you should run something like this:
java -mx600m -cp stanford-ner.jar edu.stanford.nlp.ie.crf.CRFClassifier -loadClassifier classifiers/english.all.3class.distsim.crf.ser.gz
This runs the NER jar, sets the memory, and sets the classifier you want to use. (I believe you have to be inside the Stanford NER directory to run this.)
Once the NER program is running, you will be able to run your Python code and query NER.
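For the `HttpNER` client above to connect, the Java side has to be started in server mode on the matching port. A sketch of launching it from Python — the folder name is hypothetical, and you should check the `NERServer` class invocation against the README of your Stanford NER download:

```python
import subprocess  # needed once you uncomment the launch line below

# Assumed layout of the Stanford NER download -- adjust to your machine.
ner_dir = "stanford-ner-2018-10-16"
classifier = "classifiers/english.all.3class.distsim.crf.ser.gz"

cmd = [
    "java", "-mx600m",
    "-cp", "stanford-ner.jar",
    "edu.stanford.nlp.ie.NERServer",  # server class shipped in the jar
    "-loadClassifier", classifier,
    "-port", "8080",
]
print(" ".join(cmd))
# Uncomment to actually start the server (requires Java on PATH):
# server = subprocess.Popen(cmd, cwd=ner_dir)
```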
- Here is a complete Stanford NER script in Python 3.x.
This code reads every text file from the "TextFilestoTest" folder, detects the entities, and stores them in a dataframe:
import os
import pandas as pd
from nltk.tag import StanfordNERTagger

# Point NLTK at the Java executable before tagging.
java_path = "C:/Program Files (x86)/Java/jre1.8.0_191/bin/java.exe"
os.environ['JAVAHOME'] = java_path

stanford_classifier = 'ner-trained-EvensTrain.ser.gz'
stanford_ner_path = 'stanford-ner.jar'

# Creating Tagger Object
st = StanfordNERTagger(stanford_classifier, stanford_ner_path, encoding='utf-8')

def get_continuous_chunks(tagged_sent):
    continuous_chunk = []
    current_chunk = []
    for token, tag in tagged_sent:
        if tag != "O":  # Stanford NER marks non-entities with the letter "O", not zero
            current_chunk.append((token, tag))
        else:
            if current_chunk:  # if the current chunk is not empty
                continuous_chunk.append(current_chunk)
            current_chunk = []
    # Flush the final current_chunk into the continuous_chunk, if any.
    if current_chunk:
        continuous_chunk.append(current_chunk)
    return continuous_chunk

# Read every text file in the test folder into a dict keyed by file name.
TestFiles = './TextFilestoTest/'
files_path = os.listdir(TestFiles)
Test = {}
for i in files_path:
    p = TestFiles + i
    g = os.path.splitext(i)[0]
    with open(p, 'r') as f:  # close each file after reading it
        Test[g] = f.read()

## Predict labels for all words of the text files and collect them in a dataframe
rows = []
for i in Test:
    test_text = Test[i].replace("\n", " ")
    tokenized_text = test_text.split(" ")
    classified_text = st.tag(tokenized_text)
    named_entities = get_continuous_chunks(classified_text)
    flat_list = [item for sublist in named_entities for item in sublist]
    for fl in flat_list:
        rows.append({"filename": i, "Word": fl[0], "Label": fl[1]})
# Building the dataframe once is much faster than appending row by row.
df_fin = pd.DataFrame(rows, columns=["filename", "Word", "Label"])
test_files_len = list(set(df_fin['filename']))
If you have any questions, comment below and I will answer them one by one. Thanks.
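To see what `get_continuous_chunks` does without running the Java tagger, you can feed it a hand-made (token, tag) list — the sentence below is made up to stand in for `st.tag()` output:

```python
def get_continuous_chunks(tagged_sent):
    # Same helper as in the script above: group consecutive tagged tokens.
    continuous_chunk = []
    current_chunk = []
    for token, tag in tagged_sent:
        if tag != "O":
            current_chunk.append((token, tag))
        else:
            if current_chunk:
                continuous_chunk.append(current_chunk)
            current_chunk = []
    if current_chunk:
        continuous_chunk.append(current_chunk)
    return continuous_chunk

tagged = [("University", "ORGANIZATION"), ("of", "ORGANIZATION"),
          ("California", "ORGANIZATION"), ("is", "O"), ("located", "O"),
          ("in", "O"), ("California", "LOCATION")]
chunks = get_continuous_chunks(tagged)
print(chunks)
# [[('University', 'ORGANIZATION'), ('of', 'ORGANIZATION'), ('California', 'ORGANIZATION')],
#  [('California', 'LOCATION')]]
```

Each inner list is one contiguous entity, which is why the three "University of California" tokens come back together while the second "California" is its own chunk.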