How to parse more than one sentence from a text file using the Stanford dependency parser?
I have a text file with many lines and I want to parse all of the sentences. It looks like I read in all of them, but only the first sentence actually gets parsed, and I can't figure out where I'm going wrong.
import nltk
from nltk.parse.stanford import StanfordDependencyParser

dependency_parser = StanfordDependencyParser(model_path="edu\stanford\lp\models\lexparser\englishPCFG.ser.gz")

txtfile = open('sample.txt', encoding="latin-1")
s = txtfile.read()
print(s)

result = dependency_parser.raw_parse(s)
for i in result:
    print(list(i.triples()))
But it only gives the parse triples for the first sentence and not for the other sentences. Any help? The file sample.txt contains:
'i like this computer'
'The great Buddha, the .....'
'My Ashford experience .... great experience.'
[[(('i', 'VBZ'), 'nsubj', ("'", 'POS')), (('i', 'VBZ'), 'nmod', ('computer', 'NN')), (('computer', 'NN'), 'case', ('like', 'IN')), (('computer', 'NN'), 'det', ('this', 'DT')), (('computer', 'NN'), 'case', ("'", 'POS'))]]
You have to split up the text first. Right now you are parsing the literal text you posted, quotes and all; you can see that in this part of the parse: ("'", 'POS').

Since each line is a quoted string, it looks like you could apply ast.literal_eval to every line. Note that apostrophes (in words like "don't") will break that format, so you would have to handle them yourself, e.g. with line = line[1:-1]:
import ast
from nltk.parse.stanford import StanfordDependencyParser

dependency_parser = StanfordDependencyParser(model_path="edu\stanford\lp\models\lexparser\englishPCFG.ser.gz")

with open('sample.txt', encoding="latin-1") as f:
    # each line of sample.txt is a quoted string, e.g. 'i like this computer'
    lines = [ast.literal_eval(line) for line in f.readlines()]

for line in lines:
    parsed_lines = dependency_parser.raw_parse(line)
    # now parsed_lines should contain the parse of this line from the file
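The answer mentions falling back to line = line[1:-1] when an apostrophe makes ast.literal_eval choke on a line. A minimal sketch of that fallback, assuming every line is wrapped in exactly one pair of quotes (the try/except structure and the clean_line helper are my own illustration, not part of the original answer):

import ast

def clean_line(line):
    """Return the text inside the surrounding quotes of one line."""
    line = line.strip()
    try:
        # works for lines like 'i like this computer'
        return ast.literal_eval(line)
    except (SyntaxError, ValueError):
        # an apostrophe inside the line (e.g. "don't") breaks literal_eval,
        # so strip the outer quote characters by hand instead
        return line[1:-1]

with open('sample.txt', encoding="latin-1") as f:
    sentences = [clean_line(line) for line in f if line.strip()]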
Try:
from nltk.parse.stanford import StanfordDependencyParser

dependency_parser = StanfordDependencyParser(model_path="edu\stanford\lp\models\lexparser\englishPCFG.ser.gz")

with open('sample.txt') as fin:
    sents = fin.readlines()

# raw_parse_sents parses each sentence separately and yields
# one iterator of DependencyGraph objects per input sentence
results = dependency_parser.raw_parse_sents(sents)
for parsed_sent in results:
    for dep_graph in parsed_sent:
        print(list(dep_graph.triples()))
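raw_parse_sents treats every element of sents as one sentence, so this works as long as each line of sample.txt holds a single sentence. If your input is raw running text where a line can contain several sentences, you could split it into sentences first with nltk.sent_tokenize and then parse them all in one batch; a rough sketch (this splitting step is my addition, not part of the answer above):

import nltk
from nltk.parse.stanford import StanfordDependencyParser

dependency_parser = StanfordDependencyParser(model_path="edu\stanford\lp\models\lexparser\englishPCFG.ser.gz")

with open('sample.txt') as fin:
    text = fin.read()

# break the raw text into individual sentences first
# (requires the punkt tokenizer models: nltk.download('punkt'))
sentences = nltk.sent_tokenize(text)

for parsed_sent in dependency_parser.raw_parse_sents(sentences):
    for dep_graph in parsed_sent:
        print(list(dep_graph.triples()))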
Please take a look at the examples in the docstring code or the demo code in the repository; they are usually helpful.