PCFG generation in NLTK
I am trying to learn a PCFG from a file containing parse trees, for example:
(S (DECL_MD (NP_PPSS (PRON_PPSS (i i))) (VERB_MD (pt_verb_md need))
(NP_NN (ADJ_AT (a a)) (NOUN_NN (flight flight)) (PREP_IN (pt_prep_in
from))) (AVPNP_NP (NOUN_NP (charlotte charlotte))
Here is my relevant code:
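from nltk import tree

def loadData(path):
    # Read the file and return one parse-tree string per line
    with open(path, 'r') as f:
        data = f.read().split('\n')
    return data

def getTreeData(data):
    # Skip blank lines, which Tree.fromstring cannot parse
    return [tree.Tree.fromstring(s) for s in data if s.strip()]

# Main script
print("loading data..")
# Raw string so the backslashes in the Windows path are not read as escapes
data = loadData(r'C:\Users\Rayyan\Desktop\MSc Data\NLP\parseTrees.txt')
print("generating trees..")
treeData = getTreeData(data)
print("done!")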
After that, I tried many things I found on the internet, for example:
grammar = induce_pcfg(S, productions)
But the productions in these examples always come from a built-in corpus, for example:
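from nltk.corpus import treebank

productions = []
# Current NLTK exposes the file list as fileids(); the snippet found online used treebank.items
for item in treebank.fileids()[:2]:
    for tree in treebank.parsed_sents(item):
        productions += tree.productions()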
I tried replacing productions here with treeData, but it doesn't work. What am I missing or doing wrong?
Starting from the trees you built:
from nltk import tree
treeData_rules = []
# Extract the CFG rules (productions) from each parsed sentence
for item in treeData:
    for production in item.productions():
        treeData_rules.append(production)
treeData_rules
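This step matters because `induce_pcfg` expects a flat list of `Production` objects, not `Tree` objects, which is probably why passing `treeData` directly fails. As a small sketch of what `productions()` returns, the following uses a made-up, balanced toy tree (not a line from the asker's file; the labels are hypothetical):

```python
from nltk import Tree

# Hypothetical toy tree with balanced brackets, used only for illustration
toy = Tree.fromstring(
    "(S (NP (PRON i)) (VP (VERB need) (NP (DET a) (NOUN flight))))"
)

# productions() yields one rule per non-leaf node of the tree
rules = toy.productions()
for rule in rules:
    print(rule)
```

Each printed rule has the form `LHS -> RHS`, with leaf words shown in quotes.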
Then you can induce the probabilistic CFG (PCFG) like this:
from nltk import Nonterminal, induce_pcfg
# The start symbol must match the root label of the trees
S = Nonterminal('S')
grammar_PCFG = induce_pcfg(S, treeData_rules)
print(grammar_PCFG)
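To check the whole pipeline without the original file, here is a minimal end-to-end sketch. The two bracketed sentences are invented stand-ins for lines of `parseTrees.txt`, and the blank entry mimics the empty string that `split('\n')` leaves behind after a trailing newline:

```python
from nltk import Nonterminal, Tree, induce_pcfg

# Hypothetical stand-ins for the lines of the parse-tree file
lines = [
    "(S (NP (PRON i)) (VP (VERB need) (NP (DET a) (NOUN flight))))",
    "",
    "(S (NP (PRON i)) (VP (VERB want) (NP (DET a) (NOUN ticket))))",
]

# Parse every non-blank line into a Tree
trees = [Tree.fromstring(line) for line in lines if line.strip()]

# Flatten the productions of all trees into one list
rules = [prod for t in trees for prod in t.productions()]

# Estimate rule probabilities by relative frequency per left-hand side
grammar = induce_pcfg(Nonterminal("S"), rules)

# Collect the probabilities of the NP rules, keyed by their right-hand side
np_probs = {
    tuple(str(sym) for sym in p.rhs()): p.prob()
    for p in grammar.productions(lhs=Nonterminal("NP"))
}
print(grammar)
```

In this toy data `NP` expands to `PRON` twice and to `DET NOUN` twice, so each of those two rules should get probability 0.5.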