Python implementation of the CYK algorithm
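Edit: the error is on this line: if len(rhs) == 2 and rhs[0] in T[i][k] and rhs[1] in T[k + 1][j]:
I was able to implement the CKY algorithm, based on the CKY parser wiki, using a small set of rules, terminals, and non-terminals. But after I scaled it up to more rules, words, and grammar, it now gives me
IndexError: list index out of range
Does anyone know what I'm doing wrong with the larger grammar?
In case it helps, here is the earlier, smaller grammar.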
non_terminals = ["NP", "Nom", "Det", "AP",
                 "Adv", "A"]
terminals = ["book", "orange", "man",
             "tall", "heavy",
             "very", "muscular"]
# Rules of the grammar
R = {
    "NP": [["Det", "Nom"]],
    "Nom": [["AP", "Nom"], ["book"],
            ["orange"], ["man"]],
    "AP": [["Adv", "A"], ["heavy"],
           ["orange"], ["tall"]],
    "Det": [["a"]],
    "Adv": [["very"], ["extremely"]],
    "A": [["heavy"], ["orange"], ["tall"],
          ["muscular"]]
}
Here is my function:
def cykParse(w):
    n = len(w)
    # Initialize the table
    T = [[set([]) for j in range(n)] for i in range(n)]
    # Filling in the table
    for j in range(0, n):
        # Iterate over the rules
        for lhs, rule in R.items():
            for rhs in rule:
                # If a terminal is found
                if len(rhs) == 1 and rhs[0] == w[j]:
                    T[j][j].add(lhs)
        for i in range(j, -1, -1):
            # Iterate over the range i to j + 1
            for k in range(i, j + 1):
                # Iterate over the rules
                for lhs, rule in R.items():
                    for rhs in rule:
                        # If a terminal is found
                        if len(rhs) == 2 and rhs[0] in T[i][k] and rhs[1] in T[k + 1][j]:
                            T[i][j].add(lhs)
    # If word can be formed by rules
    # of given grammar
    if len(T[0][n-1]) != 0:
        print("True")
    else:
        print("False")
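I'm guessing (since you didn't show the actual traceback indicating where the error occurs) that it happens on this line:
    if len(rhs) == 2 and rhs[0] in T[i][k] and rhs[1] in T[k + 1][j]:
and that k is n-1, so T[k + 1] is T[n], one past the last row of the table. If the first two conditions hold, the third is evaluated and blows up.
I suspect there is an off-by-one error in the iteration limit for k. Some code comments would be useful, or at least a reference to the pseudocode your implementation is based on.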
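To make the suspected off-by-one concrete, here is a minimal sketch of the function with the split index k stopped at j - 1 instead of j. It assumes the table convention from the question (T[i][j] covers words i through j inclusive) and reuses the question's grammar R. The name cyk_parse, the rules parameter, and returning a bool instead of printing are my own choices for illustration, not part of the original code.

```python
# Grammar copied from the question (binary and single-terminal rules only).
R = {
    "NP": [["Det", "Nom"]],
    "Nom": [["AP", "Nom"], ["book"], ["orange"], ["man"]],
    "AP": [["Adv", "A"], ["heavy"], ["orange"], ["tall"]],
    "Det": [["a"]],
    "Adv": [["very"], ["extremely"]],
    "A": [["heavy"], ["orange"], ["tall"], ["muscular"]],
}


def cyk_parse(w, rules=R):
    """Return True if any non-terminal derives the whole token list w.

    Like the question's code, this does not require a particular start
    symbol; that is an assumption carried over, not a CYK requirement.
    """
    n = len(w)
    if n == 0:
        return False
    # T[i][j] holds the non-terminals that derive w[i..j] (inclusive).
    T = [[set() for _ in range(n)] for _ in range(n)]
    for j in range(n):
        # Terminal rules fill the diagonal cell for word j.
        for lhs, rule in rules.items():
            for rhs in rule:
                if len(rhs) == 1 and rhs[0] == w[j]:
                    T[j][j].add(lhs)
        for i in range(j, -1, -1):
            # Split point k: left part w[i..k], right part w[k+1..j].
            # k stops at j - 1, so k + 1 never goes past j (or past n - 1).
            for k in range(i, j):
                for lhs, rule in rules.items():
                    for rhs in rule:
                        if (len(rhs) == 2 and rhs[0] in T[i][k]
                                and rhs[1] in T[k + 1][j]):
                            T[i][j].add(lhs)
    return len(T[0][n - 1]) != 0


print(cyk_parse("a very heavy orange book".split()))
print(cyk_parse("a heavy".split()))
```

With the original range(i, j + 1), the input "a heavy" reaches k = j = n-1 with "AP" in T[1][1], so the rule Nom -> AP Nom passes the first two checks and then indexes T[2], which does not exist. Because `and` short-circuits, inputs where no rule's first symbol appears in the last column never evaluate T[k + 1][j], which may be why the smaller tests did not fail.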