Python implementation for the CYK Algorithm

Edit: the error is on this line:

if len(rhs) == 2 and rhs[0] in T[i][k] and rhs[1] in T[k + 1][j]:

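I was able to implement the CKY algorithm, based on the CKY parser wiki article, with a small set of rules, terminals, and non-terminals. After I scaled it up to more rules, words, and grammar, it now raises IndexError: list index out of range. Does anyone know what I am doing wrong with the larger grammar set?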

In case it helps, here is the earlier, smaller grammar.

non_terminals = ["NP", "Nom", "Det", "AP",  
                  "Adv", "A"] 
terminals = ["book", "orange", "man",  
             "tall", "heavy",  
             "very", "muscular"] 
  
# Rules of the grammar 
R = { 
     "NP": [["Det", "Nom"]], 
     "Nom": [["AP", "Nom"], ["book"],  
             ["orange"], ["man"]], 
     "AP": [["Adv", "A"], ["heavy"],  
            ["orange"], ["tall"]], 
     "Det": [["a"]], 
     "Adv": [["very"], ["extremely"]], 
     "A": [["heavy"], ["orange"], ["tall"],  
           ["muscular"]] 
    } 

Here is my function:

def cykParse(w):
    n = len(w)

    # Initialize the table: T[i][j] holds the non-terminals deriving w[i..j]
    T = [[set() for j in range(n)] for i in range(n)]

    # Filling in the table
    for j in range(0, n):

        # Iterate over the rules
        for lhs, rule in R.items():
            for rhs in rule:

                # If a terminal is found
                if len(rhs) == 1 and rhs[0] == w[j]:
                    T[j][j].add(lhs)

        for i in range(j, -1, -1):

            # Iterate over the range i to j + 1
            for k in range(i, j + 1):

                # Iterate over the rules
                for lhs, rule in R.items():
                    for rhs in rule:

                        # If a binary rule matches the two sub-spans
                        if len(rhs) == 2 and rhs[0] in T[i][k] and rhs[1] in T[k + 1][j]:
                            T[i][j].add(lhs)

    # If word can be formed by rules
    # of given grammar
    if len(T[0][n-1]) != 0:
        print("True")
    else:
        print("False")

My guess (since you did not show the actual traceback indicating where the error occurs) is that it is on this line:

if len(rhs) == 2 and rhs[0] in T[i][k] and rhs[1] in T[k + 1][j]:

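and that k == n-1 at that point, so T[k + 1] is T[n], one past the end of the table. The `and` operator short-circuits, so the third condition is only evaluated when the first two are true; when they are, it executes and blows up.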
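A small, self-contained illustration of that failure mode may help. It is only a sketch: the table contents, the two rules, and the helper name `condition` are assumptions chosen to reproduce the situation, not values taken from the larger grammar.

```python
n = 3
# Same table shape as in the question: n x n cells, each holding a set.
T = [[set() for _ in range(n)] for _ in range(n)]
T[n - 1][n - 1] = {"AP"}   # assume the last word was tagged AP (e.g. "heavy")

i = k = j = n - 1          # the inner loop allows k == j, here k == n - 1


def condition(rhs, i, k, j):
    # Same expression as in the question's inner loop.
    return len(rhs) == 2 and rhs[0] in T[i][k] and rhs[1] in T[k + 1][j]


# "Det" is not in T[i][k], so `and` stops early and T[k + 1] is never indexed.
safe = condition(["Det", "Nom"], i, k, j)

# "AP" is in T[i][k], so the third operand runs and T[n] does not exist.
try:
    condition(["AP", "Nom"], i, k, j)
    raised = False
except IndexError:
    raised = True

print(safe, raised)
```

This would also explain why a small grammar can appear to work: the error only shows up when some binary rule's first symbol happens to be in a cell of the last column.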

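I suspect there is an off-by-one error in the iteration limit for k. Some code comments would be useful, or at least a reference to the pseudocode your implementation is based on.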
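If the off-by-one is indeed in the limit for k, one possible fix is to stop the split point at j - 1, so that T[k + 1][j] always stays inside the table. The sketch below assumes the small grammar from the question; the name `cyk_parse`, the `rules` parameter, and returning the top cell instead of printing are illustrative choices, not the original code.

```python
# Small grammar from the question (assumed to be in Chomsky normal form).
R = {
    "NP": [["Det", "Nom"]],
    "Nom": [["AP", "Nom"], ["book"], ["orange"], ["man"]],
    "AP": [["Adv", "A"], ["heavy"], ["orange"], ["tall"]],
    "Det": [["a"]],
    "Adv": [["very"], ["extremely"]],
    "A": [["heavy"], ["orange"], ["tall"], ["muscular"]],
}


def cyk_parse(w, rules=R):
    """Return the set of non-terminals that derive the whole word list w."""
    n = len(w)
    if n == 0:
        return set()

    # T[i][j] holds the non-terminals that derive w[i..j] (inclusive).
    T = [[set() for _ in range(n)] for _ in range(n)]

    for j in range(n):
        # Single-word spans: match unit rules against w[j].
        for lhs, rule in rules.items():
            for rhs in rule:
                if len(rhs) == 1 and rhs[0] == w[j]:
                    T[j][j].add(lhs)

        # Longer spans ending at j, shortest first.
        for i in range(j - 1, -1, -1):
            # Split point k satisfies i <= k < j, so k + 1 <= j <= n - 1.
            for k in range(i, j):
                for lhs, rule in rules.items():
                    for rhs in rule:
                        if (len(rhs) == 2
                                and rhs[0] in T[i][k]
                                and rhs[1] in T[k + 1][j]):
                            T[i][j].add(lhs)

    return T[0][n - 1]


print(cyk_parse("a very heavy orange book".split()))
print(cyk_parse("a very heavy".split()))
```

Starting i at j - 1 rather than j is part of the same change: a span of one word has no split point, so only the unit-rule step should fill T[j][j].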