NLTK issue in deriving SQL query using FCFG
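I am using NLTK with a feature-based CFG to derive SQL queries from English text. I followed this link: http://www.nltk.org/book/ch10.html. I can run the example that illustrates the FCFG stored in the file sql0.fcfg.
After that, I tried to modify it for my own use by adding the following set of new rules: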
% start S
## Added by me
S[SEM=(?whadvp + ?sq)] -> WHADVP[SEM=?whadvp] SQ[SEM=?sq]
WHADVP[SEM=(?wrb + ?jj)] -> WRB[SEM=?wrb] JJ[SEM=?jj]
SQ[SEM=(?vbp + ?np + ?vp)] -> VBP[SEM=?vbp] NP[SEM=?np] VP[SEM=?vp]
NP[SEM=(?np + ?pp)] -> NP[SEM=?np] PP[SEM=?pp]
NP[SEM=(?np)] -> JJS[SEM=?jjs]
VP[SEM=(?vbz + ?advp)] -> VBZ[SEM=?vbz] ADVP[SEM=?advp]
PP[SEM=(?in + ?np)] -> IN[SEM=?in] NP[SEM=?np]
NP[SEM=(?prp + ?nn)] -> PRP$[SEM=?prp] NN[SEM=?nn]
ADVP[SEM=(?rb)] -> RB[SEM=?rb]
WRB[SEM='SELECT average(calldurationinsexonds) FROM Task'] -> 'How'
JJ[SEM=''] -> 'long'
VBP[SEM=''] -> 'do'
JJS[SEM=''] -> 'most'
IN[SEM=''] -> 'of'
PRP$[SEM=''] -> 'our'
NN[SEM=''] -> 'phone'
VBZ[SEM=''] -> 'calls'
JJ[SEM=''] -> 'last'
## Default example
S[SEM=(?np + WHERE + ?vp)] -> NP[SEM=?np] VP[SEM=?vp]
VP[SEM=(?v + ?pp)] -> IV[SEM=?v] PP[SEM=?pp]
VP[SEM=(?v + ?ap)] -> IV[SEM=?v] AP[SEM=?ap]
NP[SEM=(?det + ?n)] -> Det[SEM=?det] N[SEM=?n]
PP[SEM=(?p + ?np)] -> P[SEM=?p] NP[SEM=?np]
AP[SEM=?pp] -> A[SEM=?a] PP[SEM=?pp]
NP[SEM='Country="greece"'] -> 'Greece'
NP[SEM='Country="china"'] -> 'China'
Det[SEM='SELECT'] -> 'Which' | 'What'
N[SEM='City FROM city_table'] -> 'cities'
IV[SEM=''] -> 'are'
A[SEM=''] -> 'located'
P[SEM=''] -> 'in'
After saving the file, I get an error when I run the following commands:
from nltk import load_parser

cp = load_parser('grammars/book_grammars/sql0.fcfg')
query = 'How long do most of our phone calls last'
trees = list(cp.parse(query.split()))
Error:
Traceback (most recent call last):
  File "", line 1, in
  File "C:\Python27\lib\site-packages\nltk\parse\chart.py", line 1350, in parse
    chart = self.chart_parse(tokens)
  File "C:\Python27\lib\site-packages\nltk\parse\chart.py", line 1309, in chart_parse
    self._grammar.check_coverage(tokens)
  File "C:\Python27\lib\site-packages\nltk\grammar.py", line 631, in check_coverage
    "input words: %r." % missing)
ValueError: Grammar does not cover some of the input words: u"'How', 'long', 'do', 'most', 'of', 'our', 'phone', 'calls', 'last'".
I don't know whether the grammar I added is wrong or whether there is some other problem. Any help or suggestions would be great.
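The problem was that I was modifying \grammars\book_grammars\sql0.fcfg in place. When I saved the grammar as a separate file and loaded it from there, the problem was solved.
I don't know why this is the case, but it fixed the problem.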
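One plausible explanation, which the post does not confirm, is that a resource name without a protocol prefix is looked up along NLTK's data search path, so load_parser may have picked up a different copy of sql0.fcfg than the one that was edited. The sketch below avoids that lookup by reading a grammar file from an explicit path and building the parser directly. It assumes Python 3 and an installed nltk. The file name my_sql.fcfg and the helpers load_grammar_from_path and uncovered_words are made up for illustration, and the grammar text is just the default rules quoted in the question.

```python
import os
import tempfile

from nltk.grammar import FeatureGrammar
from nltk.parse import FeatureChartParser

# The default sql0.fcfg rules quoted in the question (illustrative copy).
GRAMMAR_TEXT = r"""
% start S
S[SEM=(?np + WHERE + ?vp)] -> NP[SEM=?np] VP[SEM=?vp]
VP[SEM=(?v + ?pp)] -> IV[SEM=?v] PP[SEM=?pp]
VP[SEM=(?v + ?ap)] -> IV[SEM=?v] AP[SEM=?ap]
NP[SEM=(?det + ?n)] -> Det[SEM=?det] N[SEM=?n]
PP[SEM=(?p + ?np)] -> P[SEM=?p] NP[SEM=?np]
AP[SEM=?pp] -> A[SEM=?a] PP[SEM=?pp]
NP[SEM='Country="greece"'] -> 'Greece'
NP[SEM='Country="china"'] -> 'China'
Det[SEM='SELECT'] -> 'Which' | 'What'
N[SEM='City FROM city_table'] -> 'cities'
IV[SEM=''] -> 'are'
A[SEM=''] -> 'located'
P[SEM=''] -> 'in'
"""


def load_grammar_from_path(path):
    """Hypothetical helper: read the grammar file at exactly this path."""
    with open(path, encoding="utf-8") as handle:
        return FeatureGrammar.fromstring(handle.read())


def uncovered_words(grammar, tokens):
    """Hypothetical helper: return the coverage error message, or None."""
    try:
        grammar.check_coverage(tokens)
    except ValueError as err:
        return str(err)
    return None


# Write the grammar to a separate file (a temp directory stands in for
# wherever you keep your own copy), then load it from that explicit path.
tmp_dir = tempfile.mkdtemp()
grammar_path = os.path.join(tmp_dir, "my_sql.fcfg")
with open(grammar_path, "w", encoding="utf-8") as handle:
    handle.write(GRAMMAR_TEXT)

grammar = load_grammar_from_path(grammar_path)
parser = FeatureChartParser(grammar)

# Every word here has a lexical rule, so parsing proceeds.
covered = "What cities are located in China".split()
trees = list(parser.parse(covered))
print(len(trees), "parse(s) found")

# These words have no lexical rule in this copy of the grammar, so the
# coverage check reports them, as in the traceback from the question.
problem = uncovered_words(grammar, "How long".split())
print(problem)
```

If the coverage check still lists words that you have added rules for, the grammar being loaded is probably not the file you edited, which matches the symptom in the question.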