Parsing a text file with Python using a grammar that contains lists

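I have to do some parsing: the goal is to create grammar rules that will be applied to a corpus. I have one question: is it possible to have a list inside a grammar?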

Example:

1) Open the text to be analyzed
2) Write the grammatical rules (just an example):
   grammar("""
   S -> NP VP
   NP -> DET N
   VP -> V N
   DET -> list_det.txt
   N -> list_n.txt
   V -> list.txt""")
3) Print the result with the entries that obey this grammar

Is this possible?

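Here is a quick conceptual prototype of your grammar, using pyparsing. I can't tell from your question what the contents of the N, V, and DET lists might be, so I just arbitrarily chose words made up of 'n's and 'v's, and the literal 'det'. You can replace the <<= assignments with the correct expressions for your grammar, but this parser and sample string should show that your grammar is at least feasible. (If you edit your question to show what the N, V, and DET lists contain, I can update this answer and example with less arbitrary expressions. Including a sample string to be parsed would also be useful.)

I also added some grouping so that you can see how the grammar structure is reflected in the structure of the results. You can keep it or remove it; the parser will work either way.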

import pyparsing as pp

v = pp.Forward()
n = pp.Forward()
det = pp.Forward()

V = pp.Group(pp.OneOrMore(v))
N = pp.Group(pp.OneOrMore(n))
DET = pp.Group(pp.OneOrMore(det))

VP = pp.Group(V + N)
NP = pp.Group(DET + N)
S = NP + VP

# replace these with something meaningful
v <<= pp.Word('v')
n <<= pp.Word('n')
det <<= pp.Literal('det')

sample = 'det det nn nn nn nn vv vv vv nn nn nn nn'

parsed = S.parseString(sample)
print(parsed.asList())

Prints:

[[['det', 'det'], ['nn', 'nn', 'nn', 'nn']], 
 [['vv', 'vv', 'vv'], ['nn', 'nn', 'nn', 'nn']]]
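
To tie this back to the word-list files in the question, here is a minimal sketch of how each list could be read from a file and turned into a pyparsing expression. It assumes one word per line in each file, which the question does not actually state. The helper name `word_list_expr` and the file contents are made up for illustration; the file names are the ones from the question, and they are written to a temporary directory only so the example runs on its own.

```python
import os
import tempfile

import pyparsing as pp


def word_list_expr(path):
    """Build an expression matching any one word listed in the file at `path`.

    Assumption: the file holds one word per line; blank lines are ignored.
    """
    with open(path, encoding="utf-8") as f:
        words = [line.strip() for line in f if line.strip()]
    # Keyword matches whole words only, so 'a' will not match inside 'apple'.
    return pp.MatchFirst([pp.Keyword(w) for w in words])


# Hypothetical file contents, invented for this sketch.
contents = {
    "list_det.txt": "the\na\n",
    "list_n.txt": "dog\ncat\nfood\n",
    "list.txt": "eats\nlikes\n",
}
tmpdir = tempfile.mkdtemp()
for name, text in contents.items():
    with open(os.path.join(tmpdir, name), "w", encoding="utf-8") as f:
        f.write(text)

# Terminal rules come straight from the files, as in the question's grammar.
DET = word_list_expr(os.path.join(tmpdir, "list_det.txt"))
N = word_list_expr(os.path.join(tmpdir, "list_n.txt"))
V = word_list_expr(os.path.join(tmpdir, "list.txt"))

NP = pp.Group(DET + N)   # NP -> DET N
VP = pp.Group(V + N)     # VP -> V N
S = NP + VP              # S  -> NP VP

result = S.parseString("the dog likes food").asList()
print(result)  # [['the', 'dog'], ['likes', 'food']]
```

With this approach the `Forward` placeholders are not needed, since the word lists are known before the rest of the grammar is built; they remain useful if you want to define the structure first and fill in the vocabulary later.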

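EDIT:

I guessed that "NP" and "VP" are "noun phrase" and "verb phrase", but I don't know what "DET" might be. Still, I made up a less abstract example. I also expanded the lists to accept more grammatical forms for lists of nouns and verbs, joined by "and" or "or" and commas.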

import pyparsing as pp

v = pp.Forward()
n = pp.Forward()
det = pp.Forward()

def collectionOf(expr):
    '''
    Compose a collection expression for a base expression that matches
        expr
        expr and expr
        expr or expr
        expr, expr, expr, and expr
    (the final conjunction may be either 'and' or 'or')
    '''
    AND = pp.Literal('and')
    OR = pp.Literal('or')
    COMMA = pp.Suppress(',')
    return expr + pp.Optional(
            pp.Optional(pp.OneOrMore(COMMA + expr) + COMMA) + (AND | OR) + expr)

V = pp.Group(collectionOf(v))('V')
N = pp.Group(collectionOf(n))('N')
DET = pp.Group(pp.OneOrMore(det))('DET')

VP = pp.Group(V + N)('VP')
NP = pp.Group(DET + N)('NP')
S = pp.Group(NP + VP)('S')

# replace these with something meaningful
v <<= pp.Combine(pp.oneOf('chase love hate like eat drink') + pp.Optional(pp.Literal('s')))
n <<= (pp.Optional(pp.oneOf('the a my your our his her their'))
       + pp.oneOf('dog cat horse rabbit squirrel food water'))
det <<= pp.Optional(pp.oneOf('why how when where')) + pp.oneOf('do does did')

samples = '''
    when does the dog eat the food
    does the dog like the cat
    do the horse, cat, and dog like or hate their food
    do the horse and dog love the cat
    why did the dog chase the squirrel
'''
S.runTests(samples)

Prints:

when does the dog eat the food
[[[['when', 'does'], ['the', 'dog']], [['eat'], ['the', 'food']]]]
- S: [[['when', 'does'], ['the', 'dog']], [['eat'], ['the', 'food']]]
  - NP: [['when', 'does'], ['the', 'dog']]
    - DET: ['when', 'does']
    - N: ['the', 'dog']
  - VP: [['eat'], ['the', 'food']]
    - N: ['the', 'food']
    - V: ['eat']


does the dog like the cat
[[[['does'], ['the', 'dog']], [['like'], ['the', 'cat']]]]
- S: [[['does'], ['the', 'dog']], [['like'], ['the', 'cat']]]
  - NP: [['does'], ['the', 'dog']]
    - DET: ['does']
    - N: ['the', 'dog']
  - VP: [['like'], ['the', 'cat']]
    - N: ['the', 'cat']
    - V: ['like']


do the horse, cat, and dog like or hate their food
[[[['do'], ['the', 'horse', 'cat', 'and', 'dog']], [['like', 'or', 'hate'], ['their', 'food']]]]
- S: [[['do'], ['the', 'horse', 'cat', 'and', 'dog']], [['like', 'or', 'hate'], ['their', 'food']]]
  - NP: [['do'], ['the', 'horse', 'cat', 'and', 'dog']]
    - DET: ['do']
    - N: ['the', 'horse', 'cat', 'and', 'dog']
  - VP: [['like', 'or', 'hate'], ['their', 'food']]
    - N: ['their', 'food']
    - V: ['like', 'or', 'hate']


do the horse and dog love the cat
[[[['do'], ['the', 'horse', 'and', 'dog']], [['love'], ['the', 'cat']]]]
- S: [[['do'], ['the', 'horse', 'and', 'dog']], [['love'], ['the', 'cat']]]
  - NP: [['do'], ['the', 'horse', 'and', 'dog']]
    - DET: ['do']
    - N: ['the', 'horse', 'and', 'dog']
  - VP: [['love'], ['the', 'cat']]
    - N: ['the', 'cat']
    - V: ['love']


why did the dog chase the squirrel
[[[['why', 'did'], ['the', 'dog']], [['chase'], ['the', 'squirrel']]]]
- S: [[['why', 'did'], ['the', 'dog']], [['chase'], ['the', 'squirrel']]]
  - NP: [['why', 'did'], ['the', 'dog']]
    - DET: ['why', 'did']
    - N: ['the', 'dog']
  - VP: [['chase'], ['the', 'squirrel']]
    - N: ['the', 'squirrel']
    - V: ['chase']
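
The results names shown in the dumps above can also be used to pull individual parts out of the parse result in code. A small sketch, using a cut-down grammar with made-up word lists rather than the full grammar above:

```python
import pyparsing as pp

# Cut-down vocabulary, invented for this sketch.
DET = pp.Group(pp.oneOf('do does did'))('DET')
N = pp.Group(pp.Optional(pp.oneOf('the a')) + pp.oneOf('dog cat'))('N')
V = pp.Group(pp.oneOf('like chase'))('V')

NP = pp.Group(DET + N)('NP')
VP = pp.Group(V + N)('VP')
S = pp.Group(NP + VP)('S')

parsed = S.parseString('does the dog like the cat')

# Each named Group is reachable by its results name, nested like the grammar.
subject = parsed['S']['NP']['N'].asList()
verb = parsed['S']['VP']['V'].asList()
obj = parsed['S']['VP']['N'].asList()

print(subject, verb, obj)
```

Because `N` appears in both `NP` and `VP`, the enclosing groups are what keep the two noun matches apart; without the `Group` wrappers the second `N` would overwrite the first under the same name.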