Grammar - How to match optional and required whitespace before and after words?
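I'm using nearley and moo to build a fairly complex grammar. It seems to work fine except for my whitespace requirements. I need to require whitespace where it is needed and allow it where it is not, while keeping the grammar unambiguous.
For example: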
After dinner, I went to bed.
I need whitespace between words, but whitespace around commas should be optional. So the following are also valid:
After dinner , I went to bed.
After dinner,I went to bed.
Below is a quick nearley grammar that attempts to do this. Even if you don't know the syntax, it should be easy to follow.
// Required whitespace
rws : [ \t]+
// Optional whitespace
ows : [ \t]*
sentence -> words %ows "," sentence
| words
words -> word %rws words
-> word
word -> [a-zA-Z]
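The syntax may be off, but the idea is the same. This turns out to be an ambiguous grammar. How do I define an unambiguous grammar that handles both optional and required whitespace?
I'm not familiar with nearley or moo, but the regex could be something like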
whitespace : ([ \t]*,[ \t]*|[ \t])
and your grammar would become
word %whitespace word
Hope this makes sense and that I haven't completely butchered the language.
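To try the separator idea outside nearley, here is a minimal sketch in plain JavaScript. The names `SEPARATOR`, `WORD`, `SENTENCE`, and `isValidSentence` are made up for illustration. It assumes one or more blanks between words (`[ \t]+` rather than a single `[ \t]`) and a trailing period, as in the question's examples.

```javascript
// Separator between two words: either a comma with optional blanks around it,
// or at least one blank. (Assumption: one or more blanks are allowed between words.)
const SEPARATOR = String.raw`(?:[ \t]*,[ \t]*|[ \t]+)`;
const WORD = String.raw`[a-zA-Z]+`;

// A sentence is a word, then any number of (separator, word) pairs, then a period.
const SENTENCE = new RegExp(`^${WORD}(?:${SEPARATOR}${WORD})*\\.$`);

function isValidSentence(text) {
  return SENTENCE.test(text);
}

console.log(isValidSentence("After dinner, I went to bed."));  // true
console.log(isValidSentence("After dinner , I went to bed.")); // true
console.log(isValidSentence("After dinner,I went to bed."));   // true
console.log(isValidSentence("After,, dinner."));               // false
```

Because every separator contains at least one character and words contain only letters, there is only one way to split a valid sentence into words and separators, which is what keeps this approach unambiguous.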
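I find that using the moo lexer makes my grammars simpler, so I usually spend less time fixing ambiguous grammars.
I'm no expert at designing grammars, but this is how I would do it: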
lexer.js
- word matches a sequence of letters
- comma matches " , ", " ,", ", " and ","
- space matches a single space " "
- period matches a single period "."
- nl matches one or more newlines
const moo = require('moo');
const lexer =
moo.compile
( { word: /[a-zA-Z]+/
, comma:/ ?, ?/
, space: / /
, period: /\./
, nl: {match: /\n+/, lineBreaks: true}
}
);
module.exports = lexer;
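To check how these rules split the input without installing moo, the sketch below imitates the same patterns in the same order using sticky regular expressions. `RULES` and `tokenize` are hypothetical helpers written for this illustration. They are not part of moo's API, and they only approximate its behavior by letting the first matching rule win.

```javascript
// Same patterns and order as lexer.js above; the first rule that matches wins.
const RULES = [
  ["word", /[a-zA-Z]+/y],
  ["comma", / ?, ?/y],
  ["space", / /y],
  ["period", /\./y],
  ["nl", /\n+/y],
];

function tokenize(input) {
  const tokens = [];
  let pos = 0;
  while (pos < input.length) {
    let matched = false;
    for (const [type, pattern] of RULES) {
      pattern.lastIndex = pos; // sticky flag: only match exactly at pos
      const m = pattern.exec(input);
      if (m) {
        tokens.push({ type, value: m[0] });
        pos += m[0].length;
        matched = true;
        break;
      }
    }
    if (!matched) throw new Error(`Unexpected character at offset ${pos}`);
  }
  return tokens;
}

console.log(tokenize("After lunch , I went.").map((t) => t.type).join(" "));
// word space word comma word space word period
```

Listing comma before space matters here: the blanks around a comma are absorbed into the comma token, so the grammar never has to decide whether a blank next to a comma is a separate space token.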
grammar.ne
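Here we say that:
- A text has one or more sentences
- Each sentence can be preceded and followed by newlines
- A sentence may start with a sequence of %word each followed by %comma or %space, and must end with a %word followed by %period
All the postprocessing rules do is flatten the token lists and extract .value from the tokens, so that we end up with lists of words.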
@{% const lexer = require("./lexer.js"); %}
@lexer lexer
text
-> %nl sentence:+ {% ([_, sentences]) => sentences %}
sentence
-> seq:* %word %period %nl {% ([seq, w, p, n]) => [...seq, w.value] %}
seq
-> (%word %space) {% ([[w]]) => w.value %}
| (%word %comma) {% ([[w]]) => w.value %}
This grammar can parse the following text:
After breakfast, I went to work.
After lunch , I went to my desk.
After the pub,I went home.
sleep.
Example:
const nearley = require('nearley');
const grammar = require('./grammar.js');
const parser = new nearley.Parser(nearley.Grammar.fromCompiled(grammar));
parser.feed(`
After breakfast, I went to work.
After lunch , I went to my desk.
After the pub,I went home.
sleep.
`);
if (parser.results.length > 1) throw new Error('grammar is ambiguous');
console.log(JSON.stringify(parser.results[0], null, 2));
Output:
[
[
"After",
"breakfast",
"I",
"went",
"to",
"work"
],
[
"After",
"lunch",
"I",
"went",
"to",
"my",
"desk"
],
[
"After",
"the",
"pub",
"I",
"went",
"home"
],
[
"sleep"
]
]