Parsing list of lists with Superpower
I want to parse a library of books represented in the following format:
#Book title 1
Chapter 1
Chapter 2
#Book title 2
Chapter 1
Chapter 2
Chapter 3
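As you can see, each book title is preceded by a #, and the chapters of each book are the lines that follow. Creating a parser for this should be fairly easy.
So far I have this code (parser + tokenizer):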
using Superpower;
using Superpower.Parsers;
using Superpower.Tokenizers;

void Main()
{
    var tokenizer = new TokenizerBuilder<PrjToken>()
        .Match(Superpower.Parsers.Character.EqualTo('#'), PrjToken.Hash)
        .Match(Span.Regex("[^\r\n#:=-]*"), PrjToken.Text)
        .Match(Span.WhiteSpace, PrjToken.WhiteSpace)
        .Build();

    var input = @"#Book 1
Chapter 1
Chapter 2
#Book 2
Chapter 1
Chapter 2
Chapter 3";

    var library = MyParsers.Library.Parse(tokenizer.Tokenize(input));
}

public enum PrjToken
{
    WhiteSpace,
    Hash,
    Text
}

public class Book
{
    public string Title { get; }
    public string[] Chapters { get; }

    public Book(string title, string[] chapters)
    {
        Title = title;
        Chapters = chapters;
    }
}

public class Library
{
    public Book[] Books { get; }

    public Library(Book[] books)
    {
        Books = books;
    }
}

public class MyParsers
{
    public static readonly TokenListParser<PrjToken, string> Text =
        from text in Token.EqualTo(PrjToken.Text)
        select text.ToStringValue();

    public static readonly TokenListParser<PrjToken, Superpower.Model.Token<PrjToken>> Whitespace =
        from text in Token.EqualTo(PrjToken.WhiteSpace)
        select text;

    public static readonly TokenListParser<PrjToken, string> Title =
        from hash in Token.EqualTo(PrjToken.Hash)
        from text in Text
        from wh in Whitespace
        select text;

    public static readonly TokenListParser<PrjToken, Book> Book =
        from title in Title
        from chapters in Text.ManyDelimitedBy(Whitespace)
        select new Book(title, chapters);

    public static readonly TokenListParser<PrjToken, Library> Library =
        from books in Book.ManyDelimitedBy(Whitespace)
        select new Library(books);
}
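The code above is ready to run on .NET Fiddle: https://dotnetfiddle.net/3P5dAJ
Everything looks fine. However, something is wrong with the parser, because I get this error:
Syntax error (line 4, column 1): unexpected hash `#`, expected text.
What is wrong with my parser?
You can fix this by parsing the chapters as a separate list in which every chapter is terminated by whitespace: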
public static readonly TokenListParser<PrjToken, string> Chapter =
    from chapterName in Text
    from wh in Whitespace
    select chapterName;

public static readonly TokenListParser<PrjToken, Book> Book =
    from title in Title
    from chapters in Chapter.Many()
    select new Book(title, chapters);
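Essentially, I believe that when Text.ManyDelimitedBy(Whitespace) reaches the trailing whitespace (the newline) after Chapter 2, it expects another chapter name rather than the start of a new book. The parser cannot tell the delimiter between chapters from the delimiter between books (both are whitespace, here a newline), so it expects another chapter instead of a new Book.
By breaking the chapter parser into a Text followed by a Whitespace token, you remove that ambiguity.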
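To see the difference in isolation, here is a minimal sketch that runs both list styles over the same tokens: two chapter lines followed by the newline and `#` that would start the next book. It assumes the Superpower NuGet package is referenced; the class name `DelimiterDemo` and its helper methods are made up for this illustration.

```csharp
using System;
using Superpower;
using Superpower.Model;
using Superpower.Parsers;
using Superpower.Tokenizers;

public enum PrjToken { WhiteSpace, Hash, Text }

// Hypothetical demo class, not part of the question's code.
public static class DelimiterDemo
{
    static readonly TokenListParser<PrjToken, Token<PrjToken>> Text =
        Token.EqualTo(PrjToken.Text);

    static readonly TokenListParser<PrjToken, Token<PrjToken>> Whitespace =
        Token.EqualTo(PrjToken.WhiteSpace);

    // Same tokenizer rules as in the question.
    static TokenList<PrjToken> Tokenize(string input) =>
        new TokenizerBuilder<PrjToken>()
            .Match(Character.EqualTo('#'), PrjToken.Hash)
            .Match(Span.Regex("[^\r\n#:=-]*"), PrjToken.Text)
            .Match(Span.WhiteSpace, PrjToken.WhiteSpace)
            .Build()
            .Tokenize(input);

    // Whitespace is a delimiter: after consuming one, another Text is required.
    public static TokenListParserResult<PrjToken, Token<PrjToken>[]> ParseDelimited(string input) =>
        Text.ManyDelimitedBy(Whitespace).TryParse(Tokenize(input));

    // Whitespace is a terminator owned by each chapter, so the list can stop at '#'.
    public static TokenListParserResult<PrjToken, Token<PrjToken>[]> ParseTerminated(string input)
    {
        var chapter =
            from name in Text
            from ws in Whitespace
            select name;
        return chapter.Many().TryParse(Tokenize(input));
    }

    public static void Main()
    {
        // Two chapters, then the newline and '#' that begin the next book.
        const string input = "Chapter 1\nChapter 2\n#";

        var delimited = ParseDelimited(input);
        Console.WriteLine($"ManyDelimitedBy: HasValue={delimited.HasValue} ({delimited})");

        var terminated = ParseTerminated(input);
        Console.WriteLine($"Many: HasValue={terminated.HasValue}");
    }
}
```

`TryParse` is used instead of `Parse` so that the failure is returned as a result rather than thrown, which makes the two outcomes easy to compare side by side.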
Because the Whitespace at the end of each chapter is now consumed, books are no longer separated by Whitespace, so you also have to change how the Library parser works:
public static readonly TokenListParser<PrjToken, Library> Library =
    from books in Book.Many()
    select new Library(books);
In addition, if you want to parse a file that has no newline at the end, you must also make the Whitespace at the end of Chapter optional:
public static readonly TokenListParser<PrjToken, string> Chapter =
    from chapterName in Text
    from wh in Whitespace.Optional()
    select chapterName;
In the end we get (the complete parser):
public class MyParsers
{
    public static readonly TokenListParser<PrjToken, string> Text =
        from text in Token.EqualTo(PrjToken.Text)
        select text.ToStringValue();

    public static readonly TokenListParser<PrjToken, Superpower.Model.Token<PrjToken>> Whitespace =
        from text in Token.EqualTo(PrjToken.WhiteSpace)
        select text;

    public static readonly TokenListParser<PrjToken, string> Title =
        from hash in Token.EqualTo(PrjToken.Hash)
        from text in Text
        from wh in Whitespace
        select text;

    public static readonly TokenListParser<PrjToken, string> Chapter =
        from chapterName in Text
        from wh in Whitespace.Optional()
        select chapterName;

    public static readonly TokenListParser<PrjToken, Book> Book =
        from title in Title
        from chapters in Chapter.Many()
        select new Book(title, chapters);

    public static readonly TokenListParser<PrjToken, Library> Library =
        from books in Book.Many()
        select new Library(books);
}
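As a quick end-to-end check, the sketch below runs the same grammar over an input whose last line has no trailing newline and prints each book with its chapters. To stay short it returns tuples instead of the Book and Library classes, and it appends `AtEnd()` so that leftover tokens are reported as an error; both choices, along with the `LibraryDemo` class name and `TryParseLibrary` helper, are assumptions of this example rather than part of the answer above. It needs the Superpower NuGet package and C# 7 or later for the tuple syntax.

```csharp
using System;
using Superpower;
using Superpower.Model;
using Superpower.Parsers;
using Superpower.Tokenizers;

public enum PrjToken { WhiteSpace, Hash, Text }

// Hypothetical demo class; tuples stand in for the Book and Library classes.
public static class LibraryDemo
{
    static readonly TokenListParser<PrjToken, string> Text =
        Token.EqualTo(PrjToken.Text).Select(t => t.ToStringValue());

    static readonly TokenListParser<PrjToken, string> Title =
        from hash in Token.EqualTo(PrjToken.Hash)
        from text in Text
        from ws in Token.EqualTo(PrjToken.WhiteSpace)
        select text;

    // The trailing whitespace is optional so the last line may lack a newline.
    static readonly TokenListParser<PrjToken, string> Chapter =
        from name in Text
        from ws in Token.EqualTo(PrjToken.WhiteSpace).Optional()
        select name;

    static readonly TokenListParser<PrjToken, (string Title, string[] Chapters)> Book =
        from title in Title
        from chapters in Chapter.Many()
        select (Title: title, Chapters: chapters);

    // AtEnd() turns unconsumed tokens into a parse error.
    static readonly TokenListParser<PrjToken, (string Title, string[] Chapters)[]> Library =
        Book.Many().AtEnd();

    public static TokenListParserResult<PrjToken, (string Title, string[] Chapters)[]> TryParseLibrary(string input)
    {
        // Same tokenizer rules as in the question.
        var tokenizer = new TokenizerBuilder<PrjToken>()
            .Match(Character.EqualTo('#'), PrjToken.Hash)
            .Match(Span.Regex("[^\r\n#:=-]*"), PrjToken.Text)
            .Match(Span.WhiteSpace, PrjToken.WhiteSpace)
            .Build();
        return Library.TryParse(tokenizer.Tokenize(input));
    }

    public static void Main()
    {
        var result = TryParseLibrary("#Book 1\nChapter 1\nChapter 2\n#Book 2\nChapter 1");
        if (!result.HasValue)
        {
            Console.WriteLine(result); // the result's string form carries the error message
            return;
        }
        foreach (var book in result.Value)
            Console.WriteLine($"{book.Title}: {string.Join(", ", book.Chapters)}");
    }
}
```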