Superpower: match any non-whitespace character except tokenized values
I want to use the NuGet package Superpower to match all non-whitespace characters, unless they are part of a tokenized value. For example,
var s = "some random text{variable}";
should result in:
["some", "random", "text", "variable"]
But what I currently get is:
["some", "random", "text{variable}"]
The parser for it looks like this:
public static class TextParser
{
public static TextParser<string> EncodedContent =>
from open in Character.EqualTo('{')
from chars in Character.Except('}').Many()
from close in Character.EqualTo('}')
select new string(chars);
public static TextParser<string> HtmlContent =>
from content in Span.NonWhiteSpace
select content.ToString();
}
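Of course, in the real parser I would return the strings in another variable; this is just simplified.
I hope this is enough information. If not, the whole repo is on GitHub: https://github.com/jon49/FlowSharpHtml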
Maybe you can write it more simply, but this was my first idea. I hope it helps:
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

Regex tokenizerRegex = new Regex(@"\{(.+?)\}");
var s = "some random text{variable}";
string[] splitted = s.Split(' ');
List<string> result = new List<string>();
foreach (string word in splitted)
{
    if (tokenizerRegex.IsMatch(word)) // the word contains at least one tokenized value
    {
        int nextIndex = 0;
        foreach (Match match in tokenizerRegex.Matches(word)) // loop through all matches
        {
            if (nextIndex < match.Index) // text before this token (at the start or between two tokens)
                result.Add(word.Substring(nextIndex, match.Index - nextIndex));
            result.Add(match.Value);
            nextIndex = match.Index + match.Length; // save the end position of the token
        }
        if (nextIndex < word.Length) // text after the last token
            result.Add(word.Substring(nextIndex));
    }
    else
        result.Add(word); // no token found, just add the word
}
Console.WriteLine("[\"{0}\"]", string.Join("\", \"", result));
Examples
Text: some random text{variable}
["some", "random", "text", "{variable}"]
Text: some random text{variable}{next}
["some", "random", "text", "{variable}", "{next}"]
Text: some random text{variable}and{next}
["some", "random", "text", "{variable}", "and", "{next}"]
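If splitting on spaces first is not a requirement, the same pieces can probably be collected in a single pass with one alternation pattern. This is only a sketch, not part of the answer above: the names TokenSplitter and Split are made up for illustration, and it assumes a token never contains a closing brace.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, named here only for illustration.
public static class TokenSplitter
{
    // Either a whole {...} token, or a run of characters that are
    // neither whitespace nor braces.
    private static readonly Regex Piece = new Regex(@"\{[^}]*\}|[^\s{}]+");

    public static List<string> Split(string input)
    {
        var result = new List<string>();
        foreach (Match match in Piece.Matches(input))
            result.Add(match.Value); // tokens keep their braces, as in the examples above
        return result;
    }
}
```

Calling TokenSplitter.Split("some random text{variable}and{next}") yields the same six pieces as the last example. Two differences from the loop above are worth knowing: a token that contains a space, such as {a b}, stays in one piece, and an unmatched brace is silently skipped.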
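There are probably many different ways to parse your input, and depending on how complex your input really is (you said you have simplified it), you may need to adjust this. But the best way to use Superpower is to create small parsers and then build on top of them. See my parsers and their descriptions below (each one builds on the previous one):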
/// <summary>
/// Parses any character other than whitespace or brackets.
/// </summary>
public static TextParser<char> NonWhiteSpaceOrBracket =>
from c in Character.Except(c =>
char.IsWhiteSpace(c) || c == '{' || c == '}',
"Anything other than whitespace or brackets"
)
select c;
/// <summary>
/// Parses any piece of valid text, i.e. any text other than whitespace or brackets.
/// </summary>
public static TextParser<string> TextContent =>
from content in NonWhiteSpaceOrBracket.Many()
select new string(content);
/// <summary>
/// Parses an encoded piece of text enclosed in brackets.
/// </summary>
public static TextParser<string> EncodedContent =>
from open in Character.EqualTo('{')
from text in TextContent
from close in Character.EqualTo('}')
select text;
/// <summary>
/// Parses a single piece of content, e.g. "name{variable}" or just "name".
/// </summary>
public static TextParser<string[]> Content =>
from text in TextContent
from encoded in EncodedContent.OptionalOrDefault()
select encoded != null ? new[] { text, encoded } : new[] { text };
/// <summary>
/// Parses multiple contents and flattens the result.
/// </summary>
public static TextParser<string[]> AllContent =>
from content in Content.ManyDelimitedBy(Span.WhiteSpace)
select content.SelectMany(x => x).ToArray();
Then, to run it:
string input = "some random text{variable}";
var result = AllContent.Parse(input);
Output:
["some", "random", "text", "variable"]
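The idea here is to build a parser for a single piece of content, then use Superpower's built-in ManyDelimitedBy combinator to simulate a "split" on the whitespace between the pieces of content we actually want to parse out. The result is an array of "content" pieces.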
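You may also want to use Superpower's tokenizer feature to produce better error messages when parsing fails. It is a slightly different approach; take a look at this blog post to learn more about how to use the tokenizer. It is entirely optional if you don't need friendlier error messages.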