解析字符串中的标签
Parsing tags in string
我正在尝试用这样的自定义标签解析字符串
[color value=0x000000]This house is [wave][color value=0xFF0000]haunted[/color][/wave].
I've heard about ghosts [shake]screaming[/shake] here after midnight.[/color]
我想出了要使用的正则表达式
/\[color value=(.*?)\](.*?)\[\/color\]/gs
/\[wave\](.*?)\[\/wave\]/gs
/\[shake\](.*?)\[\/shake\]/gs
但问题是 - 我需要在结果字符串中获取这些组的正确范围(startIndex、endIndex),以便我可以正确应用它们。这就是我感到完全迷失的地方,因为每次我替换标签时,索引总是有可能搞砸。对于嵌套标签来说尤其困难。
所以输入是一个字符串
[color value=0x000000]This house is [wave][color value=0xFF0000]haunted[/color][/wave].
I've heard about ghosts [shake]screaming[/shake] here after midnight.[/color]
在输出中我想得到类似
的东西
Apply color 0x000000 from 0 to 75
Apply wave from 14 to 20
Apply color 0xFF0000 from 14 to 20
Apply shake from 46 to 51
注意索引与结果字符串匹配。
如何解析它?
遗憾的是,我不熟悉 ActionScript,但此 C# 代码显示了一种使用正则表达式的解决方案。我没有匹配特定的标签,而是使用了可以匹配任何标签的正则表达式。而不是尝试制作一个匹配整个开始和结束标记的正则表达式,包括中间的文本(我认为这对于嵌套标记是不可能的),我让正则表达式只匹配一个开始 OR 结束标签,然后做了一些额外的处理来匹配开始和结束标签,并将它们从保留基本信息的字符串中删除。
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
class Program
{
static void Main(string[] args)
{
string data = "[color value=0x000000]This house is [wave][color value=0xFF0000]haunted[/color][/wave]. " +
"I've heard about ghosts [shake]screaming[/shake] here after midnight.[/color]";
ParsedData result = ParseData(data);
foreach (TagInfo t in result.tags)
{
if (string.IsNullOrEmpty(t.attributeName))
{
Console.WriteLine("Apply {0} from {1} to {2}", t.name, t.start, t.start + t.length - 1);
}
else
{
Console.WriteLine("Apply {0} {1}={2} from {3} to {4}", t.name, t.attributeName, t.attributeValue, t.start, t.start + t.length - 1);
}
Console.WriteLine(result.data);
Console.WriteLine("{0}{1}\n", new string(' ', t.start), new string('-', t.length));
}
}
static ParsedData ParseData(string data)
{
List<TagInfo> tagList = new List<TagInfo>();
Regex reTag = new Regex(@"\[(\w+)(\s+(\w+)\s*=\s*([^\]]+))?\]|\[(\/\w+)\]");
Match m = reTag.Match(data);
// Phase 1 - Collect all the start and end tags, noting their position in the original data string
while (m.Success)
{
if (m.Groups[1].Success) // Matched a start tag
{
tagList.Add(new TagInfo()
{
name = m.Groups[1].Value,
attributeName = m.Groups[3].Value,
attributeValue = m.Groups[4].Value,
tagLength = m.Groups[0].Length,
start = m.Groups[0].Index
});
}
else if (m.Groups[5].Success)
{
tagList.Add(new TagInfo()
{
name = m.Groups[5].Value,
tagLength = m.Groups[0].Length,
start = m.Groups[0].Index
});
}
m = m.NextMatch();
}
// Phase 2 - match end tags to start tags
List<TagInfo> unmatched = new List<TagInfo>();
foreach (TagInfo t in tagList)
{
if (t.name.StartsWith("/"))
{
for (int i = unmatched.Count - 1; i >= 0; i--)
{
if (unmatched[i].name == t.name.Substring(1))
{
t.otherEnd = unmatched[i];
unmatched[i].otherEnd = t;
unmatched.Remove(unmatched[i]);
break;
}
}
}
else
{
unmatched.Add(t);
}
}
int subtractLength = 0;
// Phase 3 - Remove tags from the string, updating start positions and calculating length in the process
foreach (TagInfo t in tagList.ToArray())
{
t.start -= subtractLength;
// If this is an end tag, calculate the length for the corresponding start tag,
// and remove the end tag from the tag list.
if (t.otherEnd.start < t.start)
{
t.otherEnd.length = t.start - t.otherEnd.start;
tagList.Remove(t);
}
// Keep track of how many characters in tags have been removed from the string so far
subtractLength += t.tagLength;
}
return new ParsedData()
{
data = reTag.Replace(data, string.Empty),
tags = tagList.ToArray()
};
}
class TagInfo
{
public int start;
public int length;
public int tagLength;
public string name;
public string attributeName;
public string attributeValue;
public TagInfo otherEnd;
}
class ParsedData
{
public string data;
public TagInfo[] tags;
}
}
输出为:
Apply color value=0x000000 from 0 to 76
This house is haunted. I've heard about ghosts screaming here after midnight.
-----------------------------------------------------------------------------
Apply wave from 14 to 20
This house is haunted. I've heard about ghosts screaming here after midnight.
-------
Apply color value=0xFF0000 from 14 to 20
This house is haunted. I've heard about ghosts screaming here after midnight.
-------
Apply shake from 47 to 55
This house is haunted. I've heard about ghosts screaming here after midnight.
---------
让我向您展示一种解析方法,您不仅可以将其应用于上述案例,还可以将其应用于所有具有模式切入案例的案例。此方法不限于术语-颜色、波浪、抖动。
private List<Tuple<string, string>> getVals(string input)
{
List<Tuple<string, string>> finals = new List<Tuple<string,string>>();
// first parser
var mts = Regex.Matches(input, @"\[[^\u005D]+\]");
foreach (var mt in mts)
{
// has no value=
if (!Regex.IsMatch(mt.ToString(), @"(?i)value[\n\r\t\s]*="))
{
// not closing tag
if (!Regex.IsMatch(mt.ToString(), @"^\[[\n\r\t\s]*\/"))
{
try
{
finals.Add(new Tuple<string, string>(Regex.Replace(mt.ToString(), @"^\[|\]$", "").Trim(), ""));
}
catch (Exception es)
{
Console.WriteLine(es.ToString());
}
}
}
// has value=
else
{
try
{
var spls = Regex.Split(mt.ToString(), @"(?i)value[\n\r\t\s]*=");
finals.Add(new Tuple<string, string>(Regex.Replace(spls[0].ToString(), @"^\[", "").Trim(), Regex.Replace(spls[1].ToString(), @"^\]$", "").Trim()));
}
catch (Exception es)
{
Console.WriteLine(es.ToString());
}
}
}
return finals;
}
我也有使用单个正则表达式解析 JSON 的经验。如果您想知道它是什么,请访问我的博客 www.mysplitter.com。
我正在尝试用这样的自定义标签解析字符串
[color value=0x000000]This house is [wave][color value=0xFF0000]haunted[/color][/wave].
I've heard about ghosts [shake]screaming[/shake] here after midnight.[/color]
我想出了要使用的正则表达式
/\[color value=(.*?)\](.*?)\[\/color\]/gs
/\[wave\](.*?)\[\/wave\]/gs
/\[shake\](.*?)\[\/shake\]/gs
但问题是 - 我需要在结果字符串中获取这些组的正确范围(startIndex、endIndex),以便我可以正确应用它们。这就是我感到完全迷失的地方,因为每次我替换标签时,索引总是有可能搞砸。对于嵌套标签来说尤其困难。
所以输入是一个字符串
[color value=0x000000]This house is [wave][color value=0xFF0000]haunted[/color][/wave].
I've heard about ghosts [shake]screaming[/shake] here after midnight.[/color]
在输出中我想得到类似
的东西Apply color 0x000000 from 0 to 75
Apply wave from 14 to 20
Apply color 0xFF0000 from 14 to 20
Apply shake from 46 to 51
注意索引与结果字符串匹配。
如何解析它?
遗憾的是,我不熟悉 ActionScript,但此 C# 代码显示了一种使用正则表达式的解决方案。我没有匹配特定的标签,而是使用了可以匹配任何标签的正则表达式。而不是尝试制作一个匹配整个开始和结束标记的正则表达式,包括中间的文本(我认为这对于嵌套标记是不可能的),我让正则表达式只匹配一个开始 OR 结束标签,然后做了一些额外的处理来匹配开始和结束标签,并将它们从保留基本信息的字符串中删除。
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
class Program
{
static void Main(string[] args)
{
string data = "[color value=0x000000]This house is [wave][color value=0xFF0000]haunted[/color][/wave]. " +
"I've heard about ghosts [shake]screaming[/shake] here after midnight.[/color]";
ParsedData result = ParseData(data);
foreach (TagInfo t in result.tags)
{
if (string.IsNullOrEmpty(t.attributeName))
{
Console.WriteLine("Apply {0} from {1} to {2}", t.name, t.start, t.start + t.length - 1);
}
else
{
Console.WriteLine("Apply {0} {1}={2} from {3} to {4}", t.name, t.attributeName, t.attributeValue, t.start, t.start + t.length - 1);
}
Console.WriteLine(result.data);
Console.WriteLine("{0}{1}\n", new string(' ', t.start), new string('-', t.length));
}
}
static ParsedData ParseData(string data)
{
List<TagInfo> tagList = new List<TagInfo>();
Regex reTag = new Regex(@"\[(\w+)(\s+(\w+)\s*=\s*([^\]]+))?\]|\[(\/\w+)\]");
Match m = reTag.Match(data);
// Phase 1 - Collect all the start and end tags, noting their position in the original data string
while (m.Success)
{
if (m.Groups[1].Success) // Matched a start tag
{
tagList.Add(new TagInfo()
{
name = m.Groups[1].Value,
attributeName = m.Groups[3].Value,
attributeValue = m.Groups[4].Value,
tagLength = m.Groups[0].Length,
start = m.Groups[0].Index
});
}
else if (m.Groups[5].Success)
{
tagList.Add(new TagInfo()
{
name = m.Groups[5].Value,
tagLength = m.Groups[0].Length,
start = m.Groups[0].Index
});
}
m = m.NextMatch();
}
// Phase 2 - match end tags to start tags
List<TagInfo> unmatched = new List<TagInfo>();
foreach (TagInfo t in tagList)
{
if (t.name.StartsWith("/"))
{
for (int i = unmatched.Count - 1; i >= 0; i--)
{
if (unmatched[i].name == t.name.Substring(1))
{
t.otherEnd = unmatched[i];
unmatched[i].otherEnd = t;
unmatched.Remove(unmatched[i]);
break;
}
}
}
else
{
unmatched.Add(t);
}
}
int subtractLength = 0;
// Phase 3 - Remove tags from the string, updating start positions and calculating length in the process
foreach (TagInfo t in tagList.ToArray())
{
t.start -= subtractLength;
// If this is an end tag, calculate the length for the corresponding start tag,
// and remove the end tag from the tag list.
if (t.otherEnd.start < t.start)
{
t.otherEnd.length = t.start - t.otherEnd.start;
tagList.Remove(t);
}
// Keep track of how many characters in tags have been removed from the string so far
subtractLength += t.tagLength;
}
return new ParsedData()
{
data = reTag.Replace(data, string.Empty),
tags = tagList.ToArray()
};
}
class TagInfo
{
public int start;
public int length;
public int tagLength;
public string name;
public string attributeName;
public string attributeValue;
public TagInfo otherEnd;
}
class ParsedData
{
public string data;
public TagInfo[] tags;
}
}
输出为:
Apply color value=0x000000 from 0 to 76
This house is haunted. I've heard about ghosts screaming here after midnight.
-----------------------------------------------------------------------------
Apply wave from 14 to 20
This house is haunted. I've heard about ghosts screaming here after midnight.
-------
Apply color value=0xFF0000 from 14 to 20
This house is haunted. I've heard about ghosts screaming here after midnight.
-------
Apply shake from 47 to 55
This house is haunted. I've heard about ghosts screaming here after midnight.
---------
让我向您展示一种解析方法,您不仅可以将其应用于上述案例,还可以将其应用于所有具有模式切入案例的案例。此方法不限于术语-颜色、波浪、抖动。
private List<Tuple<string, string>> getVals(string input)
{
List<Tuple<string, string>> finals = new List<Tuple<string,string>>();
// first parser
var mts = Regex.Matches(input, @"\[[^\u005D]+\]");
foreach (var mt in mts)
{
// has no value=
if (!Regex.IsMatch(mt.ToString(), @"(?i)value[\n\r\t\s]*="))
{
// not closing tag
if (!Regex.IsMatch(mt.ToString(), @"^\[[\n\r\t\s]*\/"))
{
try
{
finals.Add(new Tuple<string, string>(Regex.Replace(mt.ToString(), @"^\[|\]$", "").Trim(), ""));
}
catch (Exception es)
{
Console.WriteLine(es.ToString());
}
}
}
// has value=
else
{
try
{
var spls = Regex.Split(mt.ToString(), @"(?i)value[\n\r\t\s]*=");
finals.Add(new Tuple<string, string>(Regex.Replace(spls[0].ToString(), @"^\[", "").Trim(), Regex.Replace(spls[1].ToString(), @"^\]$", "").Trim()));
}
catch (Exception es)
{
Console.WriteLine(es.ToString());
}
}
}
return finals;
}
我也有使用单个正则表达式解析 JSON 的经验。如果您想知道它是什么,请访问我的博客 www.mysplitter.com。