Parsing tags in string

Question

I'm trying to parse a string with custom tags like this:

[color value=0x000000]This house is [wave][color value=0xFF0000]haunted[/color][/wave]. 
I've heard about ghosts [shake]screaming[/shake] here after midnight.[/color]

I've figured out which regular expressions to use:

/\[color value=(.*?)\](.*?)\[\/color\]/gs
/\[wave\](.*?)\[\/wave\]/gs
/\[shake\](.*?)\[\/shake\]/gs

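But the problem is that I need the correct ranges (startIndex, endIndex) of those groups in the result string, so that I can apply them correctly. That's where I feel completely lost, because every time I replace tags there's a chance the indexes get messed up. It gets especially hard with nested tags.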

So the input is a string

[color value=0x000000]This house is [wave][color value=0xFF0000]haunted[/color][/wave]. 
I've heard about ghosts [shake]screaming[/shake] here after midnight.[/color]

and in the output I want to get something like

Apply color 0x000000 from 0 to 75
Apply wave from 14 to 20
Apply color 0xFF0000 from 14 to 20
Apply shake from 46 to 51

Note that the indices refer to the result string (the text with all tags removed).

How can I parse it?

Answer 1

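Unfortunately I'm not familiar with ActionScript, but this C# code shows one solution using regular expressions. Rather than matching specific tags, I used a regular expression that can match any tag. And instead of trying to write a regular expression that matches a whole start and end tag pair, including the text in between (which I think is impossible with nested tags), I made the regular expression match just a start OR an end tag. Some additional processing then pairs up the start and end tags and removes them from the string while keeping the essential information.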

using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

class Program
{
   static void Main(string[] args)
   {
      string data = "[color value=0x000000]This house is [wave][color value=0xFF0000]haunted[/color][/wave]. " +
                    "I've heard about ghosts [shake]screaming[/shake] here after midnight.[/color]";

      ParsedData result = ParseData(data);
      foreach (TagInfo t in result.tags)
      {
         if (string.IsNullOrEmpty(t.attributeName))
         {
            Console.WriteLine("Apply {0} from {1} to {2}", t.name, t.start, t.start + t.length - 1);
         }
         else
         {
            Console.WriteLine("Apply {0} {1}={2} from {3} to {4}", t.name, t.attributeName, t.attributeValue, t.start, t.start + t.length - 1);
         }
         Console.WriteLine(result.data);
         Console.WriteLine("{0}{1}\n", new string(' ', t.start), new string('-', t.length));
      }
   }

   static ParsedData ParseData(string data)
   {
      List<TagInfo> tagList = new List<TagInfo>();
      Regex reTag = new Regex(@"\[(\w+)(\s+(\w+)\s*=\s*([^\]]+))?\]|\[(\/\w+)\]");
      Match m = reTag.Match(data);

      // Phase 1 - Collect all the start and end tags, noting their position in the original data string
      while (m.Success)
      {
         if (m.Groups[1].Success) // Matched a start tag
         {
            tagList.Add(new TagInfo()
            {
               name = m.Groups[1].Value,
               attributeName = m.Groups[3].Value,
               attributeValue = m.Groups[4].Value,
               tagLength = m.Groups[0].Length,
               start = m.Groups[0].Index
            });
         }
         else if (m.Groups[5].Success)
         {
            tagList.Add(new TagInfo()
            {
               name = m.Groups[5].Value,
               tagLength = m.Groups[0].Length,
               start = m.Groups[0].Index
            });
         }
         m = m.NextMatch();
      }

      // Phase 2 - match end tags to start tags
      List<TagInfo> unmatched = new List<TagInfo>();
      foreach (TagInfo t in tagList)
      {
         if (t.name.StartsWith("/"))
         {
            for (int i = unmatched.Count - 1; i >= 0; i--)
            {
               if (unmatched[i].name == t.name.Substring(1))
               {
                  t.otherEnd = unmatched[i];
                  unmatched[i].otherEnd = t;
                  unmatched.Remove(unmatched[i]);
                  break;
               }
            }
         }
         else
         {
            unmatched.Add(t);
         }
      }

      int subtractLength = 0;
      // Phase 3 - Remove tags from the string, updating start positions and calculating length in the process
      foreach (TagInfo t in tagList.ToArray())
      {
         t.start -= subtractLength;
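         // If this is an end tag with a matching start tag, calculate the length for
         // that start tag and remove the end tag from the tag list. Testing the name
         // instead of comparing positions also covers empty spans and unmatched tags.
         if (t.name.StartsWith("/") && t.otherEnd != null)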
         {
            t.otherEnd.length = t.start - t.otherEnd.start;
            tagList.Remove(t);
         }
         // Keep track of how many characters in tags have been removed from the string so far
         subtractLength += t.tagLength;
      }

      return new ParsedData()
      {
         data = reTag.Replace(data, string.Empty),
         tags = tagList.ToArray()
      };
   }

   class TagInfo
   {
      public int start;
      public int length;
      public int tagLength;
      public string name;
      public string attributeName;
      public string attributeValue;
      public TagInfo otherEnd;
   }

   class ParsedData
   {
      public string data;
      public TagInfo[] tags;
   }
}

The output is:

Apply color value=0x000000 from 0 to 76
This house is haunted. I've heard about ghosts screaming here after midnight.
-----------------------------------------------------------------------------

Apply wave from 14 to 20
This house is haunted. I've heard about ghosts screaming here after midnight.
              -------

Apply color value=0xFF0000 from 14 to 20
This house is haunted. I've heard about ghosts screaming here after midnight.
              -------

Apply shake from 47 to 55
This house is haunted. I've heard about ghosts screaming here after midnight.
                                               ---------
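
The three phases above can also be folded into a single pass by keeping a stack of the tags that are currently open. The following is only a sketch of that variant, not a replacement for the code above: the names `TagSpan` and `TagParser.Parse` are made up for illustration, and it assumes tags are properly nested (a closing tag that does not match the innermost open tag is simply dropped, and an unclosed tag keeps a length of 0).

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical types for illustration; not part of the code above.
class TagSpan
{
    public string Name;
    public string Value;   // attribute value, or "" when the tag has none
    public int Start;      // index in the tag-free text
    public int Length;     // number of characters covered
}

static class TagParser
{
    // Group 1: "/" for a closing tag, group 2: tag name, group 3: attribute value.
    static readonly Regex Tag = new Regex(@"\[(/?)(\w+)(?:\s+\w+\s*=\s*([^\]]+))?\]");

    public static List<TagSpan> Parse(string input, out string text)
    {
        var spans = new List<TagSpan>();
        var open = new Stack<TagSpan>();
        var plain = new StringBuilder();
        int pos = 0;

        foreach (Match m in Tag.Matches(input))
        {
            // Copy the literal text between the previous tag and this one.
            plain.Append(input, pos, m.Index - pos);
            pos = m.Index + m.Length;

            if (m.Groups[1].Length == 0)
            {
                // Opening tag: its span starts at the current end of the plain text.
                var span = new TagSpan
                {
                    Name = m.Groups[2].Value,
                    Value = m.Groups[3].Value,
                    Start = plain.Length
                };
                spans.Add(span);
                open.Push(span);
            }
            else if (open.Count > 0 && open.Peek().Name == m.Groups[2].Value)
            {
                // Closing tag that matches the innermost open tag: the span ends here.
                TagSpan span = open.Pop();
                span.Length = plain.Length - span.Start;
            }
            // Assumption: any other closing tag is malformed input and is ignored.
        }

        // Copy whatever text follows the last tag.
        plain.Append(input, pos, input.Length - pos);
        text = plain.ToString();
        return spans;
    }
}
```

Called on the sample string, `Parse` should return the spans in the order their opening tags appear, with `Start` and `Length` corresponding to the ranges printed above (the end index is `Start + Length - 1`).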

Answer 2

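Let me show you a parsing method that you can apply not only to the case above but to any input that follows the same bracketed-tag pattern. It is not limited to the tag names color, wave, and shake.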

    private List<Tuple<string, string>> getVals(string input)
    {
        List<Tuple<string, string>> finals = new List<Tuple<string,string>>();

        // first parser
        var mts = Regex.Matches(input, @"\[[^\u005D]+\]");

        foreach (var mt in mts)
        {
            // has no value=
            if (!Regex.IsMatch(mt.ToString(), @"(?i)value[\n\r\t\s]*="))
            {
                // not closing tag
                if (!Regex.IsMatch(mt.ToString(), @"^\[[\n\r\t\s]*\/"))
                {
                    try
                    {
                        finals.Add(new Tuple<string, string>(Regex.Replace(mt.ToString(), @"^\[|\]$", "").Trim(), ""));
                    }
                    catch (Exception es)
                    {
                        Console.WriteLine(es.ToString());
                    }
                }

            }
            // has value=
            else
            {
                try
                {
                    var spls = Regex.Split(mt.ToString(), @"(?i)value[\n\r\t\s]*=");
                    // Strip the leading "[" from the name and the trailing "]" from the value
                    finals.Add(new Tuple<string, string>(Regex.Replace(spls[0].ToString(), @"^\[", "").Trim(), Regex.Replace(spls[1].ToString(), @"\]$", "").Trim()));
                }
                catch (Exception es)
                {
                    Console.WriteLine(es.ToString());
                }

            }
        }

        return finals;

    }
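
The method above returns the tag names and values, but not where each tag sits in the text. One possible way to recover the start positions the question asks for, as a rough sketch: `Match.Index` gives a tag's offset in the raw input, and subtracting the combined length of all tags seen before it converts that into an offset in the tag-free text. The names `TagOffsets` and `OpeningTagPositions` are invented for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of the method above.
static class TagOffsets
{
    // Returns (tag content, offset in the tag-free text) for every opening tag.
    public static List<Tuple<string, int>> OpeningTagPositions(string input)
    {
        var result = new List<Tuple<string, int>>();
        int removed = 0; // combined length of all tags seen so far

        foreach (Match m in Regex.Matches(input, @"\[[^\]]+\]"))
        {
            if (!m.Value.StartsWith("[/"))
            {
                // Drop the surrounding brackets, e.g. "[wave]" becomes "wave".
                string inner = m.Value.Substring(1, m.Value.Length - 2).Trim();
                result.Add(Tuple.Create(inner, m.Index - removed));
            }
            removed += m.Length;
        }
        return result;
    }
}
```

This only yields where each tag starts; finding where it ends still requires pairing every closing tag with its opening tag.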

I also have experience parsing JSON with a single regular expression. If you'd like to see it, visit my blog at www.mysplitter.com.
