Regex C#: is it possible to use a variable in substitution?

Question

I have a bunch of strings in a text that look like this:

h1. this is the Header
h3. this one the header too
h111. and this

I have a function that is supposed to process this text depending on the iteration (level) it is called at:

public void ProcessHeadersInText(string inputText, int atLevel = 1)

So when it is called like this:

ProcessHeadersInText(inputText, 2)

the output should be:

<h3>this is the Header</h3>
<h5>this one the header too</h5>
<h9>and this</h9>

(The last one looks like this because if the value after the letter h is greater than 9, it should be 9 in the output.)

So I started thinking about using a regular expression.

Here is the example: https://regex101.com/r/spb3Af/1/

(As you can see, I came up with the regex (^(h([\d]+)\.+?)(.+?)$) and tried to use the substitution <h$3>$4</h$3> on it.)

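This is almost what I am looking for, but I need to add some logic for working with the heading level.

Is it possible to do any work with variables inside the substitution?

Or do I need to find another way? (Extract all the headings first, replace them taking the function parameter and the heading value into account, and only then use the regex I wrote?)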

Answer 1

You can solve your problem with a regex like the one below.

Regex.Replace(s, @"^(h\d+)\.(.*)$", @"<$1>$2</$1>", RegexOptions.Multiline)

Let me explain what I am doing:

// This will capture the header number which is followed 
// by a '.' but ignore the . in the capture
(h\d+)\. 

// This will capture the rest of the string up to the end
// of the line (see the multi-line regex option being used)
(.*)$

The parentheses capture the text into variables, which can be referenced as "$1" for the first capture and "$2" for the second capture.
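
To make the group references concrete, here is a minimal sketch of the substitution described above. The class name SubstitutionDemo and the method name ToTags are illustrative assumptions, not part of the answer. A plain replacement string can only copy captured text; it cannot do arithmetic on it, so this sketch on its own does not apply atLevel or the cap at 9.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SubstitutionDemo
{
    // Hypothetical helper: wraps each "hN. text" line in <hN>...</hN>.
    public static string ToTags(string s)
    {
        // $1 = "h" plus the digits, $2 = everything after the dot on that line.
        return Regex.Replace(s, @"^(h\d+)\.(.*)$", "<$1>$2</$1>", RegexOptions.Multiline);
    }

    public static void Main()
    {
        Console.WriteLine(ToTags("h1. this is the Header\nh3. this one the header too"));
    }
}
```

Because the pattern consumes only the dot, the space after it stays in group 2, so the first line comes out as "<h1> this is the Header</h1>" with a leading space inside the tag.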

Answer 2

Try this:

private static string ProcessHeadersInText(string inputText, int atLevel = 1)
{
    // Group 1 = value after 'h'
    // Group 2 = Content of header without leading whitespace
    string pattern = @"^h(\d+)\.\s*(.*?)\r?$";
    return Regex.Replace(inputText, pattern, match => EvaluateHeaderMatch(match, atLevel), RegexOptions.Multiline);
}

private static string EvaluateHeaderMatch(Match m, int atLevel)
{
    int hVal = int.Parse(m.Groups[1].Value) + atLevel;
    if (hVal > 9) { hVal = 9; }
    return $"<h{hVal}>{m.Groups[2].Value}</h{hVal}>";
}

Then call it:

ProcessHeadersInText(input, 2);

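This uses the Regex.Replace(string, string, MatchEvaluator, RegexOptions) overload with a custom evaluator function.

You can of course simplify this solution into a single function with an inline lambda expression: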

public static string ProcessHeadersInText(string inputText, int atLevel = 1)
{
    string pattern = @"^h(\d+)\.\s*(.*?)\r?$";
    return Regex.Replace(inputText, pattern,
        match =>
        {
            int hVal = int.Parse(match.Groups[1].Value) + atLevel;
            if (hVal > 9) { hVal = 9; }
            return $"<h{hVal}>{match.Groups[2].Value}</h{hVal}>";
        },
        RegexOptions.Multiline);
}

Answer 3

The regex you can use is

^h(\d+)\.+\s*(.+)

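If you need to make sure the match does not span lines, you can replace \s with [^\S\r\n]. See the regex demo.

When replacing in C#, parse the Group 1 value to int and increment it in the match evaluator inside the Regex.Replace method.

Here is sample code that should help you: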
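
As a rough illustration of why that swap matters, the sketch below runs both variants on a header that has no title on its own line. The class and field names (WhitespaceDemo, Loose, Strict) are made up for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WhitespaceDemo
{
    // \s* may consume a line break, so an empty header can swallow the next line.
    public static readonly Regex Loose =
        new Regex(@"^h(\d+)\.+\s*(.+)", RegexOptions.Multiline);

    // [^\S\r\n]* matches whitespace other than CR and LF, so the match stays on one line.
    public static readonly Regex Strict =
        new Regex(@"^h(\d+)\.+[^\S\r\n]*(.+)", RegexOptions.Multiline);

    public static void Main()
    {
        string input = "h1.\nplain text";
        // Group 2 is taken from the following line here.
        Console.WriteLine(Loose.Match(input).Groups[2].Value);
        // The strict variant finds no header at all in this input.
        Console.WriteLine(Strict.IsMatch(input));
    }
}
```

With the loose pattern, group 2 is "plain text", which belongs to the next line; the strict pattern does not match this input.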

using System;
using System.Linq;
using System.Text.RegularExpressions;
using System.IO;
public class Test
{
    // Demo: https://regex101.com/r/M9iGUO/2
    public static readonly Regex reg = new Regex(@"^h(\d+)\.+\s*(.+)", RegexOptions.Compiled | RegexOptions.Multiline); 

    public static void Main()
    {
        var inputText = "h1. Topic 1\r\nblah blah blah, because of bla bla bla\r\nh2. PartA\r\nblah blah blah\r\nh3. Part a\r\nblah blah blah\r\nh2. Part B\r\nblah blah blah\r\nh1. Topic 2\r\nand its cuz blah blah\r\nFIN";
        var res = ProcessHeadersInText(inputText, 2);
        Console.WriteLine(res);
    }
    public static string ProcessHeadersInText(string inputText, int atLevel = 1) 
    {
        // Add the requested offset first, then cap the resulting level at 9.
        return reg.Replace(inputText, m =>
            string.Format("<h{0}>{1}</h{0}>",
                Math.Min(9, int.Parse(m.Groups[1].Value) + atLevel), m.Groups[2].Value.Trim()));
    }
}

See the C# online demo.

Note that I use .Trim() on m.Groups[2].Value because . matches \r. You could use TrimEnd('\r') instead to remove just this character.
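
The effect of . matching \r can be seen in a small sketch like the following, assuming input with Windows line endings. CarriageReturnDemo and RawTitle are hypothetical names used only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CarriageReturnDemo
{
    // Hypothetical helper: returns group 2 exactly as the regex captured it.
    public static string RawTitle(string input)
    {
        return Regex.Match(input, @"^h(\d+)\.+\s*(.+)", RegexOptions.Multiline).Groups[2].Value;
    }

    public static void Main()
    {
        string raw = RawTitle("h1. Topic 1\r\nblah blah blah");
        // In .NET, '.' matches every character except '\n', so '\r' is captured.
        Console.WriteLine(raw[raw.Length - 1] == '\r');
        // TrimEnd('\r') removes only the carriage return and keeps other whitespace.
        Console.WriteLine("[" + raw.TrimEnd('\r') + "]");
    }
}
```

Trim() would also strip any leading or trailing spaces in the title, while TrimEnd('\r') only drops the stray carriage return.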

Answer 4

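There are many good solutions in this thread, but I don't think you really need a regex solution for your problem. For fun and as a challenge, here is a non-regex solution: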

using System;
using System.Linq;

public class Program
{
    public static void Main()
    {
        string extractTitle(string x) => x.Substring(x.IndexOf(". ") + 2);
        string extractNumber(string x) => x.Remove(x.IndexOf(". ")).Substring(1);
        string build(string n, string t) => $"<h{n}>{t}</h{n}>";

        var inputs = new [] {
            "h1. this is the Header",
            "h3. this one the header too",
            "h111. and this" };

        foreach (var line in inputs.Select(x => build(extractNumber(x), extractTitle(x))))
        {
            Console.WriteLine(line);
        }
    }
}

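I used C# 7 local functions and C# 6 interpolated strings. If you like, I can use an older version of C#. The code should be easy to read; I can add comments if needed.

C# 5 version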

using System;
using System.Linq;

public class Program
{
    static string extractTitle(string x)
    {
        return x.Substring(x.IndexOf(". ") + 2);
    }

    static string extractNumber(string x)
    {
        return x.Remove(x.IndexOf(". ")).Substring(1);
    }

    static string build(string n, string t)
    {
        return string.Format("<h{0}>{1}</h{0}>", n, t);
    }

    public static void Main()
    {
        var inputs = new []{
            "h1. this is the Header",
            "h3. this one the header too",
            "h111. and this"
        };

        foreach (var line in inputs.Select(x => build(extractNumber(x), extractTitle(x))))
        {
            Console.WriteLine(line);
        }
    }
}
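
The two versions above copy the header number through unchanged. One possible way to extend the same non-regex idea to the level offset and the cap at 9 from the question is sketched below; HeaderShift and ConvertLine are hypothetical names, and the "h<digits>. <title>" line shape is an assumption taken from the sample input.

```csharp
using System;
using System.Globalization;

public static class HeaderShift
{
    // Hypothetical helper, not from the answer: converts one "hN. title" line,
    // adding atLevel to N and capping the result at 9.
    public static string ConvertLine(string line, int atLevel = 1)
    {
        int dot = line.IndexOf(". ", StringComparison.Ordinal);
        int level;
        // Leave the line untouched unless it looks like "h<digits>. <title>".
        if (dot < 2 || line[0] != 'h' ||
            !int.TryParse(line.Substring(1, dot - 1), NumberStyles.None,
                          CultureInfo.InvariantCulture, out level))
        {
            return line;
        }
        int shifted = Math.Min(9, level + atLevel);
        return string.Format("<h{0}>{1}</h{0}>", shifted, line.Substring(dot + 2));
    }

    public static void Main()
    {
        var inputs = new[] { "h1. this is the Header", "h3. this one the header too", "h111. and this" };
        foreach (var line in inputs)
        {
            Console.WriteLine(ConvertLine(line, 2));
        }
    }
}
```

Called with atLevel = 2 on the sample lines from the question, this yields h3, h5 and h9 headers, and lines that do not start with a header marker are returned as they are.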

c#

regex

substitution