How do you write a regular expression in C# to skip a character, replace one character with another, and add a new character at a specific position?

I have a C# program that takes a subtitle text file as input, with content like this:

1
00: 00: 07.966 -> 00: 00: 11.166
How's the sea?
- This is great. 

2
00: 00: 12.967 -> 00: 00: 15.766
It's really pretty.

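What I want to do is basically correct it so that it skips any spaces, replaces the . character with a , character, and adds another hyphen to the -> string so that it becomes -->. For the example above, the correct output would be: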

1
00:00:07,966 --> 00:00:11,166
How's the sea?
- This is great. 

2
00:00:12,967 --> 00:00:15,766
It's really pretty.

So far, I have thought about iterating over each line and checking whether it starts and ends with a digit, like this:

if (line.StartsWith("[0-9]") && line.EndsWith("[0-9]")) {
}

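However, I don't know how to write a regular expression that does this. Note that my input can have spaces anywhere in the subtitle timing line, not just after the : character, so the string may end up looking even worse: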

"^ 0 0 : 0 0 : 0 7 . 9 6 6 -> 0 0 : 0 0 : 1 1 . 1 6 6 $"

This can be solved with a regular expression:

(?m)(?:\G(?!\A)|^(?=\d.*\d\r?$))(\d{2}:)[ \t](?:(\d+,\d+[ \t])(-)(>[ \t]))?

Replace with $1$2$3$3$4.

See the regex proof: http://regexstorm.net/tester?p=%28%3F%3A%5CG%28%3F!%5CA%29%7C%5E%28%3F%3D%5Cd.*%5Cd%5Cr%3F%24%29%29%28%5Cd%7B2%7D%3A%29%5B%20%5Ct%5D%28%3F%3A%28%5Cd%2B%2C%5Cd%2B%5B%20%5Ct%5D%29%28-%29%28%3E%5B%20%5Ct%5D%29%29%3F&i=1%0D%0A00%3A%2000%3A%2007%2C966%20-%3E%2000%3A%2000%3A%2011%2C166%0D%0AHow%27s%20the%20sea%3F%0D%0A-%20This%20is%20great.%20%0D%0A%0D%0A2%0D%0A00%3A%2000%3A%2012%2C967%20-%3E%2000%3A%2000%3A%2015%2C766%0D%0AIt%27s%20really%20pretty.&r=%241%242%243%243%244&o=m

Explanation

(?m)          set flags for this block (with ^ and $ matching start and end of line) (case-sensitive) (with . not matching \n) (matching whitespace and # normally)
(?:           group, but do not capture:
  \G            where the previous match ended
  (?!           look ahead to see if there is not:
    \A            the beginning of the string
  )             end of look-ahead
 |             OR
  ^             the beginning of a "line"
  (?=           look ahead to see if there is:
    \d            digits (0-9)
    .*            any character except \n (0 or more times (matching the most amount possible))
    \d            digits (0-9)
    \r?           '\r' (carriage return) (optional (matching the most amount possible))
    $             before an optional \n, and the end of a "line"
  )             end of look-ahead
)             end of grouping
(             group and capture to \1:
  \d{2}         digits (0-9) (2 times)
  :             ':'
)             end of \1
[ \t]         any character of: ' ', '\t' (tab)
(?:           group, but do not capture (optional (matching the most amount possible)):
  (             group and capture to \2:
    \d+           digits (0-9) (1 or more times (matching the most amount possible))
    ,             ','
    \d+           digits (0-9) (1 or more times (matching the most amount possible))
    [ \t]         any character of: ' ', '\t' (tab)
  )             end of \2
  (             group and capture to \3:
    -             '-'
  )             end of \3
  (             group and capture to \4:
    >             '>'
    [ \t]         any character of: ' ', '\t' (tab)
  )             end of \4
)?            end of grouping

C# code:

using System;
using System.Text.RegularExpressions;

public class Example
{
    public static void Main()
    {
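        // The lookahead ends with \d\r?$ so that lines terminated by \r\n still match.
        string pattern = @"(?:\G(?!\A)|^(?=\d.*\d\r?$))(\d{2}:)[ \t](?:(\d+,\d+[ \t])(-)(>[ \t]))?";
        // $3 is repeated to turn "->" into "-->".
        string substitution = @"$1$2$3$3$4";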
        string input = @"1
00: 00: 07,966 -> 00: 00: 11,166
How's the sea?
- This is great. 

2
00: 00: 12,967 -> 00: 00: 15,766
It's really pretty.";
        RegexOptions options = RegexOptions.Multiline;
        
        Regex regex = new Regex(pattern, options);
        string result = regex.Replace(input, substitution);
        Console.Write(result);
    }
}

Result:

1
00:00:07,966 --> 00:00:11,166
How's the sea?
- This is great. 

2
00:00:12,967 --> 00:00:15,766
It's really pretty.
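
The pattern above expects exactly one space or tab after each colon and a comma already in place. For input where whitespace can appear anywhere and the decimal separator is still a dot, one possible sketch is to match the whole timing line and rebuild it in a MatchEvaluator. The names SubtitleFixer and FixTimestamps are hypothetical, and the pattern assumes a timing line starts with a digit and contains only digits, spaces, tabs, ':', '.', ',' and the arrow:

```csharp
using System.Text.RegularExpressions;

public static class SubtitleFixer
{
    // Assumption: a timing line starts with a digit, contains an arrow, and otherwise
    // holds only digits, spaces, tabs, ':', '.' and ','.
    // The lookahead tolerates a trailing '\r' from \r\n line endings.
    private static readonly Regex TimingLine = new Regex(
        @"^[ \t]*\d[\d \t:.,]*-+[ \t]*>[\d \t:.,]*(?=\r?$)",
        RegexOptions.Multiline);

    public static string FixTimestamps(string text)
    {
        return TimingLine.Replace(text, match =>
        {
            string compact = Regex.Replace(match.Value, @"\s+", ""); // drop all whitespace
            compact = compact.Replace('.', ',');                      // '.' becomes ','
            return Regex.Replace(compact, "-+>", " --> ");            // normalize the arrow
        });
    }
}
```

Calling SubtitleFixer.FixTimestamps on the full file contents leaves the counter lines and the subtitle text untouched, because neither matches the timing-line pattern. A subtitle text line made up only of digits and an arrow would also be rewritten, so this is a sketch for well-formed subtitle files rather than a general parser.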

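This may not be a single regular expression that does everything, but I think that is actually an advantage: the logic is easy to follow and modify.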

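using System.IO;
using System.Text.RegularExpressions;

// inputPath and outputPath are assumed to be defined elsewhere
using var input = new StreamReader(inputPath);
using var output = new StreamWriter(outputPath);

// matches a whole line that contains "->" (or "- >") and no alpha characters
var timestampRegex = new Regex(@"^[^A-Za-z]*-\s*>[^A-Za-z]*$");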

string line;
while((line = input.ReadLine()) != null)
{
    // if a timestamp line is found then it is modified
    if (timestampRegex.IsMatch(line))
    {
        line = Regex.Replace(line, @"\s", ""); // remove all whitespace
        line = line.Replace('.', ',');         // use a comma as the decimal separator
        line = line.Replace("->", " --> ");    // update arrow style
    }

    output.WriteLine(line);
}
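
The per-line logic in the loop above can also be pulled into a small helper and checked against the strict timing format before the change is accepted. This is only a sketch: the names SrtLine and Normalize are hypothetical, and the strict pattern assumes two-digit hours, minutes, and seconds plus three-digit milliseconds, as in the expected output from the question.

```csharp
using System.Text.RegularExpressions;

public static class SrtLine
{
    // Loose detector: a whole line with an arrow and no letters (same idea as the loop above).
    private static readonly Regex Candidate = new Regex(@"^[^A-Za-z]*-\s*>[^A-Za-z]*$");

    // Strict shape of a corrected timing line, used to verify the result.
    private static readonly Regex Strict = new Regex(
        @"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}$");

    public static string Normalize(string line)
    {
        if (!Candidate.IsMatch(line))
        {
            return line; // not a timing line, leave it alone
        }

        string fixedLine = Regex.Replace(line, @"\s+", "") // remove all whitespace
            .Replace('.', ',')                             // '.' becomes ','
            .Replace("->", " --> ");                       // update arrow style

        // Keep the original line if the result does not look like a timing line.
        return Strict.IsMatch(fixedLine) ? fixedLine : line;
    }
}
```

Inside the loop, output.WriteLine(SrtLine.Normalize(line)) would then replace the if block. Falling back to the original line means that an unexpected input such as "3 -> 4" passes through unchanged instead of being rewritten.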