C#: trying to isolate names from HTML using regex

Question

<a href="||blablabla link||" title="||blablabla title of torrent|| torrent">||THE STRING THAT IM INTERESTED IN--NAMES||</a>

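I'm working with an HTML file that contains 20-30 lines in the format above, and I'd like to save all the names in an ArrayList. My problem is that I don't fully understand regex syntax. What pattern should I use to get each name, and how do I use that pattern to capture every name in this HTML string? Thanks!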

Answer 1

using System;
using System.Text.RegularExpressions;

// Sample line in the format from the question.
string html = @"<a href=""/torrent/4353486/Terminator+Genisys+2015+720p+WEBRip+%5BChattChitto+RG%5D.html"" title=""view Terminator Genisys 2015 720p WEBRip [ChattChitto RG] torrent"">Terminator Genisys 2015 720p WEBRip [ChattChitto RG]</a>";
// Group 1 captures the link text; [^>]* skips any extra attributes without leaving the opening tag.
string pattern = @"<a\s+href=""[^""]*""\s+title=""[^""]*torrent""[^>]*>([^<]*)</a>";
foreach (Match m in Regex.Matches(html, pattern, RegexOptions.IgnoreCase))
{
    Console.WriteLine(m.Groups[1].Value);
}

This is just an example. I'm assuming that the title attribute of each link in your DOM always ends with "torrent".
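Since the question asks for the names to end up in a list, here is a minimal sketch of the same approach that collects them instead of printing them directly. The class name `TorrentNames`, the method `Extract`, and the two sample lines are made up for illustration; the pattern still assumes every title attribute ends with "torrent".

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TorrentNames
{
    // An <a> tag whose title ends in "torrent"; the link text is
    // captured in the named group "name".
    private static readonly Regex Anchor = new Regex(
        @"<a\s+href=""[^""]*""\s+title=""[^""]*torrent""[^>]*>(?<name>[^<]*)</a>",
        RegexOptions.IgnoreCase);

    // Returns the link text of every matching anchor, in document order.
    public static List<string> Extract(string html)
    {
        var names = new List<string>();
        foreach (Match m in Anchor.Matches(html))
        {
            names.Add(m.Groups["name"].Value.Trim());
        }
        return names;
    }

    public static void Main()
    {
        // Hypothetical input: two made-up lines in the question's format.
        string html =
            "<a href=\"/torrent/1/a.html\" title=\"view First Name torrent\">First Name</a>\n" +
            "<a href=\"/torrent/2/b.html\" title=\"view Second Name torrent\">Second Name</a>";

        foreach (string name in Extract(html))
        {
            Console.WriteLine(name);
        }
    }
}
```

A generic `List<string>` is used here rather than `ArrayList` because it avoids casting when the names are read back. If the real file deviates from this fixed line format (attributes in a different order, nested tags inside the link), a regex like this will likely miss entries, and an HTML parser would probably be the more robust choice.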

