Get 2 line-breaks after a keyword

Question

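I'm working on code that scans multiple .docx files for a keyword and then returns the whole sentence containing it, up to the next line break.

This works well: I get every sentence that contains the keyword, up to the first line break.

My question:

What does my RegEx have to look like if I don't want the text up to the first line break, but the text up to the second line break? Maybe with the right quantifier? I couldn't get it to work.

My pattern: ".*" + "keyword" + ".*"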

Main.cs

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Text.RegularExpressions;
using Xceed.Words.NET;

public class Class1
{

  static void Main(string[] args)
  {
     String searchParam = @".*" + "thiskeyword" + ".*";
     List<String> docs = new List<String>();
     docs.Add(@"C:\Users\itsmemario\Desktop\project\test.docx");

     for (int i = 0; i < docs.Count; i++)
     {
         Suche s1 = new Suche(docs[i], searchParam);
         s1.SearchEngine(docs[i], searchParam);
     }
  }
}

Suche.cs

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Text.RegularExpressions;
using Xceed.Words.NET;


public class Suche
{
    String path;
    String stringToSearchFor;
    List<String> searchResult = new List<String>();

    public Suche(String path, String stringToSearchFor)
    {
        this.path = path;
        this.stringToSearchFor = stringToSearchFor;
    }

    public void SearchEngine(String path, String stringToSearchFor)
    {
        using (var document = DocX.Load(path))
        {
           searchResult = document.FindUniqueByPattern(stringToSearchFor, RegexOptions.IgnoreCase);

            if (searchResult.Count != 0)
            {
                WriteList(searchResult);
            }
            else
            {
                Console.WriteLine("Text does not contain keyword!");
            }
        }
    }

    public void WriteList(List<String> list)
    {
        for (int i = 0; i < list.Count; i++)
        {
            Console.WriteLine(list[i]);
            Console.WriteLine("\n");
        }
    }
}

The expected output looks like this:

"*LINEBREAK* Theres nothing nicer than a working filter for keywords. *LINEBREAK*"

Answer 1

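You can't use the document.FindUniqueByPattern DocX method to match across lines, because it only searches within a single paragraph at a time. See this source code, specifically foreach( Paragraph p in Paragraphs ).

You can either take the document.Text property, or join all paragraph texts into one string and search the full text. Remove the searchResult = document.FindUniqueByPattern(stringToSearchFor, RegexOptions.IgnoreCase); line and use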

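// Join all paragraph texts with "\n" so the regex can see the breaks between them.
var docString = string.Join("\n", document.Paragraphs.Select(p => p.Text));
// var docString = string.Join("\n", document.Paragraphs.SelectMany(p => p.MagicText.Select(x => x.text)));
// Note: stringToSearchFor must be the bare keyword here, because Regex.Escape
// would turn any ".*" left in it into literal characters.
searchResult = Regex.Matches(docString, $@".*{Regex.Escape(stringToSearchFor)}.*\n.*", RegexOptions.IgnoreCase)
    .Cast<Match>()
    .Select(x => x.Value)
    .ToList();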
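
Since you asked about a quantifier: the trailing \n.* can be wrapped in a group and repeated. Below is a minimal sketch on a plain string, with no DocX involved. The KeywordContext class, the Find method and the extraLines parameter are names I made up for illustration. It uses {0,N} rather than a fixed count, on the assumption that a keyword on the last line should still produce a match.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of DocX.
public static class KeywordContext
{
    // Returns every line that contains `keyword`, together with up to
    // `extraLines` following lines. `keyword` is treated as literal text.
    public static List<string> Find(string text, string keyword, int extraLines)
    {
        // Normalize Windows line endings so '.' never picks up a stray '\r'.
        text = text.Replace("\r\n", "\n");

        // '.' does not match '\n' by default, so each repetition of
        // (?:\n.*) extends the match by exactly one more line.
        string pattern = ".*" + Regex.Escape(keyword) + ".*(?:\\n.*){0," + extraLines + "}";

        return Regex.Matches(text, pattern, RegexOptions.IgnoreCase)
            .Cast<Match>()
            .Select(m => m.Value)
            .ToList();
    }

    public static void Main()
    {
        string text = "intro line\nfirst thiskeyword line\nsecond line\nthird line";

        // One extra line: the keyword line plus the line after it.
        foreach (string hit in Find(text, "thiskeyword", 1))
        {
            Console.WriteLine(hit);
            Console.WriteLine("---");
        }
    }
}
```

With extraLines set to 1 this behaves like the pattern above, except that the following line is optional. Pass the joined docString as text to apply it to a document.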
