Search a text document on a MarkLogic server and return results based on a search pattern

I have uploaded a text document to the MarkLogic server, in a collection named "calling-returning". Here is the text document:

    [INFO] [workflowContentListner-1] 2019-01-03 00:00:59,884 com.innodata.bsi.interceptors.MethodLoggingAspect logTimeMethod - Thread Id-25 : -7703835814759006134 - Returning from WorkflowContentDao.deleteCompletedOrFailedContentList(..) Execution time: 16 ms
    [INFO] [workflowContentListner-1] 2019-01-03 00:00:59,900 com.innodata.bsi.interceptors.MethodLoggingAspect logTimeMethod - Thread Id-25 : -2561765194895194936 - Calling WorkflowContentDao.getWaitingForContentListToProcess(..) with parameters FTP
    [INFO] [workflowContentListner-1] 2019-01-03 00:00:59,900 com.innodata.bsi.interceptors.MethodLoggingAspect logTimeMethod - Thread Id-25 : -2561765194895194936 - Returning from WorkflowContentDao.getWaitingForContentListToProcess(..) Execution time: 0 ms
    [INFO] [workflowContentListner-1] 2019-01-03 00:00:59,915 com.innodata.bsi.interceptors.MethodLoggingAspect logTimeMethod - Thread Id-25 : -2041334620910360341 - Calling WorkflowContentDao.getFTPWaitProcessType(..) with parameters ftp://10.103.100.43:21/VARIANTGENERATION/INPUT/30357186.pdf
    [INFO] [workflowContentListner-1] 2019-01-03 00:00:59,915 com.innodata.bsi.interceptors.MethodLoggingAspect logTimeMethod - Thread Id-25 : -2041334620910360341 - Returning from WorkflowContentDao.getFTPWaitProcessType(..) Execution time: 0 ms
    [INFO] [workflowContentListner-1] 2019-01-03 00:00:59,915 com.innodata.bsi.consumer.WorkflowContentConsumer processWorkflowContent - processWorkflowContent workflow content task: DPC-CENELEC-PUBLISH 01-7915592210 VARIANT_GENERATION
    [INFO] [workflowContentListner-1] 2019-01-03 00:00:59,915 com.innodata.bsi.schedule.task.ProcessWorkflowContent failWorkflowContentTask - Failing workflow content task using scheduler because its exceeded 30 min since created  DPC-CENELEC-PUBLISH 01-7915592210 VARIANT_GENERATION
    [INFO] [workflowContentListner-1] 2019-01-03 00:00:59,931 com.innodata.bsi.interceptors.MethodLoggingAspect logTimeMethod - Thread Id-25 : 8235148762900748472 - Calling WorkflowContentDao.setPickedBy(..) with parameters com.innodata.bsi.domain.WorkflowContentInfo@5f7839bd
    [INFO] [workflowContentListner-1] 2019-01-03 00:00:59,931 com.innodata.bsi.interceptors.MethodLoggingAspect logTimeMethod - Thread Id-25 : 8235148762900748472 - Returning from WorkflowContentDao.setPickedBy(..) Execution time: 0 ms

I am searching this document for "2561765194895194936 - Calling", where the number can be anything. So I wrote the query below:

    let $search := cts:search(collection("calling-returning"), cts:word-query(" - Calling"))
    return $search

But it returns the complete document. I only want results of the following kind:

  2561765194895194936 - Calling
  256176519489514568 - Calling
  568651948951566 - Calling

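The unit of search and retrieval in MarkLogic is the document. If you want to search lines individually, they need to be separate documents. Once you have a matching document, if you want to extract the matching lines from it, you need to tokenize the document into lines and run the match against each line, something like tokenize($doc,"\n")[cts:contains(text {.}, $query)]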
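As a rough sketch of that approach, the query below first finds the matching documents, then filters their lines, and finally trims each line down to the "number - Calling" part. The word query on "Calling" and the regular expression are assumptions based on the sample log lines in the question, not something MarkLogic requires; adjust both to the real data.

```xquery
xquery version "1.0-ml";

(: Assumption: a plain word query on "Calling" is enough to select candidate lines. :)
let $query := cts:word-query("Calling")
(: Step 1: document-level search, which is all cts:search can do on its own. :)
for $doc in cts:search(collection("calling-returning"), $query)
(: Step 2: split the text into lines and keep only the lines that match. :)
for $line in tokenize(string($doc), "\r?\n")[cts:contains(text { . }, $query)]
(: Step 3: assumed line shape "... <digits> - Calling ..."; skip anything else. :)
where matches($line, "\d+ - Calling")
return replace($line, "^.*?(\d+ - Calling).*$", "$1")
```

Because the regular expression captures only the digits, the leading minus sign that appears in the log is dropped, which matches the output format asked for in the question.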

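This is not going to be very efficient. You would be better off preprocessing the text document to add some markup (i.e. a root element, and a line element around each line). Then at least you are not tokenizing the whole thing at query time, although you still need to match each line after the fact to finish the job: $doc//line[cts:contains(., $query)]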
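A minimal sketch of such a preprocessing step might look like the following. The `log` and `line` element names, the ".xml" URI suffix, and the "calling-returning-lines" collection are all hypothetical names chosen for illustration.

```xquery
xquery version "1.0-ml";

(: Sketch: wrap each line of every text document in a <line> element
   and store the result as a new XML document. :)
for $doc in collection("calling-returning")[text()]
let $uri := concat(xdmp:node-uri($doc), ".xml")   (: hypothetical target URI :)
return
  xdmp:document-insert(
    $uri,
    <log>{
      (: One <line> element per non-empty line of the original text. :)
      for $l in tokenize(string($doc), "\r?\n")[. ne ""]
      return <line>{ $l }</line>
    }</log>,
    xdmp:default-permissions(),
    "calling-returning-lines"   (: hypothetical collection name :)
  )
```

Once this has run, a separate request along the lines of collection("calling-returning-lines")//line[cts:contains(., $query)] should return only the matching line elements. It has to be a separate request because inserted documents are not visible to the statement that inserts them.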